Fancier optimization
Regularization
Transfer Learning
Problems with Stochastic Gradient Descent (SGD)
SGD: compute the gradient and take a step in the opposite (negative) gradient direction
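A minimal sketch of this update in Python (the names `compute_gradient`, `loss_fn`, `data`, `w`, and `learning_rate` are illustrative placeholders, not a specific API):

```python
# Vanilla SGD: repeatedly nudge the weights against the gradient.
while True:
    dw = compute_gradient(loss_fn, data, w)  # placeholder: gradient of the loss w.r.t. w
    w = w - learning_rate * dw               # step in the negative gradient direction
```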

- Case: the loss changes slowly along the horizontal direction but is very sensitive to changes along the vertical direction, which means:
  - The loss has a bad condition number at that point
  - There is a large ratio between the largest and smallest singular values of the Hessian matrix at that point
Result:
- The direction of the gradient does not align with the direction towards the minimum.
- Thus, progress is slow along the horizontal (less sensitive) dimension while the updates zig-zag along the sensitive (vertical) dimension, as the sketch after this list illustrates.
- This problem is more likely to occur in high dimensions, where there are many more directions across which the curvature can vary.
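A small sketch of this behavior, assuming a toy ill-conditioned quadratic loss f(w) = 0.5 * (w1^2 + 100 * w2^2) (an illustrative example, not taken from the lecture):

```python
import numpy as np

# Toy ill-conditioned quadratic: f(w) = 0.5 * (w1**2 + 100 * w2**2).
# The Hessian is diag(1, 100), so the condition number is 100: the loss is
# ~100x more sensitive to w2 (the "vertical" direction) than to w1.
def grad(w):
    return np.array([w[0], 100.0 * w[1]])

w = np.array([10.0, 1.0])
lr = 0.018  # just below the stability limit (2 / 100) for the sensitive direction

for step in range(50):
    w = w - lr * grad(w)
    if step % 10 == 0:
        print(step, w)
# w[1] overshoots and flips sign each step (zig-zag) while w[0] shrinks only
# slowly, because a learning rate that is safe for the sensitive direction is
# far too small for the insensitive one.
```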

- SGD getting stuck where the gradient is 0 (locally flat regions)
  - Local minimum: (+) - (0) - (+), the loss curves upward in every direction away from the zero-gradient point
  - Saddle point: (+) - (0) - (-), the loss curves upward along some directions and downward along others
    - i.e., moving through the point along one direction, the gradient goes from positive through zero to negative
  - In practice, saddle points are much more common than local minima, especially in high dimensions
  - The gradient is very small not only at the saddle point itself but also everywhere near it, so SGD makes very slow progress in that whole region (see the sketch below)
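A small sketch of this behavior on a toy saddle f(x, y) = x^2 - y^2 (an illustrative example, not from the lecture):

```python
import numpy as np

# Toy saddle: f(x, y) = x**2 - y**2 has a saddle point at the origin
# (the loss curves up along x and down along y, and the gradient is zero there).
def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

p = np.array([1e-2, 1e-4])  # start close to the saddle point
lr = 0.1

for step in range(60):
    g = grad(p)
    p = p - lr * g
    if step % 10 == 0:
        print(f"step {step:2d}  |grad| = {np.linalg.norm(g):.5f}  p = {p}")
# For many steps the gradient norm stays tiny, so the updates are tiny:
# SGD crawls near the saddle before slowly escaping along the y direction.
```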
