Fancier optimization
Regularization
Transfer Learning
Problems with Stochastic Gradient Descent (SGD)
SGD: compute the gradient and take a step in the opposite (negative) gradient direction
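A minimal sketch of this update in Python (the names `compute_gradient`, `loss_fn`, `data`, `w`, and `learning_rate` are illustrative placeholders, not a specific API):

```python
# Vanilla SGD: repeatedly nudge the weights against the gradient.
while True:
    dw = compute_gradient(loss_fn, data, w)  # placeholder: gradient of the loss w.r.t. w
    w = w - learning_rate * dw               # step in the negative gradient direction
```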

- Case: the loss changes slowly along the horizontal direction but is very sensitive to changes along the vertical direction, which means:
  - The loss has a bad condition number at that point
  - There is a large ratio between the largest and smallest singular values of the Hessian matrix at that point
Result:
- The direction of the gradient does not align with the direction towards the minimum.
- Thus, progress is slow along the horizontal (less sensitive) dimension while the updates zig-zag along the sensitive (vertical) dimension, as the sketch after this list illustrates.
- This problem is more likely to occur in high dimensions, where there are many more directions across which the curvature can vary.
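A small sketch of this behavior, assuming a toy ill-conditioned quadratic loss f(w) = 0.5 * (w1^2 + 100 * w2^2) (an illustrative example, not taken from the lecture):

```python
import numpy as np

# Toy ill-conditioned quadratic: f(w) = 0.5 * (w1**2 + 100 * w2**2).
# The Hessian is diag(1, 100), so the condition number is 100: the loss is
# ~100x more sensitive to w2 (the "vertical" direction) than to w1.
def grad(w):
    return np.array([w[0], 100.0 * w[1]])

w = np.array([10.0, 1.0])
lr = 0.018  # just below the stability limit (2 / 100) for the sensitive direction

for step in range(50):
    w = w - lr * grad(w)
    if step % 10 == 0:
        print(step, w)
# w[1] overshoots and flips sign each step (zig-zag) while w[0] shrinks only
# slowly, because a learning rate that is safe for the sensitive direction is
# far too small for the insensitive one.
```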

- SGD getting stuck where the gradient is 0 (locally flat regions)
  - Local minimum: (+) - (0) - (+), the loss curves upward in every direction away from the zero-gradient point
  - Saddle point: (+) - (0) - (-), the loss curves upward along some directions and downward along others
    - i.e., moving through the point along one direction, the gradient goes from positive through zero to negative
  - In practice, saddle points are much more common than local minima, especially in high dimensions
  - The gradient is very small not only at the saddle point itself but also everywhere near it, so SGD makes very slow progress in that whole region (see the sketch below)
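A small sketch of this behavior on a toy saddle f(x, y) = x^2 - y^2 (an illustrative example, not from the lecture):

```python
import numpy as np

# Toy saddle: f(x, y) = x**2 - y**2 has a saddle point at the origin
# (the loss curves up along x and down along y, and the gradient is zero there).
def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

p = np.array([1e-2, 1e-4])  # start close to the saddle point
lr = 0.1

for step in range(60):
    g = grad(p)
    p = p - lr * g
    if step % 10 == 0:
        print(f"step {step:2d}  |grad| = {np.linalg.norm(g):.5f}  p = {p}")
# For many steps the gradient norm stays tiny, so the updates are tiny:
# SGD crawls near the saddle before slowly escaping along the y direction.
```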
