Fancier Optimization

Regularization

Transfer Learning

Problems with Stochastic Gradient Descent

SGD: Compute the gradient and take a step in the opposite (negative) gradient direction
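The update rule above can be sketched in a few lines; the toy quadratic loss and the learning rate here are illustrative choices, not anything fixed by the notes:

```python
import numpy as np

def sgd_step(w, grad_fn, lr=0.1):
    """One vanilla SGD step: move opposite the (mini-batch) gradient."""
    return w - lr * grad_fn(w)

# Toy loss L(w) = 0.5 * ||w||^2, whose gradient is simply w.
grad_fn = lambda w: w

w = np.array([4.0, -2.0])
for _ in range(100):
    w = sgd_step(w, grad_fn, lr=0.1)
# w has shrunk toward the minimum at the origin
```

With `lr=0.1` each step multiplies `w` by 0.9, so after 100 steps the weights are within about 1e-4 of the optimum.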


  1. Case: the loss changes slowly in the horizontal direction but is very sensitive to changes in the vertical direction, which means:
    1. The loss has a bad condition number at that point
    2. There is a large ratio between the largest and smallest singular values of the Hessian matrix at that point

Result: very slow progress along the shallow dimension, zig-zag jitter along the steep one

  1. SGD gets stuck wherever the gradient is 0 (locally flat)
    1. Local minimum: along every direction the gradient goes (-) - (0) - (+), i.e., positive curvature everywhere
    2. Saddle point: along at least one direction the gradient goes (+) - (0) - (-), i.e., negative curvature along that direction
      1. at the middle point the gradient passes through 0 while going from positive to negative, so the update vanishes even though it is not a minimum
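The stuck-at-a-saddle behavior can be demonstrated on the classic example f(x, y) = x^2 - y^2 (an assumed illustration, not from the notes), which has a saddle at the origin:

```python
import numpy as np

# f(x, y) = x^2 - y^2: positive curvature along x, negative along y.
# Starting on the x-axis, gradient descent slides straight into the saddle
# and stops -- the gradient there is zero, even though moving along y
# would still decrease the loss.
grad = lambda p: np.array([2.0 * p[0], -2.0 * p[1]])

p = np.array([1.0, 0.0])   # no component along the escape direction y
for _ in range(500):
    p = p - 0.1 * grad(p)
# p has converged to the saddle at (0, 0); the update is now zero
```

In practice mini-batch noise usually supplies a small component along the escape direction, but near a saddle the gradient is tiny, so progress is still extremely slow.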
