Chapter : Lesson 3

Episode 7 - Learning Rate

Josiah Wang

Summary:

  • The learning rate \alpha is a hyperparameter that controls the step size of each gradient descent update.
    • Small \alpha: move slowly. Too small and it will take a long time to converge.
    • Large \alpha: take big steps. Too large and it will overshoot the minimum and fail to converge (see the first sketch after this list).
  • Some improvements over ‘vanilla’ gradient descent with a fixed \alpha:
    • Decay \alpha over time (see the decay sketch below).
    • Adaptive learning rates: adjust the learning rate for each parameter individually, based on the history of that parameter's gradients (see the Adam-style sketch below).
      • Examples: Adam and Adadelta
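
To make the effect of \alpha concrete, here is a minimal Python sketch (not from the lesson) of vanilla gradient descent on the toy function f(w) = w^2; the starting point, number of steps, and the three \alpha values are illustrative assumptions.

# A minimal sketch (not from the lesson): vanilla gradient descent with a
# fixed learning rate alpha, minimising the toy function f(w) = w**2,
# whose gradient is 2*w. Starting point and alpha values are illustrative.

def gradient_descent(alpha, w0=5.0, steps=50):
    w = w0
    for _ in range(steps):
        grad = 2 * w          # gradient of f(w) = w**2
        w = w - alpha * grad  # fixed-size step in the negative gradient direction
    return w

# A too-small alpha barely moves, a moderate alpha converges to w = 0,
# and a too-large alpha overshoots and diverges.
for alpha in (0.001, 0.1, 1.1):
    print(f"alpha={alpha}: final w = {gradient_descent(alpha):.4f}")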
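
Decaying \alpha can follow many schedules; the sketch below assumes an inverse ("1/t") schedule, with illustrative values for the initial \alpha and the decay constant, applied to the same toy function.

# A sketch of learning-rate decay (assumed inverse "1/t" schedule).
# Early steps are large, later steps become progressively finer.

def gradient_descent_with_decay(alpha0=0.4, decay=0.05, w0=5.0, steps=100):
    w = w0
    for t in range(steps):
        alpha_t = alpha0 / (1 + decay * t)  # alpha shrinks as t grows
        w = w - alpha_t * 2 * w             # same toy gradient, 2*w
    return w

print(f"final w = {gradient_descent_with_decay():.6f}")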
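
For the adaptive case, below is a sketch of an Adam-style per-parameter update as commonly described; beta1, beta2 and eps use the usual default values, while the toy gradient, starting point, and larger-than-default \alpha are assumptions chosen so the example converges in few steps.

import numpy as np

# A sketch of an Adam-style update (per-parameter adaptive learning rate).
# beta1, beta2 and eps are the commonly cited defaults; the toy gradient,
# starting point, and the larger-than-default alpha are illustrative.

def adam(grad_fn, w, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m = np.zeros_like(w)  # running estimate of the mean of the gradient
    v = np.zeros_like(w)  # running estimate of the squared gradient
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction for zero-initialised moments
        v_hat = v / (1 - beta2 ** t)
        w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step size
    return w

# Each coordinate gets its own effective step size: parameters with
# consistently large gradients take smaller steps, and vice versa.
print("final w =", adam(lambda w: 2 * w, np.array([5.0, -3.0])))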