Lesson 3
Linear Regression
Episode 7 - Learning Rate
Summary:
- The learning rate \alpha is a hyperparameter that controls the step size of each gradient descent update: \theta := \theta - \alpha \nabla J(\theta).
- A small \alpha means small steps. Too small and gradient descent takes a long time to converge.
- A large \alpha means big steps. Too large and the updates overshoot the minimum and fail to converge (see the sketch after this list).
- Some improvements over ‘vanilla’ gradient descent with a fixed \alpha:
  - Decay \alpha over time.
  - Adaptive learning rates: adjust the learning rate independently for each parameter, based on that parameter's gradient history.
    - Examples: Adam and Adadelta (a minimal sketch of both ideas follows this list).
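To make the effect of the step size concrete, here is a minimal sketch of vanilla gradient descent on J(\theta) = \theta^2. The objective, starting point, and \alpha values are illustrative assumptions, not from the lesson:

```python
def gradient_descent(grad, theta0, alpha, steps=50):
    """Vanilla gradient descent: theta <- theta - alpha * grad(theta)."""
    theta = theta0
    for _ in range(steps):
        theta = theta - alpha * grad(theta)
    return theta

# J(theta) = theta^2 has its minimum at theta = 0; its gradient is 2 * theta.
grad = lambda t: 2.0 * t

print(gradient_descent(grad, theta0=10.0, alpha=0.01))  # too small: after 50 steps, still far from 0
print(gradient_descent(grad, theta0=10.0, alpha=0.1))   # reasonable: very close to 0
print(gradient_descent(grad, theta0=10.0, alpha=1.1))   # too large: each step overshoots; the iterates diverge
```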
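And a minimal sketch of the two improvements above: a decay schedule for \alpha and a per-parameter adaptive update in the style of Adam. The 1/t schedule, the hyperparameter values, and the single-scalar-parameter simplification are assumptions for illustration; real implementations operate on whole parameter vectors.

```python
def decayed_alpha(alpha0, t, k=0.01):
    """One common decay schedule (assumed here): alpha_t = alpha0 / (1 + k * t)."""
    return alpha0 / (1.0 + k * t)

def adam_step(theta, g, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter.

    Adam tracks exponential moving averages of the gradient (m) and the
    squared gradient (v), then divides the step by sqrt(v), so a
    parameter's effective learning rate shrinks when its gradients have
    been consistently large.
    """
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)   # bias correction: m and v start at zero
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v

# Minimizing J(theta) = theta^2 again, now with Adam (alpha here is an assumed value).
theta, m, v = 10.0, 0.0, 0.0
for t in range(1, 2001):           # t starts at 1 so the bias correction is well defined
    g = 2.0 * theta                # gradient of J(theta) = theta^2
    theta, m, v = adam_step(theta, g, m, v, t, alpha=0.05)
print(theta)                       # ends up near the minimum at 0
```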