Lesson 3
Linear Regression
Episode 7 - Learning Rate
Summary:
- The learning rate \alpha is a hyperparameter that controls the step size of each gradient descent update: \theta := \theta - \alpha \nabla J(\theta).
- A small \alpha means small steps. Too small and gradient descent takes a long time to converge.
- A large \alpha means big steps. Too large and the updates overshoot the minimum and fail to converge (see the sketch after this list).
- Some improvements over ‘vanilla’ gradient descent with a fixed \alpha:
  - Decay \alpha over time.
  - Adaptive learning rates: adjust the learning rate independently for each parameter, based on that parameter's gradient history.
    - Examples: Adam and Adadelta (a minimal sketch of both ideas follows this list).
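To make the effect of the step size concrete, here is a minimal sketch of vanilla gradient descent on J(\theta) = \theta^2. The objective, starting point, and \alpha values are illustrative assumptions, not from the lesson:

```python
def gradient_descent(grad, theta0, alpha, steps=50):
    """Vanilla gradient descent: theta <- theta - alpha * grad(theta)."""
    theta = theta0
    for _ in range(steps):
        theta = theta - alpha * grad(theta)
    return theta

# J(theta) = theta^2 has its minimum at theta = 0; its gradient is 2 * theta.
grad = lambda t: 2.0 * t

print(gradient_descent(grad, theta0=10.0, alpha=0.01))  # too small: after 50 steps, still far from 0
print(gradient_descent(grad, theta0=10.0, alpha=0.1))   # reasonable: very close to 0
print(gradient_descent(grad, theta0=10.0, alpha=1.1))   # too large: each step overshoots; the iterates diverge
```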
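And a minimal sketch of the two improvements above: a decay schedule for \alpha and a per-parameter adaptive update in the style of Adam. The 1/t schedule, the hyperparameter values, and the single-scalar-parameter simplification are assumptions for illustration; real implementations operate on whole parameter vectors.

```python
def decayed_alpha(alpha0, t, k=0.01):
    """One common decay schedule (assumed here): alpha_t = alpha0 / (1 + k * t)."""
    return alpha0 / (1.0 + k * t)

def adam_step(theta, g, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter.

    Adam tracks exponential moving averages of the gradient (m) and the
    squared gradient (v), then divides the step by sqrt(v), so a
    parameter's effective learning rate shrinks when its gradients have
    been consistently large.
    """
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)   # bias correction: m and v start at zero
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (v_hat ** 0.5 + eps)
    return theta, m, v

# Minimizing J(theta) = theta^2 again, now with Adam (alpha here is an assumed value).
theta, m, v = 10.0, 0.0, 0.0
for t in range(1, 2001):           # t starts at 1 so the bias correction is well defined
    g = 2.0 * theta                # gradient of J(theta) = theta^2
    theta, m, v = adam_step(theta, g, m, v, t, alpha=0.05)
print(theta)                       # ends up near the minimum at 0
```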