Lesson 4
Linear Classification
Episode 6 - Gradient Descent Revisited
Summary:
- Gradient of the binary cross-entropy loss (derivation written out after this list)
- Same form as the gradient for linear regression
- Only change: \hat{y} is the logistic (sigmoid) output instead of the linear prediction
- Variants of gradient descent (see the sketch after this list)
- Plain (vanilla, a.k.a. batch) gradient descent: update weights after computing the gradient over all training examples
- Stochastic gradient descent (SGD): update weights after computing the gradient for a single training example
- Mini-batch gradient descent: update weights after computing the gradient over one mini-batch of examples
- Terminology (worked example after this list):
- One epoch: one full pass through the whole training set
- The training set is divided into several batches
- Batch size: the number of training examples in each batch
- Iteration: one weight update, i.e. processing one batch; the number of iterations per epoch equals the number of batches
- Training-set size = iterations per epoch \times batch size
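
For reference, here is the standard result the first bullet refers to, written out with \sigma for the sigmoid and N training pairs (x_i, y_i). With the logistic model

\hat{y}_i = \sigma(w^\top x_i), \qquad L(w) = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \right],

the gradient simplifies to

\nabla_w L(w) = \frac{1}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)\, x_i,

which has exactly the same form as the linear-regression (MSE) gradient, with \hat{y}_i = \sigma(w^\top x_i) in place of \hat{y}_i = w^\top x_i.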
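
Below is a minimal NumPy sketch of the three variants applied to logistic regression. All names (X, y, w, lr, batch_size) and the hyperparameter values are illustrative assumptions, not from the lesson.

```python
# A minimal NumPy sketch of the three gradient-descent variants for
# logistic regression. All names and hyperparameter values here are
# illustrative, not taken from the lesson.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_gradient(w, X, y):
    # Gradient of binary cross-entropy: mean of (y_hat - y) * x.
    y_hat = sigmoid(X @ w)
    return X.T @ (y_hat - y) / len(y)

def vanilla_gd(w, X, y, lr=0.1, epochs=100):
    # One weight update per epoch, using the entire training set.
    for _ in range(epochs):
        w = w - lr * bce_gradient(w, X, y)
    return w

def sgd(w, X, y, lr=0.1, epochs=100, seed=0):
    # One weight update per training example; reshuffle every epoch.
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            w = w - lr * bce_gradient(w, X[i:i+1], y[i:i+1])
    return w

def minibatch_gd(w, X, y, lr=0.1, epochs=100, batch_size=32, seed=0):
    # One weight update per mini-batch; ceil(n / batch_size) updates per epoch.
    rng = np.random.default_rng(seed)
    n = len(y)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            w = w - lr * bce_gradient(w, X[idx], y[idx])
    return w
```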
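
A quick numeric check of the epoch / batch / iteration relationship; the numbers are made up for illustration.

```python
import math

n_examples = 1000  # hypothetical training-set size
batch_size = 50    # hypothetical batch size
iterations_per_epoch = math.ceil(n_examples / batch_size)
print(iterations_per_epoch)               # 20 batches, i.e. 20 updates per epoch
print(iterations_per_epoch * batch_size)  # 1000 examples = one full pass = one epoch
```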