Lesson 4
Linear Classification
Episode 6 - Gradient Descent Revisited
Summary:
- Gradient of the binary cross-entropy loss (derivation written out after this list)
- Same form as the gradient for linear regression
- Only change: \hat{y} is the logistic (sigmoid) output instead of the linear prediction
- Variants of gradient descent (see the sketch after this list)
- Plain (vanilla, a.k.a. batch) gradient descent: update weights after computing the gradient over all training examples
- Stochastic gradient descent (SGD): update weights after computing the gradient for a single training example
- Mini-batch gradient descent: update weights after computing the gradient over one mini-batch of examples
- Terminology (worked example after this list):
- One epoch: one full pass through the whole training set
- The training set is divided into several batches
- Batch size: the number of training examples in each batch
- Iteration: one weight update, i.e. processing one batch; the number of iterations per epoch equals the number of batches
- Training-set size = iterations per epoch \times batch size
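
For reference, here is the standard result the first bullet refers to, written out with \sigma for the sigmoid and N training pairs (x_i, y_i). With the logistic model

\hat{y}_i = \sigma(w^\top x_i), \qquad L(w) = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \right],

the gradient simplifies to

\nabla_w L(w) = \frac{1}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)\, x_i,

which has exactly the same form as the linear-regression (MSE) gradient, with \hat{y}_i = \sigma(w^\top x_i) in place of \hat{y}_i = w^\top x_i.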
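
Below is a minimal NumPy sketch of the three variants applied to logistic regression. All names (X, y, w, lr, batch_size) and the hyperparameter values are illustrative assumptions, not from the lesson.

```python
# A minimal NumPy sketch of the three gradient-descent variants for
# logistic regression. All names and hyperparameter values here are
# illustrative, not taken from the lesson.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_gradient(w, X, y):
    # Gradient of binary cross-entropy: mean of (y_hat - y) * x.
    y_hat = sigmoid(X @ w)
    return X.T @ (y_hat - y) / len(y)

def vanilla_gd(w, X, y, lr=0.1, epochs=100):
    # One weight update per epoch, using the entire training set.
    for _ in range(epochs):
        w = w - lr * bce_gradient(w, X, y)
    return w

def sgd(w, X, y, lr=0.1, epochs=100, seed=0):
    # One weight update per training example; reshuffle every epoch.
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            w = w - lr * bce_gradient(w, X[i:i+1], y[i:i+1])
    return w

def minibatch_gd(w, X, y, lr=0.1, epochs=100, batch_size=32, seed=0):
    # One weight update per mini-batch; ceil(n / batch_size) updates per epoch.
    rng = np.random.default_rng(seed)
    n = len(y)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            w = w - lr * bce_gradient(w, X[idx], y[idx])
    return w
```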
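
A quick numeric check of the epoch / batch / iteration relationship; the numbers are made up for illustration.

```python
import math

n_examples = 1000  # hypothetical training-set size
batch_size = 50    # hypothetical batch size
iterations_per_epoch = math.ceil(n_examples / batch_size)
print(iterations_per_epoch)               # 20 batches, i.e. 20 updates per epoch
print(iterations_per_epoch * batch_size)  # 1000 examples = one full pass = one epoch
```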