Chapter: Lesson 4

Episode 6 - Gradient Descent Revisited

Josiah Wang

Summary:

  • Gradient of the binary cross-entropy loss
    • Has the same form as the gradient for linear regression
    • Just change \hat{y} to the logistic regression prediction (the gradient is written out after this list)
  • Variants of gradient descent (see the code sketch after this list)
    • Plain (vanilla) gradient descent: update the weights after computing the gradient over all training examples
    • Stochastic gradient descent (SGD): update the weights after computing the gradient for a single training example
    • Mini-batch gradient descent: update the weights after computing the gradient over one batch of training examples
  • Terminology:
    • One epoch: one pass through the whole training set
    • The training set can be divided into several batches
    • Batch size: the number of training examples in each batch
    • Iteration: one weight update computed on one batch, so the number of iterations per epoch equals the number of batches
    • One epoch therefore corresponds to (training set size / batch size) iterations, assuming the batch size divides the training set evenly (see the worked example after this list)
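To make the first point concrete, here is a sketch of the gradient. The notation (\mathbf{w} for the weights, \mathbf{x}^{(i)} and y^{(i)} for the i-th training example, N for the training set size) is assumed here and may differ from the lesson's slides. With the logistic regression prediction \hat{y}^{(i)} = \sigma(\mathbf{w}^\top \mathbf{x}^{(i)}), the binary cross-entropy loss is

    L(\mathbf{w}) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right]

and its gradient has exactly the same form as the squared-error gradient in linear regression, only with the sigmoid inside \hat{y}^{(i)}:

    \frac{\partial L}{\partial \mathbf{w}} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}^{(i)} - y^{(i)} \right) \mathbf{x}^{(i)}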
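The three gradient descent variants differ only in how many examples feed into each weight update. Below is a minimal NumPy sketch illustrating this, not the lesson's own code; the function names (train, bce_gradient) and the default hyperparameter values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_gradient(w, X, y):
    """Gradient of the binary cross-entropy loss w.r.t. w, averaged over the given examples."""
    y_hat = sigmoid(X @ w)
    return X.T @ (y_hat - y) / len(y)

def train(X, y, lr=0.1, n_epochs=10, batch_size=None):
    """batch_size=None -> plain GD; 1 -> SGD; k -> mini-batch GD with batches of size k."""
    n, d = X.shape
    w = np.zeros(d)
    if batch_size is None:
        batch_size = n                        # plain (vanilla) GD: use the full training set per update
    for epoch in range(n_epochs):             # one epoch = one pass through the training set
        order = np.random.permutation(n)      # shuffle the examples each epoch
        for start in range(0, n, batch_size): # one iteration = one weight update on one batch
            batch = order[start:start + batch_size]
            w -= lr * bce_gradient(w, X[batch], y[batch])
    return w
```

Under these assumptions, train(X, y) performs plain gradient descent, train(X, y, batch_size=1) performs SGD, and train(X, y, batch_size=32) performs mini-batch gradient descent with batches of 32 examples.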
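As a worked example of the terminology (the numbers are chosen for illustration, not taken from the lesson): with a training set of 1000 examples and a batch size of 100, one epoch consists of 1000 / 100 = 10 iterations, i.e. 10 weight updates per full pass through the training set; running 5 epochs therefore performs 5 \times 10 = 50 updates in total.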