Chapter : Lesson 3

Episode 6 - Gradient Descent in More Detail

Josiah Wang

Summary:

  • Gradient descent algorithm (for two parameters w_1 and w_0, with learning rate \alpha; see the Python sketch after this summary):

    1. Choose random w_1 and w_0.
    2. Repeat until convergence:
      • w_1 \leftarrow w_1 - \alpha \frac{\partial L}{\partial w_1}
      • w_0 \leftarrow w_0 - \alpha \frac{\partial L}{\partial w_0}
  • Gradient/derivative: the rate at which a function's value changes when one of its variables is changed by a minuscule amount (tending towards zero).

  • For linear regression (assuming a sum-of-squares error; see the derivation after this summary):

    • Loss function, L(\theta) = \frac{1}{2} \sum_i^N (w_1 x^{[i]} + w_0 - y^{[i]})^2 = \frac{1}{2} \sum_i^N (\hat{y}^{[i]} - y^{[i]})^2.
    • \frac{\partial L}{\partial w_1} = \sum_i^N (w_1 x^{[i]} + w_0 - y^{[i]}) x^{[i]} = \sum_i^N (\hat{y}^{[i]} - y^{[i]}) x^{[i]}.
    • \frac{\partial L}{\partial w_0} = \sum_i^N (w_1 x^{[i]} + w_0 - y^{[i]}) = \sum_i^N (\hat{y}^{[i]} - y^{[i]}).
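
One way to see where the two derivative formulas come from is the chain rule, applied term by term to the loss. Worked out for w_1, in the same notation as above:

    \frac{\partial L}{\partial w_1} = \frac{1}{2} \sum_i^N 2 (w_1 x^{[i]} + w_0 - y^{[i]}) \cdot \frac{\partial}{\partial w_1} (w_1 x^{[i]} + w_0 - y^{[i]}) = \sum_i^N (w_1 x^{[i]} + w_0 - y^{[i]}) x^{[i]}

since \frac{\partial}{\partial w_1} (w_1 x^{[i]} + w_0 - y^{[i]}) = x^{[i]}. For w_0 the inner derivative is 1 instead of x^{[i]}, which gives the second formula. This also shows why the loss carries the factor \frac{1}{2}: it cancels the 2 produced by the chain rule.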
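
Below is a minimal Python sketch (assuming NumPy) that puts the summary together: it fits w_1 and w_0 by repeating the two update rules with the gradients given above. The function name, the learning rate \alpha = 0.01, the iteration cap, and the convergence tolerance are illustrative choices, not part of the lesson.

    import numpy as np

    def fit_line_gradient_descent(x, y, alpha=0.01, n_iters=1000, tol=1e-8):
        """Fit y ~ w_1 * x + w_0 by gradient descent on the sum-of-squares loss."""
        # Step 1: choose random w_1 and w_0.
        rng = np.random.default_rng(0)
        w_1, w_0 = rng.normal(size=2)

        prev_loss = np.inf
        for _ in range(n_iters):
            y_hat = w_1 * x + w_0              # predictions \hat{y}^{[i]}
            residual = y_hat - y               # \hat{y}^{[i]} - y^{[i]}

            # Gradients from the summary.
            dL_dw1 = np.sum(residual * x)      # \sum_i (\hat{y}^{[i]} - y^{[i]}) x^{[i]}
            dL_dw0 = np.sum(residual)          # \sum_i (\hat{y}^{[i]} - y^{[i]})

            # Step 2: update both parameters with learning rate alpha.
            w_1 = w_1 - alpha * dL_dw1
            w_0 = w_0 - alpha * dL_dw0

            # Treat a near-unchanged loss as "convergence" and stop.
            loss = 0.5 * np.sum(residual ** 2)
            if abs(prev_loss - loss) < tol:
                break
            prev_loss = loss

        return w_1, w_0

    # Example usage: noisy points around y = 3x + 1.
    x = np.linspace(0.0, 1.0, 50)
    y = 3.0 * x + 1.0 + 0.05 * np.random.default_rng(1).normal(size=x.shape)
    print(fit_line_gradient_descent(x, y))     # roughly (3, 1)

Note that both gradients are computed from the same (old) parameter values before either parameter is changed, so the two updates in step 2 are effectively simultaneous.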