Hypothesis
$$ \begin{align*} h_\theta(x) &= \theta^{T} x \\ x_0 &= 1 \end{align*} $$
Cost Function
$$ \begin{align*} J(\theta) &= \dfrac {1}{2m} \sum_{i=1}^m \left( h_\theta (x^{(i)}) - y^{(i)} \right)^2 + \dfrac{\lambda}{2m} \sum_{j=1}^n \theta_j^2 \end{align*} $$
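A minimal NumPy sketch of the hypothesis and the regularized cost, assuming the training inputs are stacked into an \(m \times (n+1)\) design matrix `X` whose first column is all ones (so \(x_0 = 1\)); the names `hypothesis`, `cost`, `X`, `y`, and `lam` are illustrative, not from the original notes:

```python
import numpy as np

def hypothesis(theta, X):
    # h_theta(x) = theta^T x, applied to every row of the design matrix X.
    # X has shape (m, n+1) with a leading column of ones; theta has shape (n+1,).
    return X @ theta

def cost(theta, X, y, lam):
    # Regularized squared-error cost J(theta).
    # theta_0 is excluded from the penalty, matching the j = 1..n sum above.
    m = len(y)
    residual = hypothesis(theta, X) - y
    reg = (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    return np.sum(residual ** 2) / (2 * m) + reg
```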
Algorithms
  1. Gradient Descent $$ \begin{align*} \theta_0 := & \theta_0 - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)}) x_0^{(i)} \\ \theta_j := & \theta_j(1-\alpha\frac{\lambda}{m}) - \alpha \frac{1}{m} \sum\limits_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} \\ &(j=1,...n) \end{align*} $$ where $$ \begin{align*} m = & \mbox{number of training samples} \\ n = & \mbox{number of features} \\ \alpha = & \mbox{learning rate} \\ \lambda = & \mbox{regularization parameter} \end{align*} $$
  2. Normal Equation $$ \theta = (X^TX + \lambda \left[\begin{array}{cccc} 0\\ & 1\\ & & \ddots\\ & & & 1\end{array}\right])^{-1}X^{T}y $$
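Both algorithms above fit in a few lines of NumPy. This is a sketch under the same conventions (design matrix `X` with a leading column of ones, targets `y`); the function names and the `iterations` parameter are illustrative assumptions:

```python
import numpy as np

def gradient_descent(X, y, alpha, lam, iterations=1000):
    # Regularized batch gradient descent for linear regression.
    # X: (m, n+1) design matrix with x_0 = 1, y: (m,) targets,
    # alpha: learning rate, lam: regularization parameter.
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(iterations):
        error = X @ theta - y                # h_theta(x^(i)) - y^(i)
        grad = (X.T @ error) / m             # unregularized gradient for all j
        theta_new = theta - alpha * grad
        theta_new[1:] -= alpha * (lam / m) * theta[1:]  # shrink theta_1..theta_n only
        theta = theta_new
    return theta

def normal_equation(X, y, lam):
    # Closed-form solution from item 2: the (0, 0) entry of the penalty
    # matrix is zero, so theta_0 is not penalized.
    n_plus_1 = X.shape[1]
    L = np.eye(n_plus_1)
    L[0, 0] = 0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```

Gradient descent needs \(\alpha\) and an iteration count but scales to large \(n\); the normal equation has no hyperparameters beyond \(\lambda\) but requires solving an \((n+1) \times (n+1)\) system.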
Gradient descent with \(m = 6, n = 1, \lambda = 0\) (interactive Desmos graph)
$$ \begin{align*} h_\theta(x) &= \theta_0 + \theta_1 x \\ J(\theta_0, \theta_1) &= \dfrac {1}{2m} \sum _{i=1}^m \left (h_\theta (x^{(i)}) - y^{(i)} \right)^2 \end{align*} $$
[Desmos sliders: \(\alpha\), \(\theta_0\), \(\theta_1\)]
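To reproduce this setup offline, here is a sketch that runs plain (unregularized, \(\lambda = 0\)) gradient descent on six points with a single feature; the data values are placeholders, not the ones plotted in the Desmos graph:

```python
import numpy as np

# Six example points (placeholder values, not from the Desmos graph).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 6.0])

m = len(x)
X = np.column_stack([np.ones(m), x])    # prepend the x_0 = 1 column
theta = np.zeros(2)                     # [theta_0, theta_1]
alpha = 0.01                            # learning rate

for _ in range(2000):
    error = X @ theta - y
    theta -= alpha * (X.T @ error) / m  # lambda = 0, so no shrinkage term

J = np.sum((X @ theta - y) ** 2) / (2 * m)
print(theta, J)                         # fitted [theta_0, theta_1] and final cost
```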