Jan 23, 2025
HW 01 due Thursday, January 30 at 11:59pm
Released after class.
Make sure you are a member of the course GitHub organization
The simple linear regression model can be represented using vectors and matrices as
\[ \large{\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}} \]
\(\mathbf{y}\) : Vector of responses
\(\mathbf{X}\): Design matrix (columns for predictors + intercept)
\(\boldsymbol{\beta}\): Vector of model coefficients
\(\boldsymbol{\epsilon}\): Vector of error terms centered at \(\mathbf{0}\) with variance \(\sigma^2_{\epsilon}\mathbf{I}\)
We used matrix calculus to derive the estimator \(\hat{\boldsymbol{\beta}}\) that minimizes \(\boldsymbol{\epsilon}^\mathsf{T}\boldsymbol{\epsilon}\)
\[\hat{\boldsymbol{\beta}} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}\]
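This formula can be checked numerically. Below is a small sketch using NumPy on simulated data (the sample size, coefficients, and variable names are illustrative, not from the course):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate data from y = X beta + eps
n = 100
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])   # design matrix: intercept + predictor
beta_true = np.array([2.0, 0.5])
y = X @ beta_true + rng.normal(0, 1, size=n)

# Least-squares estimator: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# In practice, solve the normal equations rather than inverting explicitly
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```

Both approaches give the same estimate; `np.linalg.solve` is preferred numerically because it avoids forming the explicit inverse.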
Now let's consider how to derive the least-squares estimator using a geometric interpretation of regression
Let \(\text{Col}(\mathbf{X})\) be the column space of \(\mathbf{X}\): the set of all possible linear combinations (span) of the columns of \(\mathbf{X}\)
The vector of responses \(\mathbf{y}\) is generally not in \(\text{Col}(\mathbf{X})\).
Goal: Find another vector \(\mathbf{z} = \mathbf{Xb}\) that is in \(\text{Col}(\mathbf{X})\) and is as close as possible to \(\mathbf{y}\).
For any \(\mathbf{z} = \mathbf{Xb}\) in \(\text{Col}(\mathbf{X})\), the vector \(\mathbf{e} = \mathbf{y} - \mathbf{Xb}\) is the difference between \(\mathbf{y}\) and \(\mathbf{Xb}\).
The length of this difference, \(\|\mathbf{e}\|\), is minimized when \(\mathbf{e}\) is orthogonal to \(\text{Col}(\mathbf{X})\)
Note: If every column of \(\mathbf{A}\), an \(n \times k\) matrix, is orthogonal to an \(n \times 1\) vector \(\mathbf{c}\), then \(\mathbf{A}^\mathsf{T}\mathbf{c} = \mathbf{0}\)
Therefore, we have \(\mathbf{X}^\mathsf{T}\mathbf{e} = \mathbf{0}\), and thus
\[ \mathbf{X}^\mathsf{T}(\mathbf{y} - \mathbf{Xb}) = \mathbf{0} \]
Solve for \(\mathbf{b}\).
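Once \(\mathbf{b}\) is solved for, the orthogonality condition \(\mathbf{X}^\mathsf{T}\mathbf{e} = \mathbf{0}\) can be verified numerically. A minimal sketch with simulated data (all names and values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: intercept + one predictor
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n)

# Solving X^T (y - X b) = 0 is equivalent to the normal equations X^T X b = X^T y
b = np.linalg.solve(X.T @ X, X.T @ y)

# The residual vector e = y - X b is orthogonal to every column of X
e = y - X @ b
print(X.T @ e)  # each entry is numerically ~ 0
```

Because \(\mathbf{X}\) contains an intercept column, orthogonality also implies the residuals sum to zero.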
Recall the hat matrix \(\mathbf{H} = \mathbf{X}(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\).
\(\hat{\mathbf{y}} = \mathbf{Hy}\), so \(\mathbf{H}\) projects \(\mathbf{y}\) onto \(\text{Col}(\mathbf{X})\)
Properties of \(\mathbf{H}\), a projection matrix
\(\mathbf{H}\) is symmetric (\(\mathbf{H}^\mathsf{T} = \mathbf{H}\))
\(\mathbf{H}\) is idempotent (\(\mathbf{H}^2 = \mathbf{H}\))
If \(\mathbf{v}\) is in \(\text{Col}(\mathbf{X})\), then \(\mathbf{Hv} = \mathbf{v}\)
If \(\mathbf{v}\) is orthogonal to \(\text{Col}(\mathbf{X})\), then \(\mathbf{Hv} = \mathbf{0}\)
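All four properties can be confirmed numerically. The following sketch builds a hat matrix from a random (illustrative) design matrix and checks each property in turn:

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative design matrix: intercept plus two random predictors
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

# Hat matrix H = X (X^T X)^{-1} X^T
H = X @ np.linalg.inv(X.T @ X) @ X.T

# 1. Symmetric: H^T = H
assert np.allclose(H, H.T)

# 2. Idempotent: H^2 = H (projecting twice changes nothing)
assert np.allclose(H @ H, H)

# 3. H fixes vectors already in Col(X): take v = X c
v = X @ rng.normal(size=k)
assert np.allclose(H @ v, v)

# 4. H annihilates vectors orthogonal to Col(X):
#    (I - H) w is orthogonal to Col(X) for any w
w = rng.normal(size=n)
w_perp = w - H @ w
assert np.allclose(H @ w_perp, 0, atol=1e-10)
```

Properties 3 and 4 together are exactly the geometric picture above: \(\mathbf{H}\) keeps the part of \(\mathbf{y}\) that lies in \(\text{Col}(\mathbf{X})\) and discards the part orthogonal to it.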