Geometric interpretation of least-squares regression

Prof. Maria Tackett

Jan 23, 2025

Announcements

  • HW 01 due Thursday, January 30 at 11:59pm

    • Released after class.

    • Make sure you are a member of the course GitHub organization

      • If you can see the number of people in the org, then you are a member!

Topics

  • Geometric interpretation of least-squares regression

Recap: Regression in matrix form

The simple linear regression model can be represented using vectors and matrices as

\[ \large{\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}} \]

  • \(\mathbf{y}\) : Vector of responses

  • \(\mathbf{X}\): Design matrix (columns for predictors + intercept)

  • \(\boldsymbol{\beta}\): Vector of model coefficients

  • \(\boldsymbol{\epsilon}\): Vector of error terms centered at \(\mathbf{0}\) with variance \(\sigma^2_{\epsilon}\mathbf{I}\)
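To make the notation concrete, here is how the pieces might look for simple linear regression with one predictor. The symbols \(n\) (number of observations), \(x_i\), \(\beta_0\), and \(\beta_1\) are notational assumptions introduced for this illustration.

\[
\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
+
\begin{bmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{bmatrix}
\]

The column of ones in \(\mathbf{X}\) corresponds to the intercept.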

Recap: Derive \(\hat{\boldsymbol{\beta}}\)

We used matrix calculus to derive the estimator \(\hat{\boldsymbol{\beta}}\) that minimizes \(\boldsymbol{\epsilon}^\mathsf{T}\boldsymbol{\epsilon}\)

\[\hat{\boldsymbol{\beta}} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}\]
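As a minimal sketch of this formula, the pure-Python snippet below computes \(\hat{\boldsymbol{\beta}}\) for a made-up four-point data set with one predictor and an intercept. The data values and the helper names (`transpose`, `matmul`, `inverse_2x2`) are illustrative assumptions, not part of the course materials; in practice a linear-algebra library would do this work.

```python
# Toy data, made up for illustration: roughly y = 1 + 2x plus small noise.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 6.8]

# Design matrix: a column of ones (intercept) and the predictor.
X = [[1.0, xi] for xi in x]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    B_cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in B_cols] for row in A]


def inverse_2x2(M):
    """Invert a 2x2 matrix using the closed-form formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


Xt = transpose(X)
XtX = matmul(Xt, X)                    # X^T X, a 2x2 matrix
Xty = matmul(Xt, [[yi] for yi in y])   # X^T y, a 2x1 column

# beta_hat = (X^T X)^{-1} X^T y, flattened to [intercept, slope]
beta_hat = [row[0] for row in matmul(inverse_2x2(XtX), Xty)]
print(beta_hat)
```

For this toy data the estimated intercept and slope come out close to 1.09 and 1.94, up to floating-point rounding.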

Now let's consider how to derive the least-squares estimator using a geometric interpretation of regression.

Geometry of least-squares regression

  • Let \(\text{Col}(\mathbf{X})\) be the column space of \(\mathbf{X}\): the set of all possible linear combinations (span) of the columns of \(\mathbf{X}\)

  • In general, the vector of responses \(\mathbf{y}\) is not in \(\text{Col}(\mathbf{X})\).

  • Goal: Find another vector \(\mathbf{z} = \mathbf{Xb}\) that is in \(\text{Col}(\mathbf{X})\) and is as close as possible to \(\mathbf{y}\).

    • \(\mathbf{z}\) is the orthogonal projection of \(\mathbf{y}\) onto \(\text{Col}(\mathbf{X})\).

Geometry of least-squares regression

  • For any \(\mathbf{z} = \mathbf{Xb}\) in \(\text{Col}(\mathbf{X})\), the vector \(\mathbf{e} = \mathbf{y} - \mathbf{Xb}\) is the difference between \(\mathbf{y}\) and \(\mathbf{Xb}\).

    • We want to find \(\mathbf{b}\) such that \(\mathbf{z} = \mathbf{Xb}\) is as close as possible to \(\mathbf{y}\), i.e., we want to minimize the length of the difference \(\mathbf{e} = \mathbf{y} - \mathbf{Xb}\)

  • This distance is minimized when \(\mathbf{e}\) is orthogonal to \(\text{Col}(\mathbf{X})\)
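The orthogonality claim can be checked numerically. The sketch below uses made-up toy data with one predictor and an intercept, computes \(\mathbf{b}\) from the estimator in the recap, and then confirms that \(\mathbf{X}^\mathsf{T}\mathbf{e}\) is approximately zero and that a few other candidate vectors give a longer \(\mathbf{e}\). The data, the candidate values, and the helper names are assumptions chosen for illustration.

```python
# Toy data, made up for illustration.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 6.8]
X = [[1.0, xi] for xi in x]  # intercept column plus predictor


def residuals(b):
    """e = y - Xb for a coefficient vector b = [intercept, slope]."""
    return [yi - (b[0] * row[0] + b[1] * row[1]) for row, yi in zip(X, y)]


def sq_length(v):
    """Squared Euclidean length of a vector."""
    return sum(vi * vi for vi in v)


def Xt_times(v):
    """Compute X^T v, one entry per column of X."""
    return [sum(row[j] * vi for row, vi in zip(X, v)) for j in range(2)]


# Solve the 2x2 normal equations (X^T X) b = X^T y in closed form.
s00 = sum(r[0] * r[0] for r in X)
s01 = sum(r[0] * r[1] for r in X)
s11 = sum(r[1] * r[1] for r in X)
t0, t1 = Xt_times(y)
det = s00 * s11 - s01 * s01
b_hat = [(s11 * t0 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det]

# The residual vector is orthogonal to every column of X (up to rounding).
e_hat = residuals(b_hat)
Xt_e = Xt_times(e_hat)
print(Xt_e)

# Any other choice of b gives a longer residual vector.
candidates = [[b_hat[0] + 0.1, b_hat[1]], [b_hat[0], b_hat[1] - 0.1], [1.0, 2.0]]
for b in candidates:
    print(b, sq_length(residuals(b)) > sq_length(e_hat))
```

Each comparison in the loop should print `True`, since moving \(\mathbf{b}\) away from the least-squares solution can only increase the squared length of \(\mathbf{e}\) when the columns of \(\mathbf{X}\) are linearly independent.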

Geometry of least-squares regression

  • Note: If every column of \(\mathbf{A}\), an \(n \times k\) matrix, is orthogonal to an \(n \times 1\) vector \(\mathbf{c}\), then \(\mathbf{A}^\mathsf{T}\mathbf{c} = \mathbf{0}\)

  • Therefore, we have \(\mathbf{X}^\mathsf{T}\mathbf{e} = \mathbf{0}\), and thus

    \[ \mathbf{X}^\mathsf{T}(\mathbf{y} - \mathbf{Xb}) = \mathbf{0} \]

Solve for \(\mathbf{b}\).
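One way to carry out this step, assuming \(\mathbf{X}^\mathsf{T}\mathbf{X}\) is invertible (that is, the columns of \(\mathbf{X}\) are linearly independent):

\[
\begin{aligned}
\mathbf{X}^\mathsf{T}(\mathbf{y} - \mathbf{Xb}) &= \mathbf{0} \\
\mathbf{X}^\mathsf{T}\mathbf{y} - \mathbf{X}^\mathsf{T}\mathbf{Xb} &= \mathbf{0} \\
\mathbf{X}^\mathsf{T}\mathbf{Xb} &= \mathbf{X}^\mathsf{T}\mathbf{y} \\
\mathbf{b} &= (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}
\end{aligned}
\]

This matches the estimator \(\hat{\boldsymbol{\beta}}\) obtained from the matrix-calculus derivation.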

Hat matrix

  • Recall the hat matrix \(\mathbf{H} = \mathbf{X}(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\).

  • \(\hat{\mathbf{y}} = \mathbf{Hy}\), so \(\mathbf{H}\) projects \(\mathbf{y}\) onto \(\text{Col}(\mathbf{X})\)

  • Properties of \(\mathbf{H}\), a projection matrix

    • \(\mathbf{H}\) is symmetric (\(\mathbf{H}^\mathsf{T} = \mathbf{H}\))

    • \(\mathbf{H}\) is idempotent (\(\mathbf{H}^2 = \mathbf{H}\))

    • If \(\mathbf{v}\) is in \(\text{Col}(\mathbf{X})\), then \(\mathbf{Hv} = \mathbf{v}\)

    • If \(\mathbf{v}\) is orthogonal to \(\text{Col}(\mathbf{X})\), then \(\mathbf{Hv} = \mathbf{0}\)
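These four properties can be verified numerically on a small example. The sketch below builds \(\mathbf{H}\) for a made-up design matrix with an intercept and one predictor, then checks each property up to floating-point tolerance. The predictor values, the test vectors `v_in` and `v_perp`, and the helper names are assumptions chosen for illustration.

```python
# Toy design matrix, made up for illustration: intercept plus one predictor.
x = [0.0, 1.0, 2.0, 3.0]
X = [[1.0, xi] for xi in x]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    B_cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in B_cols] for row in A]


def inverse_2x2(M):
    """Invert a 2x2 matrix using the closed-form formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def close(A, B, tol=1e-9):
    """True if two same-shaped matrices agree entrywise within tol."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


# H = X (X^T X)^{-1} X^T, a 4x4 matrix here.
Xt = transpose(X)
H = matmul(matmul(X, inverse_2x2(matmul(Xt, X))), Xt)

symmetric = close(H, transpose(H))       # H^T = H
idempotent = close(matmul(H, H), H)      # H^2 = H

# A vector in Col(X): the linear combination 2 * (ones column) - 1 * x.
v_in = matmul(X, [[2.0], [-1.0]])
fixes_col_space = close(matmul(H, v_in), v_in)          # Hv = v

# A vector orthogonal to both columns of X: its entries sum to zero,
# and so do its entries weighted by x (0*1 + 1*(-2) + 2*1 + 3*0 = 0).
v_perp = [[1.0], [-2.0], [1.0], [0.0]]
annihilates_perp = close(matmul(H, v_perp), [[0.0]] * 4)  # Hv = 0

print(symmetric, idempotent, fixes_col_space, annihilates_perp)
```

All four checks should print `True` for any design matrix with linearly independent columns; only the specific test vectors depend on the toy data.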