Geometric interpretation of least-squares regression

Author

Prof. Maria Tackett

Published

Jan 23, 2025

Announcements

  • HW 01 due Thursday, January 30 at 11:59pm

    • Released after class.

    • Make sure you are a member of the course GitHub organization

      • If you can see the number of people in the org, then you are a member!

Topics

  • Geometric interpretation of least-squares regression

Recap: Regression in matrix form

The simple linear regression model can be represented using vectors and matrices as

$y = X\beta + \epsilon$

  • y: Vector of responses

  • X: Design matrix (columns for predictors + intercept); see the example after this list

  • β: Vector of model coefficients

  • ϵ: Vector of error terms centered at 0 with variance $\sigma^2_{\epsilon} I$
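
For concreteness, here is a minimal sketch (made-up numbers, assuming NumPy) of the design matrix for a simple linear regression with one predictor:

    import numpy as np

    # Hypothetical predictor values (made up for illustration)
    x = np.array([1.0, 2.0, 4.0, 7.0])

    # Design matrix: a column of ones for the intercept plus a column for the predictor
    X = np.column_stack([np.ones(len(x)), x])
    print(X)
    # [[1. 1.]
    #  [1. 2.]
    #  [1. 4.]
    #  [1. 7.]]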

Recap: Derive $\hat{\beta}$

We used matrix calculus to derive the estimator $\hat{\beta}$ that minimizes $\epsilon^T\epsilon$:

$\hat{\beta} = (X^TX)^{-1}X^Ty$
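
As a quick numerical check of this formula, the sketch below (simulated, made-up data and coefficients, assuming NumPy) computes $\hat{\beta}$ directly from the normal equations:

    import numpy as np

    # Simulated data with made-up "true" intercept 2 and slope 0.5
    rng = np.random.default_rng(42)
    x = rng.uniform(0, 10, size=100)
    X = np.column_stack([np.ones(100), x])
    y = 2.0 + 0.5 * x + rng.normal(0, 1, size=100)

    # beta_hat = (X^T X)^{-1} X^T y; solve() avoids forming the inverse explicitly
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(beta_hat)  # approximately [2, 0.5]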


Now let’s consider how to derive the least-squares estimator using a geometric interpretation of regression.

Geometry of least-squares regression

  • Let Col(X) be the column space of X: the set of all possible linear combinations (span) of the columns of X

  • In general, the vector of responses y is not in Col(X).

  • Goal: Find another vector $z = Xb$ that is in Col(X) and is as close as possible to y (illustrated numerically after this list).

    • z is the projection of y onto Col(X).
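
The sketch below (simulated, made-up data, assuming NumPy) illustrates this goal: no other choice of b gives a vector in Col(X) closer to y than the least-squares one.

    import numpy as np

    # Simulated example (made-up data)
    rng = np.random.default_rng(1)
    x = rng.uniform(0, 10, size=30)
    X = np.column_stack([np.ones(30), x])
    y = 1.0 + 2.0 * x + rng.normal(0, 1, size=30)

    b_ls, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares coefficients
    dist_ls = np.linalg.norm(y - X @ b_ls)         # distance from y to its projection

    # Any other choice of b gives a vector in Col(X) that is at least as far from y
    for _ in range(5):
        b_other = b_ls + rng.normal(0, 0.5, size=2)
        assert np.linalg.norm(y - X @ b_other) >= dist_ls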

Geometry of least-squares regression

  • For any $z = Xb$ in Col(X), the vector $e = y - Xb$ is the difference between y and Xb.

    • We want to find b such that $z = Xb$ is as close as possible to y, i.e., we want to minimize the length of the difference $e = y - Xb$

  • This distance is minimized when e is orthogonal to Col(X).
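
A quick numerical check of this orthogonality claim (simulated, made-up data, assuming NumPy): the least-squares residual vector is orthogonal to every column of X.

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.uniform(0, 10, size=30)
    X = np.column_stack([np.ones(30), x])
    y = 1.0 + 2.0 * x + rng.normal(0, 1, size=30)    # made-up data

    b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)    # least-squares coefficients
    e = y - X @ b_hat                                # residual vector e = y - Xb

    print(X.T @ e)                                   # approximately [0, 0]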

Geometry of least-squares regression

  • Note: If every column of an n×k matrix A is orthogonal to an n×1 vector c, then $A^Tc = 0$

  • Therefore, because e is orthogonal to Col(X), we have $X^Te = 0$, and thus

    $X^T(y - Xb) = 0$

Solve for b.
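
For reference, one way to carry the algebra through (assuming the columns of X are linearly independent, so that $X^TX$ is invertible), recovering the estimator derived earlier via matrix calculus:

    \begin{aligned}
    X^T(y - Xb) &= 0 \\
    X^Ty - X^TXb &= 0 \\
    X^TXb &= X^Ty \\
    b &= (X^TX)^{-1}X^Ty = \hat{\beta}
    \end{aligned}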

Hat matrix

  • Recall the hat matrix $H = X(X^TX)^{-1}X^T$.

  • $\hat{y} = Hy$, so H projects y onto Col(X)

  • Properties of H, a projection matrix (checked numerically in the sketch after this list)

    • H is symmetric ($H^T = H$)

    • H is idempotent ($H^2 = H$)

    • If v is in Col(X), then $Hv = v$

    • If v is orthogonal to Col(X), then $Hv = 0$
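
These properties can be checked numerically; the sketch below (made-up design matrix, assuming NumPy) verifies each one.

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.uniform(0, 10, size=20)
    X = np.column_stack([np.ones(20), x])      # made-up design matrix

    H = X @ np.linalg.solve(X.T @ X, X.T)      # hat matrix H = X (X^T X)^{-1} X^T

    print(np.allclose(H, H.T))                 # True: H is symmetric
    print(np.allclose(H @ H, H))               # True: H is idempotent

    v = X @ np.array([3.0, -1.0])              # a vector in Col(X)
    print(np.allclose(H @ v, v))               # True: Hv = v

    y = rng.normal(size=20)
    w = y - H @ y                              # component of y orthogonal to Col(X)
    print(np.allclose(H @ w, 0))               # True: Hw = 0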