Geometric interpretation of least-squares regression
Announcements
HW 01 due Thursday, January 30 at 11:59pm
Released after class.
Make sure you are a member of the course GitHub organization
- If you can see the number of people in the org, then you are a member!
Topics
- Geometric interpretation of least-squares regression
Recap: Regression in matrix form
The simple linear regression model can be represented using vectors and matrices as
\[ \large{\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}} \]
\(\mathbf{y}\) : Vector of responses
\(\mathbf{X}\): Design matrix (columns for predictors + intercept)
\(\boldsymbol{\beta}\): Vector of model coefficients
\(\boldsymbol{\epsilon}\): Vector of error terms centered at \(\mathbf{0}\) with variance \(\sigma^2_{\epsilon}\mathbf{I}\)
Recap: Derive \(\hat{\boldsymbol{\beta}}\)
We used matrix calculus to derive the estimator \(\hat{\boldsymbol{\beta}}\) that minimizes \(\boldsymbol{\epsilon}^\mathsf{T}\boldsymbol{\epsilon}\)
\[\hat{\boldsymbol{\beta}} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}\]
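As a quick numerical check, here is a minimal sketch in Python (with made-up data, and numpy's `lstsq` used as an assumed reference solver) that builds a design matrix with an intercept column and applies the closed-form estimator:

```python
import numpy as np

# Made-up data for illustration: one predictor plus an intercept
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix: a column of 1s (intercept) and a column for the predictor
X = np.column_stack([np.ones_like(x), x])

# Closed-form least-squares estimator: (X^T X)^{-1} X^T y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Cross-check against numpy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)                           # [intercept, slope]
print(np.allclose(beta_hat, beta_lstsq))  # True
```

(In practice one would use `np.linalg.solve(X.T @ X, X.T @ y)` or a QR-based solver rather than forming the inverse explicitly.)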
. . .
Now let’s consider how to derive the least-squares estimator using a geometric interpretation of regression
Geometry of least-squares regression
Let \(\text{Col}(\mathbf{X})\) be the column space of \(\mathbf{X}\): the set of all possible linear combinations (span) of the columns of \(\mathbf{X}\)
In general, the vector of responses \(\mathbf{y}\) is not in \(\text{Col}(\mathbf{X})\).
Goal: Find another vector \(\mathbf{z} = \mathbf{Xb}\) that is in \(\text{Col}(\mathbf{X})\) and is as close as possible to \(\mathbf{y}\).
- \(\mathbf{z}\) is the projection of \(\mathbf{y}\) onto \(\text{Col}(\mathbf{X})\).
Geometry of least-squares regression
For any \(\mathbf{z} = \mathbf{Xb}\) in \(\text{Col}(\mathbf{X})\), the vector \(\mathbf{e} = \mathbf{y} - \mathbf{Xb}\) is the difference between \(\mathbf{y}\) and \(\mathbf{Xb}\).
- We want to find \(\mathbf{b}\) such that \(\mathbf{z} = \mathbf{Xb}\) is as close as possible to \(\mathbf{y}\), i.e., we want to minimize the length of the difference \(\mathbf{e} = \mathbf{y} - \mathbf{Xb}\)
This distance is minimized when \(\mathbf{e}\) is orthogonal to \(\text{Col}(\mathbf{X})\)
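A one-step sketch of why orthogonality gives the minimum: if \(\mathbf{e} = \mathbf{y} - \mathbf{Xb}\) is orthogonal to \(\text{Col}(\mathbf{X})\), then for any other candidate \(\mathbf{Xb}^*\) in \(\text{Col}(\mathbf{X})\),
\[ \|\mathbf{y} - \mathbf{Xb}^*\|^2 = \|\mathbf{e} + (\mathbf{Xb} - \mathbf{Xb}^*)\|^2 = \|\mathbf{e}\|^2 + \|\mathbf{Xb} - \mathbf{Xb}^*\|^2 \geq \|\mathbf{e}\|^2, \]
since the cross term \(2\,\mathbf{e}^\mathsf{T}(\mathbf{Xb} - \mathbf{Xb}^*)\) vanishes by orthogonality.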
Geometry of least-squares regression
Note: If every column of an \(n \times k\) matrix \(\mathbf{A}\) is orthogonal to an \(n \times 1\) vector \(\mathbf{c}\), then \(\mathbf{A}^\mathsf{T}\mathbf{c} = \mathbf{0}\)
Therefore, we have \(\mathbf{X}^\mathsf{T}\mathbf{e} = \mathbf{0}\), and thus
\[ \mathbf{X}^\mathsf{T}(\mathbf{y} - \mathbf{Xb}) = \mathbf{0} \]
Solve for \(\mathbf{b}\).
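One way to carry out that step (assuming \(\mathbf{X}^\mathsf{T}\mathbf{X}\) is invertible):
\[ \mathbf{X}^\mathsf{T}\mathbf{Xb} = \mathbf{X}^\mathsf{T}\mathbf{y} \quad\Longrightarrow\quad \mathbf{b} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}, \]
which recovers the same estimator \(\hat{\boldsymbol{\beta}}\) obtained earlier from matrix calculus.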
Hat matrix
Recall the hat matrix \(\mathbf{H} = \mathbf{X}(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\).
\(\hat{\mathbf{y}} = \mathbf{Hy}\), so \(\mathbf{H}\) is the matrix that projects \(\mathbf{y}\) onto \(\text{Col}(\mathbf{X})\)
Properties of \(\mathbf{H}\), a projection matrix
\(\mathbf{H}\) is symmetric (\(\mathbf{H}^\mathsf{T} = \mathbf{H}\))
\(\mathbf{H}\) is idempotent (\(\mathbf{H}^2 = \mathbf{H}\))
If \(\mathbf{v}\) is in \(\text{Col}(\mathbf{X})\), then \(\mathbf{Hv} = \mathbf{v}\)
If \(\mathbf{v}\) is orthogonal to \(\text{Col}(\mathbf{X})\), then \(\mathbf{Hv} = \mathbf{0}\)
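A minimal numerical sketch in Python (reusing a made-up design matrix, chosen just for illustration) that checks these properties:

```python
import numpy as np

# Made-up design matrix: intercept column plus one predictor
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])

# Hat matrix H = X (X^T X)^{-1} X^T
H = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(H, H.T))    # symmetric: H^T = H
print(np.allclose(H @ H, H))  # idempotent: H^2 = H

# A vector in Col(X) is left unchanged by H
v_in = X @ np.array([2.0, -1.0])    # a linear combination of the columns of X
print(np.allclose(H @ v_in, v_in))  # True

# A vector orthogonal to Col(X) is sent to 0
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])      # made-up response
e = y - H @ y                                 # residual, orthogonal to Col(X)
print(np.allclose(H @ e, np.zeros_like(e)))   # True (up to rounding error)
```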