SLR: Matrix representation

Author

Prof. Maria Tackett

Published

Jan 21, 2025

Announcements

  • Lab 01 due TODAY at 11:59pm

    • Push work to GitHub repo

    • Submit final PDF on Gradescope + mark pages for each question

  • HW 01 will be assigned on Thursday

Topics

  • Application exercise on model assessment
  • Matrix representation of simple linear regression
    • Model form
    • Least squares estimate
    • Predicted (fitted) values
    • Residuals

Model assessment

Two statistics

  • Root mean square error, RMSE: A measure of the average error (average difference between observed and predicted values of the outcome)

    $$\text{RMSE} = \sqrt{\frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{n}} = \sqrt{\frac{\sum_{i=1}^n e_i^2}{n}}$$

  • R-squared, $R^2$: Percentage of variability in the outcome explained by the regression model (in the context of SLR, the predictor); both statistics are computed numerically in the sketch below

$$R^2 = \frac{SS_{\text{Model}}}{SS_{\text{Total}}} = 1 - \frac{SS_{\text{Resid}}}{SS_{\text{Total}}}$$
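
A minimal numpy sketch of both statistics, using hypothetical observed and predicted values (the data and variable names are illustrative assumptions, not from the slides):

```python
import numpy as np

# Hypothetical observed and predicted values (any fitted SLR model would supply y_hat)
y = np.array([3.1, 4.0, 5.2, 6.1, 7.3])
y_hat = np.array([3.0, 4.2, 5.0, 6.3, 7.1])

e = y - y_hat                                                  # residuals
rmse = np.sqrt(np.mean(e**2))                                  # average error on the scale of y
r_squared = 1 - np.sum(e**2) / np.sum((y - np.mean(y))**2)     # 1 - SS_Resid / SS_Total

print(rmse, r_squared)
```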

Application exercise

📋 sta221-sp25.netlify.app/ae/ae-01-model-assessment.html

Open ae-01 from last class. Complete Part 2.

Matrix representation of simple linear regression

SLR: Statistical model (population)

When we have a quantitative response, Y, and a single quantitative predictor, X, we can use a simple linear regression model to describe the relationship between Y and X.

$$Y = \beta_0 + \beta_1 X + \epsilon$$


  • $\beta_1$: Population (true) slope of the relationship between $X$ and $Y$
  • $\beta_0$: Population (true) intercept of the relationship between $X$ and $Y$
  • $\epsilon$: Error terms centered at 0 with variance $\sigma^2_{\epsilon}$ (simulated in the sketch below)
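
A short simulation sketch of this population model (the parameter values and seed are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(221)

# Assumed population parameters (illustrative values only)
beta_0, beta_1, sigma_eps = 2.0, 0.5, 1.0

n = 100
x = rng.uniform(0, 10, size=n)              # single quantitative predictor
eps = rng.normal(0, sigma_eps, size=n)      # errors centered at 0 with variance sigma_eps^2
y = beta_0 + beta_1 * x + eps               # responses generated by Y = beta_0 + beta_1 X + eps

print(y[:3])
```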

SLR in matrix form

The simple linear regression model can be represented using vectors and matrices as

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$$

  • $\mathbf{y}$: Vector of responses

  • $\mathbf{X}$: Design matrix (columns for predictors + intercept)

  • $\boldsymbol{\beta}$: Vector of model coefficients

  • $\boldsymbol{\epsilon}$: Vector of error terms centered at 0 with variance $\sigma^2_{\epsilon}\mathbf{I}$

SLR in matrix form

$$\underbrace{\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}}_{\mathbf{y}} = \underbrace{\begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}}_{\mathbf{X}} \underbrace{\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}}_{\boldsymbol{\beta}} + \underbrace{\begin{bmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{bmatrix}}_{\boldsymbol{\epsilon}}$$


What are the dimensions of $\mathbf{y}$, $\mathbf{X}$, $\boldsymbol{\beta}$, and $\boldsymbol{\epsilon}$?
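
As a concrete check of those dimensions, here is a minimal numpy sketch (the data values and $n = 5$ are assumptions for illustration):

```python
import numpy as np

# Hypothetical data with n = 5 observations (values chosen for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.3, 2.9, 4.1, 4.8, 6.2])

n = len(y)
X = np.column_stack([np.ones(n), x])   # design matrix: intercept column + predictor column

print(y.shape)   # (5,)   -> y plays the role of the n x 1 response vector
print(X.shape)   # (5, 2) -> X is n x 2
# beta is 2 x 1 and epsilon is n x 1
```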

Derive least squares estimator for $\boldsymbol{\beta}$

Goal: Find the estimator $\hat{\boldsymbol{\beta}} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix}$ that minimizes the sum of squared errors

$$\sum_{i=1}^n \epsilon_i^2 = \boldsymbol{\epsilon}^T\boldsymbol{\epsilon} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$$

Gradient

Let $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}$ be a $k \times 1$ vector and $f(\mathbf{x})$ be a function of $\mathbf{x}$.


Then $\nabla_{\mathbf{x}} f$, the gradient of $f$ with respect to $\mathbf{x}$, is

$$\nabla_{\mathbf{x}} f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_k} \end{bmatrix}$$
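
As a quick illustration (an added example, not from the original slides), take $k = 2$ and $f(\mathbf{x}) = x_1^2 + 3x_1x_2$. Then

$$\nabla_{\mathbf{x}} f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \end{bmatrix} = \begin{bmatrix} 2x_1 + 3x_2 \\ 3x_1 \end{bmatrix}$$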

Property 1

Let $\mathbf{x}$ be a $k \times 1$ vector and $\mathbf{z}$ be a $k \times 1$ vector, such that $\mathbf{z}$ is not a function of $\mathbf{x}$.


The gradient of $\mathbf{x}^T\mathbf{z}$ with respect to $\mathbf{x}$ is

$$\nabla_{\mathbf{x}}\,\mathbf{x}^T\mathbf{z} = \mathbf{z}$$

Side note: Property 1

$$\mathbf{x}^T\mathbf{z} = \begin{bmatrix} x_1 & x_2 & \cdots & x_k \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_k \end{bmatrix} = x_1z_1 + x_2z_2 + \cdots + x_kz_k = \sum_{i=1}^k x_iz_i$$

Side note: Property 1

$$\nabla_{\mathbf{x}}\,\mathbf{x}^T\mathbf{z} = \begin{bmatrix} \frac{\partial\,\mathbf{x}^T\mathbf{z}}{\partial x_1} \\ \frac{\partial\,\mathbf{x}^T\mathbf{z}}{\partial x_2} \\ \vdots \\ \frac{\partial\,\mathbf{x}^T\mathbf{z}}{\partial x_k} \end{bmatrix} = \begin{bmatrix} \frac{\partial}{\partial x_1}(x_1z_1 + x_2z_2 + \cdots + x_kz_k) \\ \frac{\partial}{\partial x_2}(x_1z_1 + x_2z_2 + \cdots + x_kz_k) \\ \vdots \\ \frac{\partial}{\partial x_k}(x_1z_1 + x_2z_2 + \cdots + x_kz_k) \end{bmatrix} = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_k \end{bmatrix} = \mathbf{z}$$
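
Property 1 can also be sanity-checked numerically. Below is a minimal sketch (arbitrary made-up vectors, numpy assumed) comparing a finite-difference gradient of $\mathbf{x}^T\mathbf{z}$ to $\mathbf{z}$:

```python
import numpy as np

x = np.array([1.0, -2.0, 0.5])         # arbitrary k x 1 vector (k = 3)
z = np.array([3.0, 1.0, -4.0])         # arbitrary vector, not a function of x

f = lambda v: v @ z                    # f(x) = x^T z

# Central finite-difference approximation of the gradient
h = 1e-6
grad = np.array([(f(x + h * np.eye(3)[i]) - f(x - h * np.eye(3)[i])) / (2 * h) for i in range(3)])

print(grad)   # approximately equal to z
```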

Property 2

Let $\mathbf{x}$ be a $k \times 1$ vector and $\mathbf{A}$ be a $k \times k$ matrix, such that $\mathbf{A}$ is not a function of $\mathbf{x}$.


Then the gradient of $\mathbf{x}^T\mathbf{A}\mathbf{x}$ with respect to $\mathbf{x}$ is

$$\nabla_{\mathbf{x}}\,\mathbf{x}^T\mathbf{A}\mathbf{x} = (\mathbf{A}\mathbf{x} + \mathbf{A}^T\mathbf{x}) = (\mathbf{A} + \mathbf{A}^T)\mathbf{x}$$


If $\mathbf{A}$ is symmetric, then

$$(\mathbf{A} + \mathbf{A}^T)\mathbf{x} = 2\mathbf{A}\mathbf{x}$$

Proof in HW 01.
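
A numerical spot-check of Property 2 (a sketch with an arbitrary random matrix; it is not a substitute for the HW proof):

```python
import numpy as np

rng = np.random.default_rng(1)
k = 3
A = rng.normal(size=(k, k))            # arbitrary k x k matrix, not a function of x
x = rng.normal(size=k)

f = lambda v: v @ A @ v                # f(x) = x^T A x

# Central finite-difference gradient vs. the closed form (A + A^T) x
h = 1e-6
grad_fd = np.array([(f(x + h * np.eye(k)[i]) - f(x - h * np.eye(k)[i])) / (2 * h) for i in range(k)])

print(np.allclose(grad_fd, (A + A.T) @ x, atol=1e-4))   # True
```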

Derive least squares estimator

Find $\hat{\boldsymbol{\beta}}$ that minimizes

$$\begin{aligned}
\boldsymbol{\epsilon}^T\boldsymbol{\epsilon} &= (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \\
&= (\mathbf{y}^T - \boldsymbol{\beta}^T\mathbf{X}^T)(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \\
&= \mathbf{y}^T\mathbf{y} - \mathbf{y}^T\mathbf{X}\boldsymbol{\beta} - \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} \\
&= \mathbf{y}^T\mathbf{y} - 2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}
\end{aligned}$$

The last step uses the fact that $\mathbf{y}^T\mathbf{X}\boldsymbol{\beta}$ is a scalar, so it equals its transpose $\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y}$ and the two middle terms combine.

Derive least squares estimator

$$\nabla_{\boldsymbol{\beta}}\,\boldsymbol{\epsilon}^T\boldsymbol{\epsilon} = \nabla_{\boldsymbol{\beta}}\left(\mathbf{y}^T\mathbf{y} - 2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}\right) = -2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}$$

Find $\hat{\boldsymbol{\beta}}$ that satisfies

$$-2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{0}$$

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
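
A minimal numpy sketch of this estimator on simulated data (the parameter values and seed are assumptions; in practice `np.linalg.lstsq` or `np.linalg.solve` is preferred to forming the inverse explicitly):

```python
import numpy as np

rng = np.random.default_rng(221)

# Simulated data (assumed values for illustration)
n = 50
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=n)

X = np.column_stack([np.ones(n), x])             # design matrix

# Least squares estimate via the normal equations
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Numerically preferable equivalent
beta_hat_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)                                  # close to the true values 2.0 and 0.5
print(np.allclose(beta_hat, beta_hat_lstsq))     # True
```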

Did we find a minimum?

Hessian matrix

The Hessian matrix, $\nabla^2_{\mathbf{x}} f$, is a $k \times k$ matrix of second-order partial derivatives

$$\nabla^2_{\mathbf{x}} f = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_k} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_k} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_k \partial x_1} & \frac{\partial^2 f}{\partial x_k \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_k^2} \end{bmatrix}$$

Using the Hessian matrix

If the Hessian matrix is…

  • positive-definite, then we have found a minimum.

  • negative-definite, then we have found a maximum.

  • neither positive- nor negative-definite, then we have found a saddle point.

Did we find a minimum?

$$\nabla^2_{\boldsymbol{\beta}}\,\boldsymbol{\epsilon}^T\boldsymbol{\epsilon} = \nabla_{\boldsymbol{\beta}}\left(-2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}\right) = -2\nabla_{\boldsymbol{\beta}}\left(\mathbf{X}^T\mathbf{y}\right) + 2\nabla_{\boldsymbol{\beta}}\left(\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}\right) = 2\mathbf{X}^T\mathbf{X}$$

Show that $\mathbf{X}^T\mathbf{X}$ is positive definite in HW 01.
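
A quick numerical look (a sketch with simulated data, not the HW proof): the eigenvalues of $\mathbf{X}^T\mathbf{X}$ come out positive, so the Hessian $2\mathbf{X}^T\mathbf{X}$ is positive-definite and $\hat{\boldsymbol{\beta}}$ is a minimizer.

```python
import numpy as np

rng = np.random.default_rng(221)

# Simulated design matrix (assumed values for illustration)
n = 50
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])

# All eigenvalues of X^T X are positive for this X (a numerical check, not a proof)
print(np.linalg.eigvalsh(X.T @ X))
```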

Predicted values and residuals

Predicted (fitted) values

Now that we have $\hat{\boldsymbol{\beta}}$, let's predict values of $\mathbf{y}$ using the model

$$\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \underbrace{\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T}_{\mathbf{H}}\,\mathbf{y} = \mathbf{H}\mathbf{y}$$


Hat matrix: $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$


  • $\mathbf{H}$ is an $n \times n$ matrix
  • Maps the vector of observed values $\mathbf{y}$ to the vector of fitted values $\hat{\mathbf{y}}$
  • It is a function of $\mathbf{X}$ only, not $\mathbf{y}$

Residuals

Recall that the residuals are the difference between the observed and predicted values

$$\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{y} - \mathbf{H}\mathbf{y} = (\mathbf{I} - \mathbf{H})\mathbf{y}$$
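
A short numpy sketch (simulated data with assumed values) that forms the hat matrix and verifies $\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}$ and $\mathbf{e} = (\mathbf{I} - \mathbf{H})\mathbf{y}$:

```python
import numpy as np

rng = np.random.default_rng(221)

# Simulated data (assumed values for illustration)
n = 50
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=n)
X = np.column_stack([np.ones(n), x])

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix: n x n, a function of X only
y_hat = H @ y                          # fitted values
e = y - y_hat                          # residuals

print(H.shape)                                   # (50, 50)
print(np.allclose(e, (np.eye(n) - H) @ y))       # True
```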

Recap

  • Introduced matrix representation for simple linear regression

    • Model form
    • Least squares estimate
    • Predicted (fitted) values
    • Residuals

For next class