Multiple linear regression

Types of predictors

Prof. Maria Tackett

Jan 28, 2025

term	estimate	std.error	statistic	p.value
(Intercept)	10.726	1.507	7.116	0.000
debt_to_income	0.671	0.676	0.993	0.326
verified_incomeSource Verified	2.211	1.399	1.581	0.121
verified_incomeVerified	6.880	1.801	3.820	0.000
annual_income_th	-0.021	0.011	-1.804	0.078

Indicator variables

Suppose we want to predict the amount of sleep a Duke student gets based on whether they are in Pratt (Yes and No are the only two options). Consider the model

$\text{Sleep}_{i} = \beta_{0} + \beta_{1} \mathbf{1}(\text{Pratt}_{i} = \text{Yes}) + \beta_{2} \mathbf{1}(\text{Pratt}_{i} = \text{No})$

Write out the design matrix for this hypothesized linear model.
Demonstrate that the design matrix is not of full column rank (that is, explicitly write one of the columns as a linear combination of the others).
Use this intuition to explain why when we include categorical predictors, we cannot include both indicators for every level of the variable and an intercept.
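One way to check the rank argument numerically is to build the design matrix for a small sample and compare its columns. The sketch below uses a made-up sample of five students (the `pratt` list is hypothetical, not course data) and plain Python lists rather than any particular matrix library.

```python
# Hypothetical sample: whether each of five students is in Pratt.
pratt = ["Yes", "No", "No", "Yes", "No"]

# Each row of the design matrix: [intercept, 1(Pratt = Yes), 1(Pratt = No)].
X = [[1, int(p == "Yes"), int(p == "No")] for p in pratt]

for row in X:
    print(row)

# In every row the intercept column equals the sum of the two indicator
# columns, so the three columns are linearly dependent.
dependent = all(row[0] == row[1] + row[2] for row in X)
print("intercept == yes + no in every row:", dependent)
```

Because every student is in exactly one of the two categories, the dependence holds for any sample, not just this one.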

Indicator variables in the model

Given a categorical predictor with $k$ levels…

Use $k - 1$ indicator variables in the model
The baseline is the category that doesn’t have a term in the model
- This is also called the reference level
The coefficients of the indicator variables in the model are interpreted as the expected change in the response compared to the baseline, holding all other variables constant.
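The $k - 1$ rule can be sketched in a few lines of code. The helper `indicator_columns` below is a hypothetical name written for illustration, the sample values are made up, and treating "Not Verified" as the baseline level of `verified_income` is an assumption (the other two level names come from the model output).

```python
def indicator_columns(values, baseline):
    """Return one 0/1 column per level, skipping the baseline level."""
    levels = sorted(set(values) - {baseline})
    return {level: [int(v == level) for v in values] for level in levels}

# Hypothetical observations of a predictor with k = 3 levels.
verified_income = ["Not Verified", "Verified", "Source Verified", "Not Verified"]

# Assumption: "Not Verified" is the baseline (reference) level.
cols = indicator_columns(verified_income, baseline="Not Verified")

for level, column in cols.items():
    print(level, column)
```

Only two columns are produced for the three levels; a row of all zeros identifies an observation at the baseline.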

term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	10.726	1.507	7.116	0.000	7.690	13.762
debt_to_income	0.671	0.676	0.993	0.326	-0.690	2.033
verified_incomeSource Verified	2.211	1.399	1.581	0.121	-0.606	5.028
verified_incomeVerified	6.880	1.801	3.820	0.000	3.253	10.508
annual_income_th	-0.021	0.011	-1.804	0.078	-0.043	0.002
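To make the "compared to the baseline" interpretation concrete, the sketch below plugs the estimates from the table above into the fitted equation for two otherwise identical borrowers. The borrower values (0.5 and 60) are invented, the function name `predict` is hypothetical, and "Not Verified" as the baseline level is an assumption.

```python
# Estimates copied from the model output above.
INTERCEPT = 10.726
SLOPE_DEBT_TO_INCOME = 0.671
SLOPE_ANNUAL_INCOME_TH = -0.021

# Assumption: "Not Verified" is the baseline, so it contributes nothing.
LEVEL_EFFECT = {"Not Verified": 0.0, "Source Verified": 2.211, "Verified": 6.880}

def predict(debt_to_income, annual_income_th, verified_income):
    """Fitted response for one borrower under the main-effects model."""
    return (INTERCEPT
            + SLOPE_DEBT_TO_INCOME * debt_to_income
            + SLOPE_ANNUAL_INCOME_TH * annual_income_th
            + LEVEL_EFFECT[verified_income])

# Two hypothetical borrowers who differ only in income verification.
baseline = predict(0.5, 60, "Not Verified")
verified = predict(0.5, 60, "Verified")
print(round(verified - baseline, 3))
```

The difference equals the `verified_incomeVerified` estimate whatever values are chosen for the other predictors, which is what "holding all other variables constant" means here.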

Centering

Centering a quantitative predictor means shifting every value by some constant $C$
One common type of centering is mean-centering, in which every value of a predictor is shifted by its mean
Only quantitative predictors are centered
Center all quantitative predictors in the model for ease of interpretation
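A minimal sketch of centering, assuming a small made-up list of incomes in thousands of dollars. The function name `center` and its optional constant `c` are hypothetical; when `c` is omitted the mean is used, which gives mean-centering.

```python
from statistics import mean

def center(values, c=None):
    """Shift every value by the constant c (defaults to the mean)."""
    if c is None:
        c = mean(values)
    return [v - c for v in values]

# Hypothetical annual incomes, in thousands of dollars.
annual_income_th = [40.0, 55.0, 90.0, 115.0]

centered = center(annual_income_th)
print(centered)
print(sum(centered))  # a mean-centered variable sums to zero
```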

What is one reason one might want to center the quantitative predictors? What are the units of centered variables?

term	estimate	std.error	statistic	p.value
(Intercept)	9.444	0.977	9.663	0.000
debt_to_inc_cent	0.671	0.676	0.993	0.326
verified_incomeSource Verified	2.211	1.399	1.581	0.121
verified_incomeVerified	6.880	1.801	3.820	0.000
annual_inc_cent	-0.021	0.011	-1.804	0.078

Term	Original Model	Centered Model
(Intercept)	10.726	9.444
debt_to_income	0.671	0.671
verified_incomeSource Verified	2.211	2.211
verified_incomeVerified	6.880	6.880
annual_income_th	-0.021	-0.021
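The comparison above shows that only the intercept changes. One way to see why is to rewrite the fitted equation, sketched here for a single quantitative predictor (the symbols $\hat{y}_i$, $x_i$, and $\bar{x}$ are introduced for illustration and other predictors are omitted for brevity):

$$
\hat{y}_{i} = \hat{\beta}_{0} + \hat{\beta}_{1} x_{i} = \left(\hat{\beta}_{0} + \hat{\beta}_{1} \bar{x}\right) + \hat{\beta}_{1} \left(x_{i} - \bar{x}\right)
$$

The slope $\hat{\beta}_{1}$ is the same in both forms. The intercept of the centered model is $\hat{\beta}_{0} + \hat{\beta}_{1} \bar{x}$, the fitted response when $x$ is at its mean. With several centered predictors, each one adds its own $\hat{\beta}_{j} \bar{x}_{j}$ to the new intercept, while the indicator terms are unaffected.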

Standardizing

term	estimate	std.error	statistic	p.value
(Intercept)	9.444	0.977	9.663	0.000
debt_to_inc_std	0.643	0.648	0.993	0.326
verified_incomeSource Verified	2.211	1.399	1.581	0.121
verified_incomeVerified	6.880	1.801	3.820	0.000
annual_inc_std	-1.180	0.654	-1.804	0.078

Term	Original Model	Standardized Model
(Intercept)	10.726	9.444
debt_to_income	0.671	0.643
verified_incomeSource Verified	2.211	2.211
verified_incomeVerified	6.880	6.880
annual_income_th	-0.021	-1.180
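A minimal sketch of standardizing, assuming it means subtracting the mean and dividing by the sample standard deviation. The function name `standardize` and the data are hypothetical.

```python
from statistics import mean, stdev

def standardize(values):
    """Subtract the mean, then divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical annual incomes, in thousands of dollars.
annual_income_th = [40.0, 55.0, 90.0, 115.0]

z = standardize(annual_income_th)
print([round(v, 3) for v in z])
print(round(mean(z), 6), round(stdev(z), 6))  # mean near 0, standard deviation near 1
```

Because a one-unit change in a standardized predictor is a one-standard-deviation change in the original, its coefficient is the original coefficient multiplied by that predictor's standard deviation. In the comparison above, 0.643 / 0.671 suggests a standard deviation of roughly 0.96 for `debt_to_income`, subject to rounding in the reported estimates.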

Interaction terms

term	estimate	std.error	statistic	p.value
(Intercept)	9.560	2.034	4.700	0.000
debt_to_income	0.691	0.685	1.009	0.319
verified_incomeSource Verified	3.577	2.539	1.409	0.166
verified_incomeVerified	9.923	3.654	2.716	0.009
annual_income_th	-0.007	0.020	-0.341	0.735
verified_incomeSource Verified:annual_income_th	-0.016	0.026	-0.643	0.523
verified_incomeVerified:annual_income_th	-0.032	0.033	-0.979	0.333
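With the interaction terms in the model, the slope of `annual_income_th` differs by `verified_income` level: it is the main-effect estimate plus the interaction estimate for that level. The sketch below does that arithmetic with the estimates from the table above; "Not Verified" as the baseline level is an assumption.

```python
# Estimates copied from the interaction model output above.
main_slope = -0.007  # annual_income_th
interaction = {
    "Source Verified": -0.016,  # verified_incomeSource Verified:annual_income_th
    "Verified": -0.032,         # verified_incomeVerified:annual_income_th
}

# Assumption: "Not Verified" is the baseline, so its slope is the main effect alone.
slopes = {"Not Verified": main_slope}
for level, extra in interaction.items():
    slopes[level] = main_slope + extra

for level, slope in slopes.items():
    print(f"{level}: {slope:.3f}")
```

The same logic applies to the intercepts: each non-baseline level's intercept is the model intercept plus that level's indicator estimate.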


