Sampling distribution of $\hat{β}$

A sampling distribution is the probability distribution of a statistic across a large number of random samples of size $n$ from a population.
The sampling distribution of $\hat{β}$ is the probability distribution of the estimated coefficients we would get if we repeatedly took samples of size $n$ and fit the regression model to each one.

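$\hat{β} \sim N(β, σ_{ϵ}^{2}(X^{T}X)^{-1})$

Assuming the errors $ϵ$ are independent and normally distributed with mean 0 and constant variance $σ_{ϵ}^{2}$ (and treating $X$ as fixed), the estimated coefficients $\hat{β}$ are normally distributed with

$E(\hat{β}) = β$
$\text{Var}(\hat{β}) = σ_{ϵ}^{2}(X^{T}X)^{-1}$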
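The sampling distribution can be approximated by simulation. The sketch below is not based on the NCAA data: it assumes a hypothetical model with one predictor, true coefficients $β = (2, 0.5)$, $σ_{ϵ} = 3$, and $n = 50$. It holds $X$ fixed, draws many new samples, refits the model each time, and compares the average and covariance of the estimates with $β$ and $σ_{ϵ}^{2}(X^{T}X)^{-1}$.

```r
# Sketch: approximate the sampling distribution of beta-hat by simulation.
# All values below are hypothetical, chosen only for illustration.
set.seed(210)

n <- 50
beta <- c(2, 0.5)        # hypothetical true (beta_0, beta_1)
sigma_eps <- 3           # hypothetical error standard deviation
x <- runif(n, 0, 10)
X <- cbind(1, x)         # design matrix, held fixed across samples

n_sims <- 2000
beta_hats <- matrix(NA_real_, nrow = n_sims, ncol = 2)
for (s in seq_len(n_sims)) {
  # draw a new sample of size n with fresh normal errors
  y <- beta[1] + beta[2] * x + rnorm(n, mean = 0, sd = sigma_eps)
  # refit the regression model and store the estimated coefficients
  beta_hats[s, ] <- coef(lm(y ~ x))
}

# theoretical covariance matrix: sigma_eps^2 (X^T X)^{-1}
theory_var <- sigma_eps^2 * solve(t(X) %*% X)

colMeans(beta_hats)   # should be close to beta
cov(beta_hats)        # should be close to theory_var
theory_var
```

Increasing `n_sims` makes the simulated mean and covariance track the theoretical values more closely.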

Hypothesis test for $β_{j}$

Hypothesis test for $β_{j}$: Test statistic

term	estimate	std.error	statistic
(Intercept)	19.332	2.984	6.478
enrollment_th	0.780	0.110	7.074
typePublic	-13.226	3.153	-4.195

Test statistic: the number of standard errors the estimate is away from the null value.

$\text{Test statistic} = \frac{\text{Estimate} - \text{Null}}{\text{Standard error}} = \frac{-13.226 - 0}{3.153} = -4.195$

This means the estimated coefficient for typePublic, -13.226, is 4.195 standard errors below the hypothesized value of 0.
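The calculation can be reproduced in R. The first part uses the rounded typePublic values from the table. The second part is a sketch on simulated, hypothetical data (not the NCAA data) showing that the `t value` reported by `summary()` for an `lm()` fit is Estimate / Std. Error, and that its two-sided p-value comes from a $t$ distribution with the residual degrees of freedom.

```r
# Part 1: typePublic row, using the rounded values reported in the table
estimate  <- -13.226
null      <- 0
std_error <- 3.153

t_stat <- (estimate - null) / std_error
round(t_stat, 3)

# Part 2: the same calculation on a fitted lm() object.
# The data are simulated and hypothetical, for illustration only.
set.seed(1)
n <- 100
sim <- data.frame(
  x1    = rnorm(n),
  group = sample(c("A", "B"), n, replace = TRUE)
)
sim$y <- 1 + 0.5 * sim$x1 - 2 * (sim$group == "B") + rnorm(n)
fit <- lm(y ~ x1 + group, data = sim)

coefs    <- coef(summary(fit))   # columns: Estimate, Std. Error, t value, Pr(>|t|)
t_manual <- (coefs[, "Estimate"] - 0) / coefs[, "Std. Error"]
# two-sided p-value from a t distribution with n - p - 1 degrees of freedom
p_manual <- 2 * pt(abs(t_manual), df = fit$df.residual, lower.tail = FALSE)

cbind(t_manual, coefs[, "t value"], p_manual, coefs[, "Pr(>|t|)"])
```

Because the table values are rounded, recomputing the other rows from them can differ slightly in the last digit from the reported statistics.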

Confidence interval for $β_{j}$

A plausible range of values for a population parameter is called a confidence interval.
Using only a single point estimate is like fishing in a murky lake with a spear, and using a confidence interval is like fishing with a net.
- We can throw a spear where we saw a fish, but we will probably miss. If we toss a net in that area, we have a good chance of catching the fish.
- Similarly, if we report a point estimate, we probably will not hit the exact population parameter, but if we report a range of plausible values, we have a good shot at capturing the parameter.

What “confidence” means

We will construct $C\%$ confidence intervals.
- The confidence level affects the width of the interval: a higher confidence level gives a wider interval.
"Confident" means that if we were to take repeated samples of the same size as our data, fit regression lines using the same predictors, and calculate $C\%$ CIs for the coefficient of $x_{j}$, then in the long run about $C\%$ of those intervals would contain the true value of the coefficient $β_{j}$.
Balance precision and accuracy when selecting a confidence level: a higher level makes it more likely that the interval captures $β_{j}$, but the wider interval is less precise.
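The repeated-sampling meaning of "confident" can be checked by simulation. This is a sketch under hypothetical assumptions (true slope $β_{1} = 0.5$, intercept 1, $n = 40$, normal errors with standard deviation 2), not the NCAA data. It counts how often a 95% interval from `confint()` captures the true slope, then compares interval widths at three confidence levels for one sample.

```r
# Sketch: coverage of 95% confidence intervals for a slope, by simulation.
# All parameter values are hypothetical.
set.seed(42)
n <- 40
beta_0 <- 1
beta_1 <- 0.5
x <- runif(n, 0, 10)

n_sims <- 2000
covers <- logical(n_sims)
for (s in seq_len(n_sims)) {
  y  <- beta_0 + beta_1 * x + rnorm(n, sd = 2)   # a fresh sample of size n
  ci <- confint(lm(y ~ x), "x", level = 0.95)    # 95% CI for the slope
  covers[s] <- ci[1, 1] <= beta_1 && beta_1 <= ci[1, 2]
}

mean(covers)   # proportion of intervals that captured beta_1; near 0.95

# Width of the slope CI for the last sample at three confidence levels
fit <- lm(y ~ x)
widths <- sapply(c(0.90, 0.95, 0.99),
                 function(lvl) diff(confint(fit, "x", level = lvl)[1, ]))
widths         # increases with the confidence level
```

Any single interval either contains $β_{1}$ or it does not; the 95% describes the long-run proportion across repeated samples.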

95% CI for $β_{j}$ in R

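library(broom)  # tidy()
library(knitr)  # kable()

# exp_fit: the fitted regression model; conf.int = TRUE adds conf.low and conf.high
tidy(exp_fit, conf.int = TRUE, conf.level = 0.95) |> 
  kable(digits = 3)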

term	estimate	std.error	statistic	conf.low	conf.high
(Intercept)	19.332	2.984	6.478	13.426	25.239
enrollment_th	0.780	0.110	7.074	0.562	0.999
typePublic	-13.226	3.153	-4.195	-19.466	-6.986

Interpretation: We are 95% confident that for each additional 1,000 students enrolled, the institution's expenditures on football will be greater by \$562,000 to \$999,000, on average, holding institution type constant.
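The interval limits are computed as estimate $\pm\ t^{*} \times$ standard error, where $t^{*}$ is the 97.5th percentile of a $t$ distribution with $n - p - 1$ degrees of freedom. The sketch below uses simulated, hypothetical data (the variable names `enroll`, `type`, and `spend` are illustrative stand-ins, not the NCAA data) and checks the hand calculation against `confint()`.

```r
# Sketch: 95% CI by hand, estimate +/- t* x SE, compared with confint().
# Simulated, hypothetical data for illustration only.
set.seed(2024)
n <- 120
sim <- data.frame(
  enroll = runif(n, 5, 50),
  type   = sample(c("Private", "Public"), n, replace = TRUE)
)
sim$spend <- 20 + 0.8 * sim$enroll - 13 * (sim$type == "Public") +
  rnorm(n, sd = 15)
fit <- lm(spend ~ enroll + type, data = sim)

est <- coef(summary(fit))[, "Estimate"]
se  <- coef(summary(fit))[, "Std. Error"]
t_star <- qt(0.975, df = fit$df.residual)   # critical value, df = n - p - 1

ci_manual <- cbind(lower = est - t_star * se, upper = est + t_star * se)
ci_manual
confint(fit, level = 0.95)   # should match ci_manual
```

With `tidy(..., conf.int = TRUE)`, the `conf.low` and `conf.high` columns for an `lm` fit come from the same calculation.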
