Cont’d
Feb 11, 2025
Research topics due TODAY at 11:59pm on GitHub
HW 02 due Thursday at 11:59pm
Statistics experience due Tuesday, April 22
50 points total
In-class (35-40 pts): 75 minutes during February 18 lecture
Take-home (10-15 pts): released after class on Tuesday
If you miss any part of the exam for an excused absence (with academic dean’s note or other official documentation), your Exam 02 score will be counted twice
Prepare readings (see course schedule)
Lecture notes (use search bar to find specific topics)
AEs
Assignments
Today’s data come from Equity in Athletics Data Analysis and include information about sports expenditures and revenues for colleges and universities in the United States. This data set was featured in a March 2022 Tidy Tuesday.
We will focus on the 2019-2020 season expenditures on football for institutions in the NCAA - Division 1 FBS. The variables are:
total_exp_m: Total expenditures on football in the 2019-2020 academic year (in millions USD)
enrollment_th: Total student enrollment in the 2019-2020 academic year (in thousands)
type: Institution type (Public or Private)
We often want to conduct inference on individual model coefficients
Hypothesis test: Is there a linear relationship between the response and $x_j$?
Confidence interval: What is a plausible range of values $\beta_j$ can take?
A sampling distribution is the probability distribution of a statistic for a large number of random samples of size $n$ from a population
The sampling distribution of
The estimated coefficients
Let
Let’s walk through the steps to test the coefficient of typePublic.
Null: There is no linear relationship between institution type and football expenditure, after adjusting for enrollment
Alternative: There is a linear relationship between institution type and football expenditure, after adjusting for enrollment
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 19.332 | 2.984 | 6.478 | 0 |
enrollment_th | 0.780 | 0.110 | 7.074 | 0 |
typePublic | -13.226 | 3.153 | -4.195 | 0 |
Test statistic: Number of standard errors the estimate is away from the null
This means the estimated coefficient of -13.226 is 4.195 standard errors below the hypothesized value of 0.
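The test statistic for typePublic can be reproduced from the table values. The following is a minimal Python sketch, not the course's own code; the variable names are illustrative. Because the table entries are rounded to three decimals, repeating this for the other rows can differ from the table in the last digit.

```python
# Values copied from the typePublic row of the table above (rounded to 3 decimals).
estimate = -13.226
std_error = 3.153
null_value = 0.0  # hypothesized value of the coefficient under the null

# Test statistic: how many standard errors the estimate lies from the null value.
t_stat = (estimate - null_value) / std_error

print(round(t_stat, 3))  # -4.195
```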
Given
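The p-value is approximately 0 (it rounds to 0 in the table above)
The data provide sufficient evidence that there is a linear relationship between institution type and football expenditure, after adjusting for enrollment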
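To get a feel for how small the p-value is, here is a rough Python sketch. It assumes a standard normal approximation in place of the t distribution used in regression output, so it is a stand-in technique rather than the exact calculation; the t-based p-value would be somewhat larger because the t distribution has heavier tails.

```python
from statistics import NormalDist

t_stat = -4.195  # test statistic for typePublic from the table above

# Two-sided p-value under a standard normal approximation (an assumption:
# regression software uses a t distribution with the residual degrees of freedom).
p_value = 2 * NormalDist().cdf(-abs(t_stat))

print(f"{p_value:.1e}")
print(round(p_value, 3))  # displays as 0.0 at three decimals, like the table
```

Even as an approximation, the result is far below common significance thresholds such as 0.05, which is consistent with the 0 displayed in the p.value column.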
A plausible range of values for a population parameter is called a confidence interval
Using only a single point estimate is like fishing in a murky lake with a spear, and using a confidence interval is like fishing with a net
We can throw a spear where we saw a fish, but we will probably miss. If we toss a net in that area, we have a good chance of catching the fish
Similarly, if we report a point estimate, we probably will not hit the exact population parameter, but if we report a range of plausible values we have a good shot at capturing the parameter
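We will construct a $C\%$ confidence interval for $\beta_j$, the coefficient of predictor $x_j$
“Confident” means if we were to take repeated samples of the same size as our data, fit regression lines using the same predictors, and calculate $C\%$ confidence intervals for the coefficient of $x_j$, then $C\%$ of those intervals would contain the true value of the coefficient $\beta_j$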
Balance precision and accuracy when selecting a confidence level
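$\hat{\beta}_j \pm t^* \times SE(\hat{\beta}_j)$, where $t^*$ is the critical value from the $t$ distribution with the model's residual degrees of freedom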
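The trade-off between precision and confidence level can be illustrated with the typePublic row of the table above. This Python sketch builds intervals of the form estimate ± critical value × standard error at three confidence levels. It assumes normal-approximation critical values rather than critical values from the t distribution, so the endpoints are close to, but not exactly, what regression software would report.

```python
from statistics import NormalDist

estimate, std_error = -13.226, 3.153  # typePublic row of the table above

intervals = {}
for level in (0.90, 0.95, 0.99):
    # Normal-approximation critical value (an assumption; not the t-based value).
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z_star * std_error
    intervals[level] = (estimate - half_width, estimate + half_width)
    low, high = intervals[level]
    print(f"{level:.0%}: ({low:.3f}, {high:.3f}), width {high - low:.3f}")
```

Higher confidence levels give wider, less precise intervals; lower levels give narrower intervals that capture the parameter less often.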
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
(Intercept) | 19.332 | 2.984 | 6.478 | 0 | 13.426 | 25.239 |
enrollment_th | 0.780 | 0.110 | 7.074 | 0 | 0.562 | 0.999 |
typePublic | -13.226 | 3.153 | -4.195 | 0 | -19.466 | -6.986 |
Interpretation: We are 95% confident that for each additional 1,000 students enrolled, the institution’s expenditures on football will be greater by $562,000 to $999,000, on average, holding institution type constant.
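The interval for enrollment_th can be checked by hand from the table. In this Python sketch the critical value is not looked up from a t distribution (the sample size is not stated here); instead it is backed out from the typePublic interval as half-width divided by standard error, which is an assumption of the sketch. The variable names are illustrative.

```python
# Rounded values from the table above.
se_type = 3.153
low_type, high_type = -19.466, -6.986
est_enroll, se_enroll = 0.780, 0.110

# Critical value implied by the typePublic interval: half-width / standard error.
t_star = (high_type - low_type) / (2 * se_type)

# Rebuild the enrollment_th interval as estimate +/- t* x SE.
low_enroll = est_enroll - t_star * se_enroll
high_enroll = est_enroll + t_star * se_enroll

# Expenditures are in millions of USD, so multiply by 1e6 to express in dollars.
print(round(t_star, 3))
print(f"${low_enroll * 1e6:,.0f} to ${high_enroll * 1e6:,.0f}")
```

The rebuilt endpoints agree with the table's 0.562 and 0.999 only up to rounding, since the estimate and standard error used here are themselves rounded to three decimals.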
Conducted hypothesis tests for a single coefficient
Computed and interpreted confidence intervals for a single coefficient