This page contains practice problems to help prepare for Exam 01. This set of practice problems is not comprehensive. You should review these study tips as you prepare for the exam.
There is no answer key for these problems. You may ask questions in office hours and on Ed Discussion.
Exercise 1
We will use data from nrow(penguins) penguins at Palmer Station in Antarctica to fit a linear regression model that uses species (Adelie, Chinstrap, or Gentoo), flipper length (in millimeters), and bill depth (in millimeters) to predict a penguin's body mass (in grams). Click here to read more about the variables.
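The linear regression model has the form
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$$
Write the dimensions of $\mathbf{y}$, $\mathbf{X}$, $\boldsymbol{\beta}$, and $\boldsymbol{\epsilon}$ specifically for this analysis.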
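After writing the dimensions by hand, you can check them against the design matrix that R builds. This is a minimal sketch, assuming that `penguins` comes from the palmerpenguins package (the page does not name the source) and that rows with missing values are dropped, as `lm()` does by default. The names `vars`, `penguins_complete`, `X`, and `y` are illustrative.

```r
# Assumption: `penguins` comes from the palmerpenguins package
library(palmerpenguins)

# Keep only the model variables and drop rows with missing values,
# mirroring lm()'s default handling of NAs
vars <- c("body_mass_g", "species", "flipper_length_mm", "bill_depth_mm")
penguins_complete <- na.omit(penguins[, vars])

# Design matrix: intercept column, species indicators, and the two
# quantitative predictors
X <- model.matrix(
  body_mass_g ~ species + flipper_length_mm + bill_depth_mm,
  data = penguins_complete
)
y <- penguins_complete$body_mass_g

dim(X)       # rows = observations used, columns = number of coefficients
length(y)    # should match the number of rows of X
colnames(X)  # one name per coefficient in the model
```

Compare the output with your written answer, and note which columns correspond to species.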
Exercise 2
The output for the model described in Exercise 1, along with 95% confidence intervals for the model coefficients, is shown below:
penguins_fit <- lm(body_mass_g ~ species + flipper_length_mm + bill_depth_mm, data = penguins)
Interpret the coefficient of flipper_length_mm in the context of the data.
What is the baseline category for species?
Interpret the coefficient of speciesChinstrap in the context of the data.
Exercise 3
Does the intercept have a meaningful interpretation?
If not, what are some strategies we can use to fit a model such that the intercept is meaningful?
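Centering the quantitative predictors at their means is one common strategy, though not necessarily the only one the exercise has in mind. The sketch below assumes `penguins` comes from the palmerpenguins package. The names `flipper_cent`, `bill_depth_cent`, `fit_original`, and `fit_centered` are illustrative.

```r
# Assumption: `penguins` comes from the palmerpenguins package
library(palmerpenguins)

vars <- c("body_mass_g", "species", "flipper_length_mm", "bill_depth_mm")
penguins_complete <- na.omit(penguins[, vars])

# Center each quantitative predictor at its sample mean
penguins_complete$flipper_cent <-
  penguins_complete$flipper_length_mm - mean(penguins_complete$flipper_length_mm)
penguins_complete$bill_depth_cent <-
  penguins_complete$bill_depth_mm - mean(penguins_complete$bill_depth_mm)

fit_original <- lm(body_mass_g ~ species + flipper_length_mm + bill_depth_mm,
                   data = penguins_complete)
fit_centered <- lm(body_mass_g ~ species + flipper_cent + bill_depth_cent,
                   data = penguins_complete)

# The slopes and species terms are unchanged; only the intercept moves
coef(fit_original)
coef(fit_centered)
```

In the centered fit, consider which penguin the intercept now describes and whether such a penguin is plausible.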
Exercise 4
There are three species in the data set (Adelie, Chinstrap, Gentoo), but only two terms for species in the model. Use the design matrix to show why we cannot put indicators for all three species and the intercept in the model.
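A small numeric check can support the argument you write with the design matrix. This sketch uses a toy factor with two penguins per species rather than the real data. The names `indicators`, `X_full`, and `X_reduced` are illustrative.

```r
# Toy data: two penguins per species (illustrative, not the real data)
species <- factor(rep(c("Adelie", "Chinstrap", "Gentoo"), each = 2))

# One indicator column per species, with no intercept
indicators <- model.matrix(~ 0 + species)

# Add an intercept column alongside all three indicators
X_full <- cbind(intercept = 1, indicators)

# The three indicators sum to the intercept column, so the columns
# are linearly dependent
all(rowSums(indicators) == X_full[, "intercept"])

ncol(X_full)     # number of columns
qr(X_full)$rank  # rank is smaller than the number of columns

# Dropping one indicator (the baseline) restores full column rank
X_reduced <- model.matrix(~ species)
qr(X_reduced)$rank == ncol(X_reduced)
```

A design matrix without full column rank means $\mathbf{X}^T\mathbf{X}$ cannot be inverted, which is the point to connect back to the least-squares estimator.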
Exercise 5
We conduct the following hypothesis test for the coefficient of flipper_length_mm.
Null: There is no linear relationship between flipper length and body mass, after accounting for species and bill depth
Alternative: There is a linear relationship between flipper length and body mass, after accounting for species and bill depth
Write these hypotheses in mathematical notation.
The standard error is 3.098. Explain how this value is computed and what this value means in the context of the data.
The test statistic is 8.295. Explain how this value is computed and what this value means in the context of the data.
What distribution is used to compute the p-value?
What is the conclusion from the test in the context of the data?
Exercise 6
Interpret the 95% confidence interval for flipper_length_mm in the context of the data.
Is the interval consistent with the test from the previous exercise? Briefly explain.
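To check your hand calculations for Exercises 5 and 6, you can rebuild the test statistic, p-value, and confidence interval from the fitted model. This is a sketch, assuming `penguins` comes from the palmerpenguins package and that the model is the one described in Exercise 1. The names `coef_table`, `est`, `se`, `t_stat`, `ci_manual`, and `ci_builtin` are illustrative.

```r
# Assumption: `penguins` comes from the palmerpenguins package
library(palmerpenguins)

penguins_fit <- lm(body_mass_g ~ species + flipper_length_mm + bill_depth_mm,
                   data = penguins)

coef_table <- summary(penguins_fit)$coefficients
est <- coef_table["flipper_length_mm", "Estimate"]
se  <- coef_table["flipper_length_mm", "Std. Error"]

# Test statistic: (estimate - null value) / standard error
t_stat <- (est - 0) / se

# Two-sided p-value using the model's residual degrees of freedom
df <- df.residual(penguins_fit)
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# 95% confidence interval: estimate +/- critical value * standard error
t_crit <- qt(0.975, df = df)
ci_manual <- c(est - t_crit * se, est + t_crit * se)

# Compare with R's built-in interval
ci_builtin <- confint(penguins_fit, "flipper_length_mm", level = 0.95)
ci_manual
ci_builtin
```

Checking whether the interval contains 0 is one way to compare it with the conclusion of the hypothesis test.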
Exercise 7
Sketch a scatterplot of the relationship between bill depth and body mass such that the effect of bill depth differs by species.
Exercise 8
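When we conduct inference for regression, we assume the following distribution for $\mathbf{y} \mid \mathbf{X}$:
$$\mathbf{y} \mid \mathbf{X} \sim N(\mathbf{X}\boldsymbol{\beta}, \sigma^2_\epsilon \mathbf{I})$$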
We conduct inference on the coefficients assuming that the variability of the response is constant for every value (or combination of values) of the predictors. Briefly explain why this assumption is important.
Exercise 10
Given the model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$, derive the least-squares estimator $\hat{\boldsymbol{\beta}}$ using matrix calculus.
Explain why we say “holding all else constant” when interpreting the coefficients in a multiple linear regression model.
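Once you have a derivation, a numeric check on simulated data can confirm it. The sketch below assumes your derivation arrives at the normal equations $\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^T\mathbf{y}$. It also shows what "holding all else constant" looks like in predictions. The data and all names are illustrative.

```r
set.seed(221)  # arbitrary seed for reproducibility

# Simulated data (illustrative only)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n)

# Design matrix with an intercept column
X <- cbind(1, x1, x2)

# Solve the normal equations t(X) X b = t(X) y for b
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# Compare with the coefficients from lm()
fit <- lm(y ~ x1 + x2)
cbind(manual = as.numeric(beta_hat), lm = unname(coef(fit)))

# "Holding all else constant": increase x1 by 1 while x2 stays fixed
new_obs <- data.frame(x1 = c(0, 1), x2 = c(0.3, 0.3))
diff(predict(fit, newdata = new_obs))  # equals the coefficient of x1
```

Try changing `x2` in the second row of `new_obs` as well, and see why the difference in predictions no longer equals the coefficient of `x1`.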
Exercise 13
Suppose we have two models:
Model 1 includes predictors $X_1$ and $X_2$
Model 2 includes predictors $X_1$, $X_2$, and $X_3$
Explain why we should use adjusted $R^2$ and not $R^2$ to compare these models.
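A short simulation can show the behavior to explain here, assuming the comparison the exercise has in mind is between $R^2$ and adjusted $R^2$ for models with different numbers of predictors. In this sketch the larger model adds a predictor that is unrelated to the response by construction. The data and all names are illustrative.

```r
set.seed(13)  # arbitrary seed for reproducibility

# Simulated data (illustrative only)
n     <- 50
x1    <- rnorm(n)
x2    <- rnorm(n)
noise <- rnorm(n)  # unrelated to y by construction
y     <- 1 + 2 * x1 - x2 + rnorm(n)

model_small <- lm(y ~ x1 + x2)
model_large <- lm(y ~ x1 + x2 + noise)

r2 <- c(small = summary(model_small)$r.squared,
        large = summary(model_large)$r.squared)
adj_r2 <- c(small = summary(model_small)$adj.r.squared,
            large = summary(model_large)$adj.r.squared)

r2      # the larger model's R^2 is never lower than the smaller model's
adj_r2  # adjusted R^2 penalizes the extra term, so it can go down
```

Rerun with different seeds to see how each statistic responds to an uninformative predictor.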
Exercise 14
Rework Exercises 1 - 5 in HW 01 for more practice with theory and math.
Exercise 15
Rework Exercises 1 - 5 in HW 02 for more practice with theory and math.
Relevant lectures, assignments and AEs
Ask yourself “why” questions as you review the slides and your answers, process, and derivations on these assignments. It may also be helpful to explain your process to others.
Lectures: January 9 - February 13 (February 13 lecture is an exam review)