Exam 01 practice

Important

This page contains practice problems to help prepare for Exam 01. This set of practice problems is not comprehensive. You should review these study tips as you prepare for the exam.

There is no answer key for these problems. You may ask questions in office hours and on Ed Discussion.

Exercise 1

We will use data from nrow(penguins) penguins at Palmer Station in Antarctica to fit a linear regression model that uses species (Adelie, Chinstrap, or Gentoo), flipper length (in millimeters), and bill depth (in millimeters) to predict body mass (in grams). Click here to read more about the variables.

The linear regression model has the form

$$y = X\beta + \epsilon$$

Write the dimensions of $y$, $X$, $\beta$, and $\epsilon$ specifically for this analysis.
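One way to check your dimensions is to build a design matrix in R and inspect it. The sketch below uses a small made-up data frame (`toy`, six hypothetical rows, not the real penguins data) with the same model formula, so only the number of columns carries over to the real analysis, not the number of rows.

```r
# Toy data (hypothetical values, NOT the penguins data): 6 rows, 3 species.
toy <- data.frame(
  body_mass_g       = c(3750, 3800, 3700, 3500, 5000, 5400),
  species           = factor(c("Adelie", "Adelie", "Chinstrap",
                               "Chinstrap", "Gentoo", "Gentoo")),
  flipper_length_mm = c(181, 186, 192, 196, 211, 230),
  bill_depth_mm     = c(18.7, 17.4, 17.9, 18.2, 13.2, 16.3)
)

# Design matrix for the same right-hand side used in this analysis.
X <- model.matrix(~ species + flipper_length_mm + bill_depth_mm, data = toy)
y <- toy$body_mass_g

dim(X)        # rows = observations, columns = coefficients (incl. intercept)
length(y)     # one response value per observation
colnames(X)   # which columns R created for the categorical predictor
```

The vectors $\beta$ and $\epsilon$ are not stored anywhere in this sketch; their dimensions follow from requiring that $X\beta + \epsilon$ be conformable with $y$.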

Exercise 2

The output for the model described in Exercise 1, along with 95% confidence intervals for the model coefficients, is shown below:

library(palmerpenguins)  # penguins data (assumed source of the data frame)
library(broom)           # tidy()
library(knitr)           # kable()

# Fit the model from Exercise 1; the response must not appear on the right-hand side
penguins_fit <- lm(body_mass_g ~ species + flipper_length_mm + bill_depth_mm,
                   data = penguins)

# Coefficient table with 95% confidence intervals
tidy(penguins_fit, conf.int = TRUE) |>
  kable(digits = 3)
|term              |  estimate| std.error| statistic| p.value|  conf.low| conf.high|
|:-----------------|---------:|---------:|---------:|-------:|---------:|---------:|
|(Intercept)       | -4526.887|   516.931|    -8.757|   0.000| -5543.705| -3510.068|
|speciesChinstrap  |  -131.968|    51.400|    -2.567|   0.011|  -233.073|   -30.863|
|speciesGentoo     |  1288.968|   132.774|     9.708|   0.000|  1027.798|  1550.138|
|flipper_length_mm |    25.700|     3.098|     8.295|   0.000|    19.606|    31.794|
|bill_depth_mm     |   182.364|    18.358|     9.934|   0.000|   146.252|   218.475|

  • Interpret the coefficient of flipper_length_mm in the context of the data.

  • What is the baseline category for species?

  • Interpret the coefficient of speciesChinstrap in the context of the data.

Exercise 3

  • Does the intercept have a meaningful interpretation?

  • If not, what are some strategies we can use to fit a model such that the intercept is meaningful?

Exercise 4

There are three species in the data set (Adelie, Chinstrap, Gentoo), but only two terms for species in the model. Use the design matrix to show why we cannot put indicators for all three species and the intercept in the model.
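To see the issue numerically before writing it out by hand, the following sketch builds a tiny design matrix with an intercept and all three indicators. The six-observation `species` vector is made up for illustration; the point is only the relationship among the columns.

```r
# Hypothetical species labels for six observations.
species <- factor(c("Adelie", "Adelie", "Chinstrap",
                    "Chinstrap", "Gentoo", "Gentoo"))

# One indicator column per species (the "- 1" drops the intercept) ...
indicators <- model.matrix(~ species - 1)
# ... plus an explicit intercept column of ones.
X_full <- cbind(intercept = 1, indicators)

X_full
ncol(X_full)                    # number of columns in the design matrix
qr(X_full)$rank                 # number of linearly independent columns
all(rowSums(indicators) == 1)   # do the indicators add up to the intercept column?
```

Comparing the rank with the number of columns, and thinking about what that means for $X^TX$, is a good starting point for your written argument.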

Exercise 5

We conduct the following hypothesis test for the coefficient of flipper_length_mm.

  • Null: There is no linear relationship between flipper length and body mass, after accounting for species and bill depth

  • Alternative: There is a linear relationship between flipper length and body mass, after accounting for species and bill depth

  1. Write these hypotheses in mathematical notation.
  2. The standard error is 3.098. Explain how this value is computed and what this value means in the context of the data.
  3. The test statistic is 8.295. Explain how this value is computed and what this value means in the context of the data.
  4. What distribution is used to compute the p-value?
  5. What is the conclusion from the test in the context of the data?
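If you want to check your explanations of the standard error, test statistic, and p-value, you can reproduce the columns of a coefficient table by hand. The sketch below uses simulated data (the variables `x1`, `x2`, and `y` are made up, not the penguins data) and compares the hand computation with what `summary()` reports.

```r
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n)   # simulated response

fit <- lm(y ~ x1 + x2)
X   <- model.matrix(fit)
p   <- ncol(X)                              # number of coefficients, incl. intercept

# Estimated error variance: residual sum of squares divided by (n - p).
sigma2_hat <- sum(resid(fit)^2) / (n - p)

# Standard errors: square roots of the diagonal of sigma2_hat * (X'X)^{-1}.
se <- sqrt(diag(sigma2_hat * solve(crossprod(X))))

# Test statistic for H0: beta_j = 0, and the two-sided p-value.
t_stat  <- coef(fit) / se
p_value <- 2 * pt(abs(t_stat), df = n - p, lower.tail = FALSE)

cbind(se, t_stat, p_value)
summary(fit)$coefficients   # should agree with the hand computation
```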

Exercise 6

  1. Interpret the 95% confidence interval for flipper_length_mm in the context of the data.
  2. Is the interval consistent with the test from the previous exercise? Briefly explain.
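The link between a confidence interval and the corresponding two-sided test can also be explored numerically. This sketch uses simulated data (`x1` and `y` are made up) to build 95% intervals by hand and compare them with `confint()`, then checks whether each interval excludes zero.

```r
set.seed(1)
n  <- 50
x1 <- rnorm(n)
y  <- 2 + 1.5 * x1 + rnorm(n)   # simulated response
fit <- lm(y ~ x1)

est    <- coef(fit)
se     <- sqrt(diag(vcov(fit)))
t_star <- qt(0.975, df = df.residual(fit))   # critical value for a 95% interval

# estimate +/- critical value * standard error
ci_by_hand <- cbind(lower = est - t_star * se, upper = est + t_star * se)
ci_by_hand
confint(fit, level = 0.95)   # should match the hand computation

# Compare "interval excludes 0" with "p-value below 0.05".
excludes_zero <- ci_by_hand[, "lower"] > 0 | ci_by_hand[, "upper"] < 0
reject_h0     <- summary(fit)$coefficients[, "Pr(>|t|)"] < 0.05
cbind(excludes_zero, reject_h0)
```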

Exercise 7

Sketch a scatterplot of the relationship between bill depth and body mass such that the effect of bill depth differs by species.

Exercise 8

When we conduct inference for regression, we assume the following distribution for $y|X$:

$$y|X \sim N(X\beta, \sigma^2_\epsilon I)$$

  1. Show that $E(y|X) = X\beta$.
  2. Show that $Var(y|X) = \sigma^2_\epsilon I$.

See February 4 lecture “Inference for Regression” to check your work.

Exercise 9

We conduct inference on the coefficients $\beta$ assuming that the variability of $y|X$ is constant for every value (or combination of values) of the predictors. Briefly explain why this assumption is important.

Exercise 10

Given the model $y = X\beta + \epsilon$, derive the least-squares estimator $\hat{\beta}$ using matrix calculus.

See January 21 lecture “SLR: Matrix representation” to check your work.
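Once you have an expression for the estimator, a quick numerical check can catch algebra slips. The sketch below assumes your derivation ends at the normal equations $X^TX\hat{\beta} = X^Ty$, solves them directly on simulated data (the matrix `X` and vector `y` are made up), and compares the result with `lm()`.

```r
set.seed(2)
n <- 30
X <- cbind(1, rnorm(n), rnorm(n))            # intercept column plus two predictors
y <- drop(X %*% c(1, 2, -1) + rnorm(n))      # simulated response

# Solve the normal equations (X'X) beta = X'y without forming an explicit inverse.
beta_hat <- solve(crossprod(X), crossprod(X, y))

drop(beta_hat)
coef(lm(y ~ X - 1))   # X already contains the intercept column, so drop lm's own
```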

Exercise 11

Given the model $y = X\beta + \epsilon$, derive the least-squares estimator $\hat{\beta}$ using the geometric interpretation of the model.

See January 23 lecture “Geometric interpretation of least-squares regression” to check your work.
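The geometric argument rests on the residual vector being orthogonal to the column space of $X$. As a sanity check on simulated data (`X` and `y` below are made up), this sketch projects `y` onto the columns of `X` and verifies that orthogonality numerically.

```r
set.seed(3)
n <- 30
X <- cbind(1, rnorm(n), rnorm(n))   # intercept column plus two predictors
y <- rnorm(n)                       # any response vector works for this check

# Projection ("hat") matrix onto the column space of X.
H <- X %*% solve(crossprod(X)) %*% t(X)

y_hat <- drop(H %*% y)   # projection of y onto Col(X)
e     <- y - y_hat       # residual vector

crossprod(X, e)          # X'e: should be zero up to rounding error
max(abs(H %*% H - H))    # projecting twice changes nothing (idempotent)
```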

Exercise 12

Explain why we say “holding all else constant” when interpreting the coefficients in a multiple linear regression model.

Exercise 13

Suppose we have two models:

  • Model 1 includes predictors $X_1$ and $X_2$

  • Model 2 includes predictors $X_1$, $X_2$, $X_3$, and $X_4$

Explain why we should use adjusted $R^2$ and not $R^2$ to compare these models.
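A small simulation can help you build intuition before writing your explanation. In this sketch the data are simulated, and `x3` and `x4` are pure noise by construction, so any change in fit from adding them is not due to a real relationship. Exact values depend on the seed.

```r
set.seed(4)
n  <- 40
x1 <- rnorm(n); x2 <- rnorm(n)
x3 <- rnorm(n); x4 <- rnorm(n)          # unrelated to y by construction
y  <- 1 + 2 * x1 - x2 + rnorm(n)        # simulated response

m1 <- lm(y ~ x1 + x2)                   # Model 1
m2 <- lm(y ~ x1 + x2 + x3 + x4)         # Model 2 (Model 1 plus two noise predictors)

c(r2_m1  = summary(m1)$r.squared,     r2_m2  = summary(m2)$r.squared)
c(adj_m1 = summary(m1)$adj.r.squared, adj_m2 = summary(m2)$adj.r.squared)

# Adjusted R^2 by hand for Model 2: penalizes the number of predictors k.
k <- 4
adj_by_hand <- 1 - (1 - summary(m2)$r.squared) * (n - 1) / (n - k - 1)
adj_by_hand
```

Compare how each measure moves from Model 1 to Model 2, and try a few different seeds to see which pattern is guaranteed and which is not.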

Exercise 14

Rework Exercises 1 - 5 in HW 01 for more practice with theory and math.

Exercise 15

Rework Exercises 1 - 5 in HW 02 for more practice with theory and math.

Relevant lectures, assignments and AEs

Ask yourself “why” questions as you review the slides and your answers, process, and derivations on these assignments. It may also be helpful to explain your process to others.

  • Lectures: January 9 - February 13 (February 13 lecture is an exam review)

  • HW 01 - 02

  • Lab 01 - 04 (Lab 04 is an exam review)

  • AE 01 - 04 (AE 04 is an exam review)