Exam 02 practice

Important

This page contains practice problems to help prepare for Exam 02. This set of practice problems is not comprehensive. You should review these study tips as you prepare for the exam.

There is no answer key for these problems. You may ask questions in office hours and on Ed Discussion.

Maximum likelihood estimation

Exercise 1

Given the simple linear regression model

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma_\epsilon^2)$$

Write the likelihood function and use it to show that the maximum likelihood estimators (MLEs) of $\beta_0$, $\beta_1$, and $\sigma_\epsilon^2$ are of the form shown on this slide ($\tilde{\beta}_0$) and this slide ($\tilde{\beta}_1$, $\tilde{\sigma}_\epsilon^2$).

Exercise 2

Given the linear regression model

$$y = X\beta + \epsilon, \qquad \epsilon \sim N(0, \sigma_\epsilon^2 I)$$

Write the likelihood function and use it to show that the maximum likelihood estimators (MLEs) of $\beta$ and $\sigma_\epsilon^2$ are

$$\tilde{\beta} = (X^T X)^{-1} X^T y, \qquad \tilde{\sigma}_\epsilon^2 = \frac{1}{n}\left(y - X\tilde{\beta}\right)^T\left(y - X\tilde{\beta}\right)$$
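After working through the derivation by hand, you can sanity-check the closed-form MLEs numerically. The sketch below (simulated data with arbitrary coefficient values; this is a check, not the requested derivation) verifies that $(X^T X)^{-1} X^T y$ matches NumPy's least-squares solution:

```python
import numpy as np

# Simulate data from a known linear model (all values arbitrary)
rng = np.random.default_rng(1)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

# Closed-form MLE: (X^T X)^{-1} X^T y
beta_mle = np.linalg.inv(X.T @ X) @ X.T @ y
resid = y - X @ beta_mle
sigma2_mle = (resid @ resid) / n   # note: divides by n, not n - p - 1

# The MLE of beta coincides with the least-squares solution
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_mle, beta_ls))   # True
```

Note that $\tilde{\sigma}_\epsilon^2$ divides by $n$, so it is a biased (though consistent) estimator of $\sigma_\epsilon^2$.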

Exercise 3

Given the logistic regression model

$$\log\left(\frac{\pi}{1 - \pi}\right) = X\beta$$

  • Write the likelihood function

  • Rework the derivation from the March 27 lecture to show the derivative solved to find the MLEs is of the form on this slide. (You can check your answer using the board work posted in Canvas).

Exercise 4

Suppose $Y_1, \ldots, Y_n$ are an independent and identically distributed (iid) sample from some distribution

$$f_Y(y) = \theta(1 - \theta)^{y - 1}$$

such that $y$ takes on positive integer values and $0 < \theta < 1$. Show that the MLE for $\theta^{-1}$ is $\frac{1}{n}\sum_{i=1}^{n} y_i$.
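You can check the result of your derivation by simulation: for this (geometric) pmf, the sample mean should be close to $1/\theta$. A minimal sketch, with an arbitrary choice of $\theta$ and inverse-transform sampling:

```python
import math
import random

random.seed(7)
theta = 0.25       # arbitrary true parameter; 1/theta = 4
n = 100_000

# Inverse-transform sampling for the geometric distribution on {1, 2, ...}
sample = [math.floor(math.log(1.0 - random.random()) / math.log(1.0 - theta)) + 1
          for _ in range(n)]

ybar = sum(sample) / n
print(ybar)        # should be close to 1/theta = 4
```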

Exercise 5

Rework Exercises 1 - 2 in HW 04.

Multiple linear regression (diagnostics, multicollinearity, variable transformations, comparison)

Exercise 6

Suppose we fit a linear model with a log transformation on the response variable, i.e.,

$$\widehat{\log(y_i)} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p$$

  • Show mathematically why the slope for $x_j$ and the intercept are interpreted in terms of $y$ as shown on this slide.

  • Show how $y$ is expected to change if $x_j$ increases by $t$ units.
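Once you have the algebraic result, a quick numerical illustration can confirm the multiplicative interpretation. The coefficients below are hypothetical, not from any fitted model; the point is that increasing $x_j$ by $t$ units multiplies the predicted $y$ by $e^{\hat{\beta}_j t}$:

```python
import math

# Hypothetical coefficients for log(y-hat) = b0 + b1 * x
b0, b1 = 0.5, 0.2
x, t = 3.0, 4.0

yhat_before = math.exp(b0 + b1 * x)          # predicted y at x
yhat_after = math.exp(b0 + b1 * (x + t))     # predicted y at x + t

ratio = yhat_after / yhat_before
print(ratio == math.exp(b1 * t) or abs(ratio - math.exp(b1 * t)) < 1e-12)
```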

Exercise 7

Suppose we fit a linear model with a log transformation on one predictor variable, i.e.,

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 \log(x_1) + \cdots + \hat{\beta}_p x_p$$

Show mathematically why the slope and intercept are interpreted as shown on this slide when $x_1$ is multiplied by a factor of $C$.
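As with the previous exercise, a small numerical check (hypothetical coefficients, not from any fitted model) can confirm your algebra: multiplying the log-transformed predictor by a factor $C$ shifts $\hat{y}$ by $\hat{\beta}_1 \log C$, regardless of the starting value of $x$:

```python
import math

# Hypothetical coefficients for y-hat = b0 + b1 * log(x)
b0, b1 = 2.0, 3.0
x, C = 5.0, 10.0

yhat_before = b0 + b1 * math.log(x)
yhat_after = b0 + b1 * math.log(C * x)

diff = yhat_after - yhat_before
print(diff, b1 * math.log(C))   # the two quantities agree
```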

Exercise 8

Rework Exercise 1 in HW 03.

Exercise 9

Recall that for linear regression, the variances of the estimated coefficients are the diagonal elements of $\text{Var}(\hat{\beta}) = \hat{\sigma}_\epsilon^2 (X^T X)^{-1}$. One impact of multicollinearity is that the model coefficients will have large variances. Explain why.
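To build intuition before writing your explanation, you can watch the relevant diagonal element of $(X^T X)^{-1}$ grow as two predictors become more correlated. A sketch on simulated data (arbitrary sample size and correlation values):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
noise = rng.normal(size=n)

var_x1_coef = {}
for rho in [0.0, 0.9, 0.99]:
    # Construct x2 with correlation approximately rho with x1
    x2 = rho * x1 + np.sqrt(1 - rho**2) * noise
    X = np.column_stack([np.ones(n), x1, x2])
    # Diagonal element for x1; Var(beta-hat_1) is sigma^2 times this
    var_x1_coef[rho] = np.linalg.inv(X.T @ X)[1, 1]

print(var_x1_coef)   # the x1 entry grows sharply as rho -> 1
```

Nothing about $\hat{\sigma}_\epsilon^2$ changes here; the inflation comes entirely from $(X^T X)^{-1}$.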

Exercise 10

Suppose you fit a simple linear regression model.

  • Draw a scatterplot that contains an observation with large leverage but low Cook’s distance.

  • Draw a scatterplot that contains an observation with large leverage and high Cook’s distance.

  • Draw a scatterplot that contains an observation with a large studentized residual.

Exercise 11

  • What is an advantage of examining a plot of studentized residuals vs. fitted values rather than using the raw residuals?

  • Explain what is measured by Cook’s distance. You don’t need to memorize the formula but rather describe what the formula is quantifying for each observation. Click here for the formula (slide also contains the solution).

Logistic regression

Exercise 12

Write the hypotheses being tested in the drop-in-deviance test output on this slide. Explain how each value in the table is computed.

Exercise 13

  • What is an advantage of using a drop-in-deviance test instead of AIC (or BIC) to compare regression models?

  • What is an advantage of using AIC (or BIC) instead of a drop-in-deviance test to compare regression models?

Exercise 14

  • On average, what fraction of people with an odds of 0.37 of defaulting on their credit card payment will in fact default?

  • Suppose an individual has a 16% chance of defaulting on their credit card payment. What are the odds they will default?
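Both parts of this exercise use the same pair of conversions between probability and odds. A minimal sketch of the two formulas (the numbers below are generic examples, deliberately not the exercise values):

```python
def odds_to_prob(odds):
    # probability = odds / (1 + odds)
    return odds / (1 + odds)

def prob_to_odds(p):
    # odds = p / (1 - p)
    return p / (1 - p)

print(odds_to_prob(0.5))    # odds of 0.5 correspond to probability 1/3
print(prob_to_odds(0.2))    # a 20% chance corresponds to odds of 0.25
```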

Exercise 15

Recall the model using age and education to predict odds of being high risk for heart disease.

  • Show mathematically why the interpretation for the slope for age in terms of the log-odds is in the form shown on this slide.

  • Show mathematically why the interpretation for the slope of age in terms of the odds is in the form shown on this slide.

Exercise 16

Recall the model using age and education to predict odds of being high risk for heart disease.

  • Show mathematically why the interpretation for the slope for education4 in terms of the log-odds is in the form shown on this slide.

  • Show mathematically why the interpretation for the slope of education4 in terms of the odds is in the form shown on this slide.

Exercise 17

Explain why the slope of the logistic regression model is called the Adjusted Odds Ratio (or just Odds Ratio if there is one predictor).

Exercise 18

  • Draw an example of an ROC curve such that the AUC is about 0.55.

  • Draw an example of an ROC curve such that the AUC is about 0.9.

  • Explain what each point on an ROC curve represents.
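For the last part, it can help to compute a few ROC points by hand: each point is the (false positive rate, true positive rate) pair obtained by classifying at one probability threshold. A sketch on toy data (the predicted probabilities and labels below are hypothetical):

```python
# Hypothetical predicted probabilities and true labels
probs  = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   1,   0,   0,   0]

def roc_point(threshold):
    preds = [p >= threshold for p in probs]
    tp = sum(pr and y == 1 for pr, y in zip(preds, labels))
    fp = sum(pr and y == 0 for pr, y in zip(preds, labels))
    tpr = tp / sum(labels)                  # true positive rate
    fpr = fp / (len(labels) - sum(labels))  # false positive rate
    return fpr, tpr

# Sweeping the threshold from high to low traces the curve
for t in [0.85, 0.5, 0.15]:
    print(t, roc_point(t))
```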

Relevant assignments and AEs

The following assignments and AEs cover Exam 02 content. Ask yourself “why” questions as you review your answers, process, and derivations on these assignments. It may also be helpful to explain your process to others.

  • HW 03, HW 04

  • Lab 05, Lab 06, Lab 07

  • AE 04, AE 05

Footnotes

  1. From Introduction to Statistical Learning.