Exam 02 practice

Important

This page contains practice problems to help prepare for Exam 02. This set of practice problems is not comprehensive. You should review these study tips as you prepare for the exam.

There is no answer key for these problems. You may ask questions in office hours and on Ed Discussion.

Maximum likelihood estimation

Exercise 1

Given the simple linear regression model

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma_\epsilon^2)$$

Write the likelihood function and use it to show that the maximum likelihood estimators (MLEs) of $\beta_0$, $\beta_1$, and $\sigma_\epsilon^2$ are of the form shown on this slide ($\tilde{\beta}_0$) and this slide ($\tilde{\beta}_1$, $\tilde{\sigma}_\epsilon^2$).

Exercise 2

Given the linear regression model

$$y = X\beta + \epsilon, \qquad \epsilon \sim N(0, \sigma_\epsilon^2 I)$$

Write the likelihood function and use it to show that the maximum likelihood estimators (MLEs) of $\beta$ and $\sigma_\epsilon^2$ are

$$\tilde{\beta} = (X^T X)^{-1} X^T y, \qquad \tilde{\sigma}_\epsilon^2 = \frac{1}{n}\left(y - X\tilde{\beta}\right)^T\left(y - X\tilde{\beta}\right)$$
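After working through the derivation by hand, you can sanity-check the closed-form MLEs numerically. The sketch below (simulated data with arbitrary coefficient values; this is a check, not the requested derivation) verifies that $(X^T X)^{-1} X^T y$ matches NumPy's least-squares solution:

```python
import numpy as np

# Simulate data from a known linear model (all values arbitrary)
rng = np.random.default_rng(1)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

# Closed-form MLE: (X^T X)^{-1} X^T y
beta_mle = np.linalg.inv(X.T @ X) @ X.T @ y
resid = y - X @ beta_mle
sigma2_mle = (resid @ resid) / n   # note: divides by n, not n - p - 1

# The MLE of beta coincides with the least-squares solution
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_mle, beta_ls))   # True
```

Note that $\tilde{\sigma}_\epsilon^2$ divides by $n$, so it is a biased (though consistent) estimator of $\sigma_\epsilon^2$.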

Exercise 3

Given the logistic regression model

$$\log\left(\frac{\pi}{1 - \pi}\right) = X\beta$$

  • Write the likelihood function

  • Rework the derivation from the March 27 lecture to show the derivative solved to find the MLEs is of the form on this slide. (You can check your answer using the board work posted in Canvas).

Exercise 4

Suppose $Y_1, \ldots, Y_n$ are an independent and identically distributed (iid) sample from some distribution

$$f_Y(y) = \theta(1 - \theta)^{y - 1}$$

such that $y$ takes on positive integer values and $0 < \theta < 1$. Show that the MLE for $\theta^{-1}$ is $\frac{1}{n}\sum_{i=1}^{n} y_i$.
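You can check the result of your derivation by simulation: for this (geometric) pmf, the sample mean should be close to $1/\theta$. A minimal sketch, with an arbitrary choice of $\theta$ and inverse-transform sampling:

```python
import math
import random

random.seed(7)
theta = 0.25       # arbitrary true parameter; 1/theta = 4
n = 100_000

# Inverse-transform sampling for the geometric distribution on {1, 2, ...}
sample = [math.floor(math.log(1.0 - random.random()) / math.log(1.0 - theta)) + 1
          for _ in range(n)]

ybar = sum(sample) / n
print(ybar)        # should be close to 1/theta = 4
```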

Exercise 5

Rework Exercises 1 - 2 in HW 04.

Multiple linear regression (diagnostics, multicollinearity, variable transformations, comparison)

Exercise 6

Suppose we fit a linear model with a log transformation on the response variable, i.e.,

$$\widehat{\log(y_i)} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_p x_p$$

  • Show mathematically why the slope for $x_j$ and the intercept are interpreted in terms of $y$ as shown on this slide.

  • Show how $y$ is expected to change if $x_j$ increases by $t$ units.
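Once you have the algebraic result, a quick numerical illustration can confirm the multiplicative interpretation. The coefficients below are hypothetical, not from any fitted model; the point is that increasing $x_j$ by $t$ units multiplies the predicted $y$ by $e^{\hat{\beta}_j t}$:

```python
import math

# Hypothetical coefficients for log(y-hat) = b0 + b1 * x
b0, b1 = 0.5, 0.2
x, t = 3.0, 4.0

yhat_before = math.exp(b0 + b1 * x)          # predicted y at x
yhat_after = math.exp(b0 + b1 * (x + t))     # predicted y at x + t

ratio = yhat_after / yhat_before
print(ratio == math.exp(b1 * t) or abs(ratio - math.exp(b1 * t)) < 1e-12)
```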

Exercise 7

Suppose we fit a linear model with a log transformation on one predictor variable, i.e.,

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 \log(x_1) + \cdots + \hat{\beta}_p x_p$$

Show mathematically why the slope and intercept are interpreted as shown on this slide when $x_1$ is multiplied by a factor of $C$.
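As with the previous exercise, a small numerical check (hypothetical coefficients, not from any fitted model) can confirm your algebra: multiplying the log-transformed predictor by a factor $C$ shifts $\hat{y}$ by $\hat{\beta}_1 \log C$, regardless of the starting value of $x$:

```python
import math

# Hypothetical coefficients for y-hat = b0 + b1 * log(x)
b0, b1 = 2.0, 3.0
x, C = 5.0, 10.0

yhat_before = b0 + b1 * math.log(x)
yhat_after = b0 + b1 * math.log(C * x)

diff = yhat_after - yhat_before
print(diff, b1 * math.log(C))   # the two quantities agree
```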

Exercise 8

Rework Exercise 1 in HW 03.

Exercise 9

Recall that for linear regression, the variances of the estimated coefficients are the diagonal elements of $\text{Var}(\hat{\beta}) = \hat{\sigma}_\epsilon^2 (X^T X)^{-1}$. One impact of multicollinearity is that the model coefficients will have large variances. Explain why.
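To build intuition before writing your explanation, you can watch the relevant diagonal element of $(X^T X)^{-1}$ grow as two predictors become more correlated. A sketch on simulated data (arbitrary sample size and correlation values):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
noise = rng.normal(size=n)

var_x1_coef = {}
for rho in [0.0, 0.9, 0.99]:
    # Construct x2 with correlation approximately rho with x1
    x2 = rho * x1 + np.sqrt(1 - rho**2) * noise
    X = np.column_stack([np.ones(n), x1, x2])
    # Diagonal element for x1; Var(beta-hat_1) is sigma^2 times this
    var_x1_coef[rho] = np.linalg.inv(X.T @ X)[1, 1]

print(var_x1_coef)   # the x1 entry grows sharply as rho -> 1
```

Nothing about $\hat{\sigma}_\epsilon^2$ changes here; the inflation comes entirely from $(X^T X)^{-1}$.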

Exercise 10

Suppose you fit a simple linear regression model.

  • Draw a scatterplot that contains an observation with large leverage but low Cook’s distance.

  • Draw a scatterplot that contains an observation with large leverage and high Cook’s distance.

  • Draw a scatterplot that contains an observation with a large studentized residual.

Exercise 11

  • What is an advantage of examining a plot of studentized residuals vs. fitted values rather than using the raw residuals?

  • Explain what is measured by Cook’s distance. You don’t need to memorize the formula but rather describe what the formula is quantifying for each observation. Click here for the formula (slide also contains the solution).

Logistic regression

Exercise 12

Write the hypotheses being tested in the drop-in-deviance test output on this slide. Explain how each value in the table is computed.

Exercise 13

  • What is an advantage of using a drop-in-deviance test instead of AIC (or BIC) to compare regression models?

  • What is an advantage of using AIC (or BIC) instead of a drop-in-deviance test to compare regression models?

Exercise 14

  • On average, what fraction of people with an odds of 0.37 of defaulting on their credit card payment will in fact default?

  • Suppose an individual has a 16% chance of defaulting on their credit card payment. What are the odds they will default?
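Both parts of this exercise use the same pair of conversions between probability and odds. A minimal sketch of the two formulas (the numbers below are generic examples, deliberately not the exercise values):

```python
def odds_to_prob(odds):
    # probability = odds / (1 + odds)
    return odds / (1 + odds)

def prob_to_odds(p):
    # odds = p / (1 - p)
    return p / (1 - p)

print(odds_to_prob(0.5))    # odds of 0.5 correspond to probability 1/3
print(prob_to_odds(0.2))    # a 20% chance corresponds to odds of 0.25
```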

Exercise 15

Recall the model using age and education to predict odds of being high risk for heart disease.

  • Show mathematically why the interpretation for the slope for age in terms of the log-odds is in the form shown on this slide.

  • Show mathematically why the interpretation for the slope of age in terms of the odds is in the form shown on this slide.

Exercise 16

Recall the model using age and education to predict odds of being high risk for heart disease.

  • Show mathematically why the interpretation for the slope for education4 in terms of the log-odds is in the form shown on this slide.

  • Show mathematically why the interpretation for the slope of education4 in terms of the odds is in the form shown on this slide.

Exercise 17

Explain why the slope of the logistic regression model is called the Adjusted Odds Ratio (or just Odds Ratio if there is one predictor).

Exercise 18

  • Draw an example of an ROC curve such that the AUC is about 0.55.

  • Draw an example of an ROC curve such that the AUC is about 0.9.

  • Explain what each point on an ROC curve represents.
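For the last part, it can help to compute a few ROC points by hand: each point is the (false positive rate, true positive rate) pair obtained by classifying at one probability threshold. A sketch on toy data (the predicted probabilities and labels below are hypothetical):

```python
# Hypothetical predicted probabilities and true labels
probs  = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   1,   0,   0,   0]

def roc_point(threshold):
    preds = [p >= threshold for p in probs]
    tp = sum(pr and y == 1 for pr, y in zip(preds, labels))
    fp = sum(pr and y == 0 for pr, y in zip(preds, labels))
    tpr = tp / sum(labels)                  # true positive rate
    fpr = fp / (len(labels) - sum(labels))  # false positive rate
    return fpr, tpr

# Sweeping the threshold from high to low traces the curve
for t in [0.85, 0.5, 0.15]:
    print(t, roc_point(t))
```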

Relevant assignments and AEs

The following assignments and AEs cover Exam 02 content. Ask yourself “why” questions as you review your answers, process, and derivations on these assignments. It may also be helpful to explain your process to others.

  • HW 03, HW 04

  • Lab 05, Lab 06, Lab 07

  • AE 04, AE 05

Footnotes

  1. From Introduction to Statistical Learning.