library(tidyverse)
library(knitr)
library(tidymodels)
library(rms) #calculate VIF
AE 04: Multicollinearity
Go to the course GitHub organization and locate your ae-04 repo to get started.
Render, commit, and push your responses to GitHub by the end of class to submit your AE.
Introduction
The Pioneer Valley Planning Commission (PVPC) collected data at the beginning a trail in Florence, MA for ninety days from April 5, 2005 to November 15, 2005. Data collectors set up a laser sensor, with breaks in the laser beam recording when a rail-trail user passed the data collection station. The data were collected from by Pioneer Valley Planning Commission via the mosaicData package.
We will use the following variables in this analysis:
Outcome:
volume
estimated number of trail users that day (number of breaks recorded)
Predictors
hightemp
daily high temperature (in degrees Fahrenheit)avgtemp
average of daily low and daily high temperature (in degrees Fahrenheit)season
one of “Fall”, “Spring”, or “Summer”precip
measure of precipitation (in inches)
<- read_csv("data/rail_trail.csv") rail_trail
Part 1
Exercise 1
Fit the regression model using high temperature, average temperature, season, and precipitation to predict volume.
Are there any coefficients that may be not what you expected?
Exercise 2
Use the formula
to calculate the VIF for avgtemp
.
Exercise 3
Based on the VIF from the previous exercise, does avgtemp
have a linear dependency with one or more other predictors? Explain.
Exercise 4
Use the
vif
function to compute VIF for all the predictors in Exercise 1.Are there predictors with near-linear dependencies? If so, which ones?
Part 2
Exercise 5
Let’s address the issue of multicollinearity. Choose a strategy to address the multicollinearity. Apply it, then use relevant statistics to select a final model.
Submission
To submit the AE:
Render the document to produce the PDF with all of your work from today’s class.
Push all your work to your AE repo on GitHub. You’re done! 🎉