Mar 06, 2025
HW 03 due March 20 at 11:59pm
Next project milestone: Exploratory data analysis due March 20
The data set comes from Zarulli et al. (2021), who analyze the effects of a country’s healthcare expenditures and other factors on the country’s life expectancy. The data are originally from the Human Development Database and the World Health Organization.
There are 140 countries (observations) in the data set.
life_exp
: The average number of years that a newborn could expect to live, if he or she were to pass through life exposed to the sex- and age-specific death rates prevailing at the time of his or her birth, for a specific year, in a given country, territory, or geographic area. (from the World Health Organization)
income_inequality
: Measure of the deviation of the distribution of income among individuals or households within a country from a perfectly equal distribution. A value of 0 represents absolute equality, a value of 100 absolute inequality (based on Gini coefficient). (from Zarulli et al. (2021))
education
: Indicator of whether a country’s education index is above (High) or below (Low) the median index for the 140 countries in the data set.
health_expenditure
: Per capita current spending on healthcare goods and services, expressed in respective currency - international Purchasing Power Parity (PPP) dollar. (from the World Health Organization)
Let’s consider a model using a country’s healthcare expenditure, income inequality, and education to predict its life expectancy:
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 78.575 | 1.775 | 44.274 | 0.000 |
health_expenditure | 0.001 | 0.000 | 4.522 | 0.000 |
income_inequality | -0.484 | 0.061 | -7.900 | 0.000 |
educationHigh | 2.020 | 1.168 | 1.730 | 0.086 |
Look at residuals vs. each predictor to determine which variable has a non-linear relationship with life expectancy.
There is a non-linear relationship between health expenditure and life expectancy.
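Try a log transformation on the predictor $X$ (here, health expenditure).
When we fit a model with predictor $\log(X)$, we fit a model of the form $Y = \beta_0 + \beta_1 \log(X) + \epsilon$
such that $\epsilon \sim N(0, \sigma^2_\epsilon)$.
The estimated regression model is $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 \log(X)$
Intercept: When $X = 1$ (so that $\log(X) = 0$), $Y$ is expected to be $\hat{\beta}_0$.
Coefficient of $\log(X)$: When $X$ is multiplied by a factor of $C$, $Y$ is expected to change by $\hat{\beta}_1 \log(C)$ units.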
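A short derivation may help show why a multiplicative change in a log-transformed predictor gives an additive change in the prediction. It assumes the fitted model is written as $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 \log(X)$, and $C > 0$ is an arbitrary factor introduced here for illustration:

$$
\begin{aligned}
\hat{Y}(CX) - \hat{Y}(X) &= \left[\hat{\beta}_0 + \hat{\beta}_1 \log(CX)\right] - \left[\hat{\beta}_0 + \hat{\beta}_1 \log(X)\right] \\
&= \hat{\beta}_1 \left[\log(C) + \log(X) - \log(X)\right] \\
&= \hat{\beta}_1 \log(C)
\end{aligned}
$$

Under this sketch, multiplying $X$ by $C$ shifts the predicted response by $\hat{\beta}_1 \log(C)$ units, whatever the starting value of $X$.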
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 59.151 | 3.184 | 18.576 | 0.000 |
log(health_expenditure) | 3.092 | 0.396 | 7.814 | 0.000 |
income_inequality | -0.362 | 0.058 | -6.225 | 0.000 |
educationHigh | -0.168 | 1.103 | -0.152 | 0.879 |
Interpret the intercept in the context of the data.
Interpret the effect of health expenditure in the context of the data.
Interpret the effect of education in the context of the data.
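As a rough check on these interpretations, the arithmetic can be sketched in Python using the estimates from the table above. This assumes that `log` in the model output is the natural logarithm; the function names `predict_life_exp` and `change_in_life_exp` are hypothetical helpers introduced only for this illustration.

```python
import math

# Coefficient estimates copied from the fitted-model table above.
b_intercept = 59.151
b_log_health = 3.092
b_inequality = -0.362
b_education_high = -0.168


def predict_life_exp(health_expenditure, income_inequality, education_high):
    """Predicted life expectancy (years), assuming log is the natural log."""
    return (
        b_intercept
        + b_log_health * math.log(health_expenditure)
        + b_inequality * income_inequality
        + (b_education_high if education_high else 0.0)
    )


def change_in_life_exp(factor):
    """Expected change in life expectancy (years) when health expenditure is
    multiplied by `factor`, holding income inequality and education constant."""
    return b_log_health * math.log(factor)


# Intercept: health expenditure of 1 PPP dollar (log = 0), income inequality
# of 0, and education in the Low (baseline) category.
baseline = predict_life_exp(1, 0, education_high=False)

doubling = change_in_life_exp(2)        # health expenditure doubles
ten_percent = change_in_life_exp(1.10)  # health expenditure rises by 10%

print(round(baseline, 3))     # 59.151
print(round(doubling, 3))     # 2.143
print(round(ten_percent, 3))  # 0.295
```

Under these assumptions, doubling health expenditure is associated with an expected increase of about 2.14 years in life expectancy, holding income inequality and education constant. The education coefficient is read as the expected difference between High and Low countries (about -0.168 years) with the other predictors held fixed.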
Is a model with log-transformed response and/or predictor still a “linear” model?
What does it mean for a model to be a “linear” model?
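Linear models are linear in the parameters, i.e. given an observation $y_i$, $y_i = \beta_0 + \beta_1 f_1(x_i) + \dots + \beta_p f_p(x_i) + \epsilon_i$
The functions $f_1, \dots, f_p$ can be non-linear (for example, $\log$), as long as the parameters $\beta_0, \beta_1, \dots, \beta_p$ enter the model linearly.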
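To illustrate "linear in the parameters", here is a minimal sketch that fits an ordinary least-squares line after log-transforming the predictor. The data are synthetic and noise-free (not from the life expectancy data set), and `fit_simple_ols` is a hypothetical helper written for this example.

```python
import math

# Synthetic, noise-free data for illustration only:
# y = 2 + 3 * log(x) is non-linear in x but linear in the two coefficients.
xs = [1.0, 2.0, 5.0, 10.0, 50.0, 100.0]
ys = [2.0 + 3.0 * math.log(x) for x in xs]


def fit_simple_ols(features, response):
    """Closed-form least-squares fit of response = b0 + b1 * feature."""
    n = len(features)
    mean_f = sum(features) / n
    mean_y = sum(response) / n
    sxy = sum((f - mean_f) * (y - mean_y) for f, y in zip(features, response))
    sxx = sum((f - mean_f) ** 2 for f in features)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_f
    return b0, b1


# Transform the predictor first, then fit the usual linear model on log(x).
log_xs = [math.log(x) for x in xs]
b0, b1 = fit_simple_ols(log_xs, ys)

print(round(b0, 6), round(b1, 6))
```

The same least-squares formulas apply because the transformation changes only the feature column; the coefficients still enter the model additively, so the fit should recover values very close to 2 and 3.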
See Log Transformations in Linear Regression for more details about interpreting regression models with log-transformed variables.
Please submit any questions you have about multicollinearity and variable transformations.