Logistic Regression: Inference

Author

Prof. Maria Tackett

Published

Apr 08, 2025

Announcements

Lab 07 due TODAY at 11:59pm
Team Feedback (email from TEAMMATES) due TODAY at 11:59pm (check email)
HW 04 due April 10 at 11:59pm
Next project milestone: Draft and peer review in Friday’s lab
Exam 02 - April 17 (same format as Exam 01)
- Exam 02 practice + lecture recordings available
Statistics experience due April 22

Questions from this week’s content?

Topics

Test of significance for a subset of predictors
Inference for a single predictor

Computational setup

library(tidyverse)
library(tidymodels)
library(pROC)      
library(knitr)
library(kableExtra)

# set default theme in ggplot2
ggplot2::theme_set(ggplot2::theme_bw())

Risk of coronary heart disease

This data set is from an ongoing cardiovascular study on residents of the town of Framingham, Massachusetts. We want to examine the relationship between various health characteristics and the risk of having heart disease.

high_risk:
- 1: High risk of having heart disease in next 10 years
- 0: Not high risk of having heart disease in next 10 years
age: Age at exam time (in years)
totChol: Total cholesterol (in mg/dL)
currentSmoker: 0 = nonsmoker, 1 = smoker
education: 1 = Some High School, 2 = High School or GED, 3 = Some College or Vocational School, 4 = College

Modeling risk of coronary heart disease

Using age, totChol, and currentSmoker

term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	-6.673	0.378	-17.647	0.000	-7.423	-5.940
age	0.082	0.006	14.344	0.000	0.071	0.094
totChol	0.002	0.001	1.940	0.052	0.000	0.004
currentSmoker1	0.443	0.094	4.733	0.000	0.260	0.627

Drop-in-deviance test

We will use a drop-in-deviance test (Likelihood Ratio Test) to test

the overall statistical significance of a logistic regression model
the statistical significance of a subset of coefficients in the model

Deviance

The deviance is a measure of the degree to which the predicted values are different from the observed values (compares the current model to a “saturated” model)

In logistic regression,

$D = - 2 \log L$

$D \sim χ_{n - p - 1}^{2}$ ($D$ follows a Chi-square distribution with $n - p - 1$ degrees of freedom)¹

Note: $n - p - 1$ is the degrees of freedom associated with the error in the model (like residuals)
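As a quick check of these definitions, the sketch below fits a logistic regression to simulated data (the data, seed, and variable names are made up for illustration and are not the Framingham data). For ungrouped 0/1 responses the saturated model has log-likelihood 0, so the residual deviance reported by `glm()` should equal $- 2 \log L$, and its degrees of freedom should equal $n - p - 1$.

```r
# Simulated data for illustration only (not the Framingham data)
set.seed(210)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * x1))

fit <- glm(y ~ x1 + x2, family = "binomial")
p   <- 2  # number of predictors in the model

# Deviance computed directly from the log-likelihood
D <- -2 * as.numeric(logLik(fit))

# Compare with the values stored in the glm object
c(from_logLik = D, from_glm = deviance(fit))
c(n_minus_p_minus_1 = n - p - 1, df_residual = df.residual(fit))
```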

$χ^{2}$ distribution
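The plot for this slide is not reproduced here. As a rough stand-in, the base R sketch below computes upper-tail critical values and tail probabilities for a few arbitrarily chosen degrees of freedom, which is how the $χ^{2}$ distribution is used in the tests that follow.

```r
# Degrees of freedom chosen only for illustration
dfs <- c(1, 3, 5, 10)

# 95th percentiles: a test statistic above this value has a p-value below 0.05
crit <- qchisq(0.95, df = dfs)
names(crit) <- paste0("df = ", dfs)
round(crit, 3)

# Upper-tail probability P(chi-square > 5) for each choice of df
tail_prob <- pchisq(5, df = dfs, lower.tail = FALSE)
names(tail_prob) <- names(crit)
round(tail_prob, 3)
```

The same test statistic gives a larger p-value as the degrees of freedom increase, which is why the number of parameters being tested matters.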

Test for overall significance

We can test the overall significance for a logistic regression model, i.e., whether there is at least one predictor with a non-zero coefficient

$\begin{aligned} & H_{0} : β_{1} = \dots = β_{p} = 0 \\ & H_{a} : β_{j} \neq 0 \text{ for at least one } j \end{aligned}$

The drop-in-deviance test for overall significance compares the fit of a model with no predictors to the current model.

Drop-in-deviance test statistic

Let $L_{0}$ and $L_{a}$ be the likelihood functions of the model under $H_{0}$ and $H_{a}$, respectively. The test statistic is

$\begin{aligned} G = D_{0} - D_{a} &= (- 2 \log L_{0}) - (- 2 \log L_{a}) \\ &= - 2 (\log L_{0} - \log L_{a}) \\ &= - 2 \sum_{i = 1}^{n} \left[ y_{i} \log \left( \frac{\hat{π}^{0}}{\hat{π}_{i}^{a}} \right) + (1 - y_{i}) \log \left( \frac{1 - \hat{π}^{0}}{1 - \hat{π}_{i}^{a}} \right) \right] \end{aligned}$

where $\hat{π}^{0}$ is the predicted probability under $H_{0}$ and $\hat{π}_{i}^{a} = \frac{\exp \{x_{i}^{T} \hat{β}\}}{1 + \exp \{x_{i}^{T} \hat{β}\}}$ is the predicted probability under $H_{a}$

Drop-in-deviance test statistic

$G = - 2 \sum_{i = 1}^{n} \left[ y_{i} \log \left( \frac{\hat{π}^{0}}{\hat{π}_{i}^{a}} \right) + (1 - y_{i}) \log \left( \frac{1 - \hat{π}^{0}}{1 - \hat{π}_{i}^{a}} \right) \right]$

When $n$ is large, $G \sim χ_{p}^{2}$ ($G$ follows a Chi-square distribution with $p$ degrees of freedom)²
The p-value is calculated as $P (χ^{2} > G)$
Large values of $G$ (small p-values) indicate at least one $β_{j}$ is non-zero
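To see that the summation formula and the difference in deviances give the same statistic, here is a minimal sketch on simulated data (the data and object names are made up for illustration, with a single predictor so $p = 1$).

```r
# Simulated data for illustration only
set.seed(1)
n <- 400
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + x))

fit_0 <- glm(y ~ 1, family = "binomial")  # model under H0
fit_a <- glm(y ~ x, family = "binomial")  # model under Ha

pi_0 <- fitted(fit_0)  # same predicted probability for every observation
pi_a <- fitted(fit_a)  # observation-specific predicted probabilities

# G from the summation formula
G_sum <- -2 * sum(y * log(pi_0 / pi_a) +
                    (1 - y) * log((1 - pi_0) / (1 - pi_a)))

# G as the drop in deviance
G_dev <- deviance(fit_0) - deviance(fit_a)

c(G_sum = G_sum, G_dev = G_dev)

# p-value from a chi-square distribution with p = 1 degree of freedom
pchisq(G_dev, df = 1, lower.tail = FALSE)
```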

Heart disease model: drop-in-deviance test

$\begin{aligned} & H_{0} : β_{\text{age}} = β_{\text{totChol}} = β_{\text{currentSmoker}} = 0 \\ & H_{a} : β_{j} \neq 0 \text{ for at least one } j \end{aligned}$

Fit the null model (we’ve already fit the alternative model)

null_model <- glm(high_risk ~ 1, data = heart_disease, family = "binomial")

term	estimate	std.error	statistic	p.value
(Intercept)	-1.72294	0.0436342	-39.486	0

Heart disease model: drop-in-deviance test

Calculate the log-likelihood for the null and alternative models

(L_0 <- glance(null_model)$logLik)

[1] -1737.735

(L_a <- glance(high_risk_fit)$logLik)

[1] -1612.406

Calculate the likelihood ratio test statistic

(G <- -2 * (L_0 - L_a))

[1] 250.6572

Heart disease model: likelihood ratio test

Calculate the p-value

(p_value <- pchisq(G, df = 3, lower.tail = FALSE))

[1] 4.717158e-54

Conclusion

The p-value is small, so we reject $H_{0}$. The data provide evidence that at least one predictor in the model has a non-zero coefficient.

Why use overall test?

Why do we use a test for overall significance instead of just looking at the test for individual coefficients?³

Suppose we have a model such that $p = 100$ and $H_{0} : β_{1} = \dots = β_{100} = 0$ is true

About 5% of the p-values for individual coefficients will be below 0.05 by chance.
So we expect to see about 5 small p-values even if no association actually exists.
Therefore, it is very likely we will see at least one small p-value by chance.
The overall test of significance does not have this problem. If no relationship truly exists, there is only a 5% chance we will get a p-value below 0.05.
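A small simulation can make this concrete. The sketch below is an illustration under made-up settings (sample size, seed, and a response generated independently of all 100 predictors); the exact counts will vary with the seed.

```r
# Simulation for illustration: 100 predictors unrelated to the response
set.seed(2025)
n <- 2000
p <- 100
X <- matrix(rnorm(n * p), nrow = n)
y <- rbinom(n, size = 1, prob = 0.3)  # response ignores every predictor
sim_data <- data.frame(y = y, X)

fit_full <- glm(y ~ ., data = sim_data, family = "binomial")
fit_null <- glm(y ~ 1, data = sim_data, family = "binomial")

# Individual Wald p-values, dropping the intercept
p_values <- summary(fit_full)$coefficients[-1, "Pr(>|z|)"]
sum(p_values < 0.05)  # number of predictors flagged purely by chance

# Overall drop-in-deviance test: one p-value for all 100 coefficients
G <- deviance(fit_null) - deviance(fit_full)
pchisq(G, df = p, lower.tail = FALSE)
```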

Test a subset of coefficients

Testing a subset of coefficients

Suppose there are two models:
- Reduced Model: includes predictors $x_{1}, \dots, x_{q}$
- Full Model: includes predictors $x_{1}, \dots, x_{q}, x_{q + 1}, \dots, x_{p}$
We can use a drop-in-deviance test to determine if any of the new predictors are useful

$\begin{aligned} & H_{0} : β_{q + 1} = \dots = β_{p} = 0 \\ & H_{a} : β_{j} \neq 0 \text{ for at least one } j \end{aligned}$

Drop-in-deviance test

$\begin{aligned} & H_{0} : β_{q + 1} = \dots = β_{p} = 0 \\ & H_{a} : β_{j} \neq 0 \text{ for at least one } j \end{aligned}$

The test statistic is

$\begin{aligned} G = D_{\text{reduced}} - D_{\text{full}} &= (- 2 \log L_{\text{reduced}}) - (- 2 \log L_{\text{full}}) \\ &= - 2 (\log L_{\text{reduced}} - \log L_{\text{full}}) \end{aligned}$

The p-value is calculated using a $χ_{Δ \text{df}}^{2}$ distribution, where $Δ \text{df}$ is the number of parameters being tested (the difference in number of parameters between the full and reduced model).⁴
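The calculation can be wrapped in a small helper. The function name `drop_in_deviance()` and the simulated data below are hypothetical, introduced only for illustration; the helper assumes the two models are nested and fit to the same observations.

```r
# Hypothetical helper: drop-in-deviance test for two nested glm fits
drop_in_deviance <- function(reduced, full) {
  G  <- deviance(reduced) - deviance(full)
  df <- df.residual(reduced) - df.residual(full)  # parameters being tested
  c(G = G, df = df, p_value = pchisq(G, df = df, lower.tail = FALSE))
}

# Simulated data for illustration only
set.seed(42)
n  <- 600
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(-1 + 0.7 * x1))

reduced <- glm(y ~ x1, family = "binomial")
full    <- glm(y ~ x1 + x2 + x3, family = "binomial")

res <- drop_in_deviance(reduced, full)
res
```

Here two coefficients are tested (those for `x2` and `x3`), so the reference distribution has 2 degrees of freedom.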

Example: Include `education`?

Should we include education in the model?

Reduced model: age, totChol, currentSmoker
Full model: age, totChol, currentSmoker, education

$\begin{aligned} & H_{0} : β_{\text{ed2}} = β_{\text{ed3}} = β_{\text{ed4}} = 0 \\ & H_{a} : β_{j} \neq 0 \text{ for at least one } j \end{aligned}$

Example: Include `education`?

reduced_model <- glm(high_risk ~ age + totChol + currentSmoker, 
              data = heart_disease, family = "binomial")

full_model <- glm(high_risk ~ age + totChol + currentSmoker + education, 
              data = heart_disease, family = "binomial")

Calculate deviances

(deviance_reduced <- -2 * glance(reduced_model)$logLik)

[1] 3224.812

(deviance_full <- -2 * glance(full_model)$logLik)

[1] 3217.6

Calculate test statistic

(G <- deviance_reduced - deviance_full)

[1] 7.212113

Example: Include `education`?

Calculate p-value

pchisq(G, df = 3, lower.tail = FALSE)

[1] 0.06543567

What is your conclusion? Would you include education in the model that already has age, totChol, and currentSmoker?

Drop-in-deviance test in R

Conduct the drop-in-deviance test using the anova() function in R with option test = "Chisq"

anova(reduced_model, full_model, test = "Chisq") |> 
  tidy() |> 
  kable(digits = 3)

term	df.residual	residual.deviance	df	deviance	p.value
high_risk ~ age + totChol + currentSmoker	4082	3224.812	NA	NA	NA
high_risk ~ age + totChol + currentSmoker + education	4079	3217.600	3	7.212	0.065

Add interactions with `currentSmoker`?

term	df.residual	residual.deviance	df	deviance	p.value
high_risk ~ age + totChol + currentSmoker	4082	3224.812	NA	NA	NA
high_risk ~ age + totChol + currentSmoker + currentSmoker * age + currentSmoker * totChol	4080	3222.377	2	2.435	0.296
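The code behind this table is not shown. The sketch below illustrates what such a comparison might look like, using simulated stand-in data that borrows the variable names; the values, the coefficients used to generate them, and the object names `sim_heart`, `reduced_fit`, `int_fit`, and `int_test` are all made up, so the output will not match the table.

```r
# Simulated stand-in data; values are made up and will not reproduce the table
set.seed(7)
n <- 1000
sim_heart <- data.frame(
  age           = round(runif(n, 35, 70)),
  totChol       = round(rnorm(n, mean = 235, sd = 45)),
  currentSmoker = factor(rbinom(n, size = 1, prob = 0.5))
)
log_odds <- -6.5 + 0.08 * sim_heart$age + 0.002 * sim_heart$totChol +
  0.4 * (sim_heart$currentSmoker == "1")
sim_heart$high_risk <- rbinom(n, size = 1, prob = plogis(log_odds))

reduced_fit <- glm(high_risk ~ age + totChol + currentSmoker,
                   data = sim_heart, family = "binomial")

# currentSmoker * age expands to both main effects plus their interaction
int_fit <- glm(high_risk ~ age + totChol + currentSmoker +
                 currentSmoker * age + currentSmoker * totChol,
               data = sim_heart, family = "binomial")

# Two interaction coefficients are tested, so the test has 2 degrees of freedom
int_test <- anova(reduced_fit, int_fit, test = "Chisq")
int_test
```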

Test for a single coefficient

Distribution of $\hat{β}$

When $n$ is large, $\hat{β}$, the vector of estimated coefficients of the logistic regression model, is approximately normally distributed.

How do we know the distribution of $\hat{β}$ is normal for large $n$?

Distribution of $\hat{β}$

When $n$ is large…

The expected value of $\hat{β}$ is the true parameter, $β$, i.e., $E (\hat{β}) = β$

$Var (\hat{β})$, the matrix of variances and covariances between estimators, is

$Var (\hat{β}) = (X^{T} V X)^{- 1}$

where $V$ is an $n \times n$ diagonal matrix, such that $V_{ii}$ is the estimated variance for the $i^{th}$ observation
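The matrix formula can be checked numerically. The sketch below uses simulated data (made up for illustration) and assumes that, for a 0/1 response, the estimated variance for observation $i$ is $\hat{π}_{i} (1 - \hat{π}_{i})$; it then compares the hand-computed matrix with what `vcov()` reports.

```r
# Simulated data for illustration only
set.seed(3)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.3 + 0.9 * x))
fit <- glm(y ~ x, family = "binomial")

X      <- model.matrix(fit)             # n x (p + 1) design matrix
pi_hat <- fitted(fit)                   # predicted probabilities
V      <- diag(pi_hat * (1 - pi_hat))   # n x n diagonal matrix of variances

var_beta <- solve(t(X) %*% V %*% X)     # (X^T V X)^{-1}

var_beta
vcov(fit)               # should agree up to numerical tolerance
sqrt(diag(var_beta))    # standard errors of the coefficients
```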

Test for a single coefficient

Hypotheses: $H_{0} : β_{j} = 0 \text{ vs } H_{a} : β_{j} \neq 0$, given the other variables in the model

(Wald) Test Statistic: $z = \frac{\hat{β}_{j} - 0}{SE (\hat{β}_{j})}$

where $SE (\hat{β}_{j})$ is the square root of the $j^{th}$ diagonal element of $Var (\hat{β})$

P-value: $P (| Z | > | z |)$, where $Z \sim N (0, 1)$, the Standard Normal distribution
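These steps can be carried out by hand and compared with the coefficient table from `summary()`. The sketch below uses simulated data, made up for illustration.

```r
# Simulated data for illustration only
set.seed(11)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.6 * x1))
fit <- glm(y ~ x1 + x2, family = "binomial")

beta_hat <- coef(fit)
se       <- sqrt(diag(vcov(fit)))  # square roots of the diagonal of Var(beta-hat)

# Wald statistic and two-sided p-value for each coefficient
z       <- (beta_hat - 0) / se
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

cbind(estimate = beta_hat, std.error = se, statistic = z, p.value = p_value)
summary(fit)$coefficients  # should show the same four columns
```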

Confidence interval for $β_{j}$

We can calculate the $C\%$ confidence interval for $β_{j}$ as the following:

$\hat{β}_{j} \pm z^{*} \times SE (\hat{β}_{j})$

where $z^{*}$ is calculated from the $N (0, 1)$ distribution

Note

This is an interval for the change in the log-odds for every one unit increase in $x_{j}$

Interpretation in terms of the odds

Exponentiate the endpoints to get an interval for the multiplicative change in the odds for every one unit increase in $x_{j}$:

$\exp \{\hat{β}_{j} \pm z^{*} \times SE (\hat{β}_{j})\}$

Interpretation: We are $C\%$ confident that for every one unit increase in $x_{j}$, the odds multiply by a factor of $\exp \{\hat{β}_{j} - z^{*} \times SE (\hat{β}_{j})\}$ to $\exp \{\hat{β}_{j} + z^{*} \times SE (\hat{β}_{j})\}$, holding all else constant.
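A minimal sketch of both intervals on simulated data (made up for illustration) is below. It compares the hand calculation with `confint.default()`, which returns Wald intervals; note that `confint()` on a `glm` fit uses profile likelihood instead, so its limits can differ slightly from these.

```r
# Simulated data for illustration only
set.seed(5)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.2 + 0.5 * x))
fit <- glm(y ~ x, family = "binomial")

conf_level <- 0.95
z_star     <- qnorm(1 - (1 - conf_level) / 2)  # critical value from N(0, 1)

beta_hat <- coef(fit)
se       <- sqrt(diag(vcov(fit)))

# Interval for the change in log-odds
wald_ci <- cbind(lower = beta_hat - z_star * se,
                 upper = beta_hat + z_star * se)
wald_ci
confint.default(fit, level = conf_level)  # Wald intervals computed by R

# Interval for the multiplicative change in odds
exp(wald_ci)
```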

Coefficient for `age`

term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	-6.673	0.378	-17.647	0.000	-7.423	-5.940
age	0.082	0.006	14.344	0.000	0.071	0.094
totChol	0.002	0.001	1.940	0.052	0.000	0.004
currentSmoker1	0.443	0.094	4.733	0.000	0.260	0.627

Hypotheses:

$H_{0} : β_{\text{age}} = 0 \text{ vs } H_{a} : β_{\text{age}} \neq 0$, given total cholesterol and smoking status are in the model.

Test statistic:

$z = \frac{0.0825 - 0}{0.00575} = 14.34$

P-value:

$P (| Z | > | 14.34 |) \approx 0$

2 * pnorm(14.34, lower.tail = FALSE)

[1] 1.230554e-46

Conclusion:

The p-value is very small, so we reject $H_{0}$. The data provide sufficient evidence that age is a statistically significant predictor of whether someone is at high risk of having heart disease, after accounting for total cholesterol and smoking status.

CI for `age`

term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	-6.673	0.378	-17.647	0.000	-7.423	-5.940
age	0.082	0.006	14.344	0.000	0.071	0.094
totChol	0.002	0.001	1.940	0.052	0.000	0.004
currentSmoker1	0.443	0.094	4.733	0.000	0.260	0.627

Interpret the 95% confidence interval for age in terms of the odds of being high risk for heart disease.
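One way to start is to exponentiate the interval endpoints. The sketch below copies the rounded limits for `age` from the table above, so the result is only as precise as those three decimal places.

```r
# 95% confidence limits for the age coefficient, copied from the table above
ci_log_odds <- c(lower = 0.071, upper = 0.094)

# Exponentiate to move from the log-odds scale to the odds scale
ci_odds <- exp(ci_log_odds)
round(ci_odds, 3)
```

The endpoints come out to roughly 1.07 and 1.10, which are the factors by which the odds of being high risk are expected to multiply for each additional year of age, holding total cholesterol and smoking status constant.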

Overview of testing coefficients

Test a single coefficient
- Drop-in-deviance test
- Wald hypothesis test and confidence interval

Test a subset of coefficients
- Drop-in-deviance test

Can use AIC and BIC to compare models in both scenarios
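As a sketch of the AIC and BIC comparison, the code below uses simulated data (made up for illustration) and base R's `AIC()` and `BIC()`. It also recomputes both criteria from the log-likelihood to show what they penalize; smaller values indicate the preferred model.

```r
# Simulated data for illustration only
set.seed(8)
n  <- 600
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(-1 + 0.7 * x1))

reduced <- glm(y ~ x1, family = "binomial")
full    <- glm(y ~ x1 + x2, family = "binomial")

AIC(reduced, full)
BIC(reduced, full)

# By hand for the full model: -2 log L plus a penalty on the parameter count
k        <- length(coef(full))
log_lik  <- as.numeric(logLik(full))
aic_full <- -2 * log_lik + 2 * k
bic_full <- -2 * log_lik + log(n) * k
c(AIC = aic_full, BIC = bic_full)
```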

Questions from this week’s content?

Recap

Introduced test of significance for a subset of predictors
Inference for a single predictor

References

Wilks, SS. 1935. “The Likelihood Test of Independence in Contingency Tables.” The Annals of Mathematical Statistics 6 (4): 190–96.

Footnotes

¹ See Wilks (1935) for theoretical underpinnings
² Based on Wilks' Theorem (Wilks 1935)
³ Example from Introduction to Statistical Learning
⁴ Based on Wilks' Theorem (Wilks 1935)
