STA 221 - Spring 2025 – Probabilities, odds, odds ratios

Method	Outcome	Model
Linear regression	Quantitative	$y = X\beta + \epsilon$
Linear regression (transform Y)	Quantitative	$\log(y) = X\beta + \epsilon$
Logistic regression	Binary	$\log\left(\frac{\pi}{1 - \pi}\right) = X\beta$

Concern about AI vs. age

Age	Not Concerned	Concerned
18-29	550	416
30-49	1898	1681
50-64	1398	1818
65+	1376	2013
Refused	23	28

Compare the odds for two groups

We want to compare concern about increased use of AI in daily life between individuals who are 18-29 years old and those who are 65+ years old.

We’ll use the odds to compare the two groups.

$\text{odds} = \frac{P(\text{success})}{P(\text{failure})} = \frac{\text{\# of successes}}{\text{\# of failures}}$
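
The two forms of the odds agree because the sample size cancels: with $n$ observations, $P(\text{success}) = \text{\# of successes}/n$ and $P(\text{failure}) = \text{\# of failures}/n$. A minimal R sketch of this, using the 18-29 row of the table above; the variable names are illustrative and not part of the course code.

```r
# Counts for the 18-29 age group, taken from the table above
n_concerned     <- 416   # "successes"
n_not_concerned <- 550   # "failures"
n_total         <- n_concerned + n_not_concerned

# Odds computed from probabilities
p_concerned    <- n_concerned / n_total
odds_from_prob <- p_concerned / (1 - p_concerned)

# Odds computed directly from counts
odds_from_counts <- n_concerned / n_not_concerned

# The two calculations agree up to floating-point error
all.equal(odds_from_prob, odds_from_counts)
round(odds_from_counts, 3)
```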

Odds of being concerned with increased use of AI in daily life for 18-29 year olds: $\frac{416}{550} = 0.756$
Odds of being concerned with increased use of AI in daily life for those who are 65+ years old: $\frac{2013}{1376} = 1.463$
The odds are higher for the 65+ group (1.463 vs. 0.756), so individuals 65+ years old are more likely to be concerned about the increased use of AI in daily life than 18-29 year olds.
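
The same calculation can be repeated for every row of the table. The sketch below is one way to do it in base R, with the counts typed in by hand rather than read from `pew_data`; the vector names are assumptions made for illustration.

```r
# Counts from the table above, one element per age group
age           <- c("18-29", "30-49", "50-64", "65+", "Refused")
not_concerned <- c(550, 1898, 1398, 1376, 23)
concerned     <- c(416, 1681, 1818, 2013, 28)

# Odds and probability of being concerned within each age group
odds <- concerned / not_concerned
prob <- concerned / (concerned + not_concerned)

# Side-by-side summary, rounded for display
data.frame(age, odds = round(odds, 3), prob = round(prob, 3))
```

Odds above 1 correspond to a probability above 0.5, so comparing odds across groups orders them the same way as comparing probabilities.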

Odds ratio (OR)

Let’s summarize the relationship between the two groups. To do so, we’ll use the odds ratio (OR).

$OR = \frac{\text{odds}_1}{\text{odds}_2}$

OR: AI concern by age

$OR = \frac{\text{odds}_{18-29}}{\text{odds}_{65+}} = \frac{0.756}{1.463} = 0.517$

The odds an 18-29 year old is concerned about increased use of AI in daily life are 0.517 times the odds a 65+ year old is concerned.

More natural interpretation

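It’s often more natural to state the interpretation so that the odds ratio is greater than 1. To do so, put the group with the larger odds in the numerator, which is the same as taking the reciprocal of the odds ratio.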
The odds a 65+ year old is concerned about increased use of AI in daily life are 1.934 (1/0.517) times the odds an 18-29 year old is concerned.
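
Both versions of the odds ratio can be checked with a few lines of R. This is a sketch using the counts from the table; the object names are illustrative.

```r
# Odds of being concerned in each group, from the table counts
odds_18_29 <- 416 / 550
odds_65    <- 2013 / 1376

# OR with 18-29 in the numerator, and its reciprocal (65+ in the numerator)
or_young_vs_old <- odds_18_29 / odds_65
or_old_vs_young <- 1 / or_young_vs_old   # same as odds_65 / odds_18_29

# Equivalent cross-product form computed straight from the 2 x 2 counts
or_cross <- (416 * 1376) / (550 * 2013)

round(or_young_vs_old, 3)  # 0.517
round(or_old_vs_young, 3)  # 1.934
```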

Code to make table

library(tidyverse)  # provides count() and pivot_wider()
pew_data |>
  count(age_cat, ai_concern)

# A tibble: 10 × 3
   age_cat ai_concern     n
   <fct>   <fct>      <int>
 1 18-29   0            550
 2 18-29   1            416
 3 30-49   0           1898
 4 30-49   1           1681
 5 50-64   0           1398
 6 50-64   1           1818
 7 65+     0           1376
 8 65+     1           2013
 9 Refused 0             23
10 Refused 1             28

Code to make table

pew_data |>
  count(age_cat, ai_concern) |>
  pivot_wider(names_from = ai_concern, values_from = n)

# A tibble: 5 × 3
  age_cat   `0`   `1`
  <fct>   <int> <int>
1 18-29     550   416
2 30-49    1898  1681
3 50-64    1398  1818
4 65+      1376  2013
5 Refused    23    28

Code to make table

library(knitr)  # provides kable()
pew_data |>
  count(age_cat, ai_concern) |>
  pivot_wider(names_from = ai_concern, values_from = n) |>
  kable()

age_cat	0	1
18-29	550	416
30-49	1898	1681
50-64	1398	1818
65+	1376	2013
Refused	23	28

Code to make table

pew_data |>
  count(age_cat, ai_concern) |>
  pivot_wider(names_from = ai_concern, values_from = n) |>
  kable(col.names = c("Age", "Not concerned", "Concerned"))

Age	Not concerned	Concerned
18-29	550	416
30-49	1898	1681
50-64	1398	1818
65+	1376	2013
Refused	23	28
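
The same pipeline can be extended to add the odds as a column. The sketch below assumes `pew_data` is not available and instead types in a stand-in for the output of `count(age_cat, ai_concern)`; the names `ai_counts`, `ai_odds`, `not_concerned`, and `concerned` are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hand-entered stand-in for the output of count(age_cat, ai_concern);
# the course's pew_data object is not used here
ai_counts <- tibble(
  age_cat    = rep(c("18-29", "30-49", "50-64", "65+", "Refused"), each = 2),
  ai_concern = rep(c("0", "1"), times = 5),
  n          = c(550, 416, 1898, 1681, 1398, 1818, 1376, 2013, 23, 28)
)

# Reshape to one row per age group, then compute the odds of being concerned
ai_odds <- ai_counts |>
  pivot_wider(names_from = ai_concern, values_from = n) |>
  rename(not_concerned = `0`, concerned = `1`) |>
  mutate(odds = concerned / not_concerned)

ai_odds
```

For a formatted table, `ai_odds` could be passed to `knitr::kable()` with `digits = 3` and a `col.names` vector, as in the code above.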