Maximum likelihood estimation

Prof. Maria Tackett

Mar 18, 2025

Announcements

  • HW 03 due March 20 at 11:59pm

  • Project exploratory data analysis due March 20 at 11:59pm

    • Next milestone: Project presentations in lab March 28

  • Statistics experience due April 22

Topics

  • Likelihood

  • Maximum likelihood estimation (MLE)

  • MLE for linear regression

Motivation

  • We’ve discussed how to find the estimators of $\beta$ and $\sigma_\epsilon^2$ for the model

    $$y = X\beta + \epsilon, \quad \epsilon \sim N(0, \sigma_\epsilon^2 I)$$

    using least-squares estimation

  • Today we will introduce another way to find these estimators: maximum likelihood estimation.

  • We will see that the least-squares estimator is equal to the maximum likelihood estimator when certain assumptions hold

Maximum likelihood estimation

Example: Basketball shots

Suppose a basketball player shoots the ball such that the probability of making the basket (successfully making the shot) is p

  • What is the probability distribution for this random phenomenon?

  • Suppose the probability is p=0.5. What is the probability the player makes a single basket, given this value of p?

  • Suppose the probability is p=0.8. What is the probability the player makes a single basket, given this value of p?

Shooting the ball three times

Suppose the player shoots the ball three times. They are all independent and the player has the same probability p of making each basket.

Let B represent a made basket, and M represent a missed basket. The player shoots the ball three times with the outcome BBM.

  • Suppose the probability is p=0.5. What is the probability of observing the data BBM, given this value of p?

  • Suppose the probability is p=0.3. What is the probability of observing the data BBM, given this value of p?
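The two calculations above can be checked with a short script (a minimal sketch; the helper name `prob_bbm` is ours, not from the slides):

```python
# Probability of the sequence BBM (basket, basket, miss) given the
# shot-success probability p, assuming the three shots are independent.
def prob_bbm(p):
    return p * p * (1 - p)

print(prob_bbm(0.5))  # 0.125
print(prob_bbm(0.3))  # approximately 0.063
```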

Shooting the ball three times

Suppose the player shoots the ball three times. They are all independent and the player has the same probability p of making each basket.

The player shoots the ball three times with the outcome BBM.

  • New question: What parameter value of p do you think maximizes the probability of observing this data?

  • We will use a likelihood function to answer this question.

Likelihood

  • A likelihood function is a measure of how likely we are to observe our data under each possible value of the parameter(s)

  • Note that this is not the same as the probability function.

  • Probability function: Fixed parameter value(s) + input possible outcomes

    • Given p=0.8, what is the probability of observing BBM in three basketball shots?

  • Likelihood function: Fixed data + input possible parameter values

    • Given we’ve observed BBM, what is the most plausible value of p?

Likelihood: Three basketball shots

The likelihood function for the probability of a basket $p$, given we observed BBM when shooting the ball three independent times, is

$$L(p \mid BBM) = p \times p \times (1 - p)$$


Thus, the likelihood for $p = 0.8$ is

$$L(p = 0.8 \mid BBM) = 0.8 \times 0.8 \times (1 - 0.8) = 0.128$$

Likelihood: Three basketball shots


  • What is the general formula for the likelihood function for p given the observed data BBM?

  • How does assuming independence simplify things?

  • How does having identically distributed data simplify things?

Likelihood: Three basketball shots

The likelihood function for $p$ given the data BBM is

$$L(p \mid BBM) = p \times p \times (1 - p) = p^2(1 - p)$$

  • We want the value of $p$ that maximizes this likelihood function, i.e., the value of $p$ that is most likely given the observed data.

  • The process of finding this value is maximum likelihood estimation.

  • There are three primary ways to find the maximum likelihood estimator

    • Approximate using a graph

    • Using calculus

    • Numerical approximation
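As a preview of the numerical approach, a minimal sketch that approximates the MLE of $p$ for the data BBM by evaluating the likelihood on a fine grid (the names `likelihood` and `p_mle` are ours):

```python
import numpy as np

# Likelihood of p given the data BBM: L(p) = p^2 * (1 - p)
def likelihood(p):
    return p**2 * (1 - p)

# Numerical approximation: evaluate the likelihood on a fine grid of
# candidate values for p and take the argmax.
grid = np.linspace(0, 1, 100001)
p_mle = grid[np.argmax(likelihood(grid))]
print(p_mle)  # approximately 0.6667
```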

Finding the MLE using graphs

What do you think is the approximate value of the MLE of p given the data?

Finding the MLE using calculus

  • Find the MLE using the first derivative of the likelihood function.
  • This can be tricky because of the product rule, so we can maximize $\log(\text{Likelihood})$ instead. The same value of the parameter maximizes both the likelihood and the log-likelihood.

Use calculus to find the MLE of p given the data BBM.
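After you have tried it yourself, the standard steps look like this (a sketch: take logs, differentiate, set equal to zero):

```latex
\begin{aligned}
\log L(p \mid BBM) &= \log\left(p^2(1-p)\right) = 2\log p + \log(1-p) \\
\frac{d}{dp}\log L(p \mid BBM) &= \frac{2}{p} - \frac{1}{1-p} = 0 \\
\Rightarrow \quad 2(1-p) &= p \quad \Rightarrow \quad \tilde{p} = \frac{2}{3}
\end{aligned}
```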

Shooting the ball n times

Suppose the player shoots the ball n times. They are all independent and the player has the same probability p of making each one.

Suppose the player makes k baskets out of the n shots. This is the observed data.

  • What is the formula for the probability distribution to describe this random phenomenon?
  • What is the formula for the likelihood function for p given the observed data?
  • For what value of p do we maximize the likelihood given the observed data? Use calculus to find the response.
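For checking your answers afterward: the number of made baskets follows a Binomial($n$, $p$) distribution, and the same log-likelihood steps as in the three-shot case give (sketch):

```latex
\begin{aligned}
P(k \text{ baskets} \mid p) &= \binom{n}{k} p^k (1-p)^{n-k} \\
\log L(p \mid k) &= \log\binom{n}{k} + k\log p + (n-k)\log(1-p) \\
\frac{d}{dp}\log L(p \mid k) &= \frac{k}{p} - \frac{n-k}{1-p} = 0
\quad \Rightarrow \quad \tilde{p} = \frac{k}{n}
\end{aligned}
```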

MLE in linear regression

Why maximum likelihood estimation?

  • “Maximum likelihood estimation is, by far, the most popular technique for deriving estimators.” (Casella and Berger 2024, 315)

  • MLEs have nice statistical properties (more on this next class)

    • Consistent

    • Efficient

    • Asymptotically normal

Note

If the normality assumption holds, the least squares estimator is the maximum likelihood estimator for β. Therefore, it has all the properties of the MLE.

Linear regression

Recall the linear model

$$y = X\beta + \epsilon, \quad \epsilon \sim N(0, \sigma_\epsilon^2 I)$$

  • We have discussed least-squares estimation to find $\hat\beta$ and $\hat\sigma_\epsilon^2$
  • We have used the fact that $\hat\beta \sim N(\beta, \sigma_\epsilon^2 (X^TX)^{-1})$ when doing hypothesis testing and constructing confidence intervals.
  • Now we will discuss how we know $\hat\beta$ is normally distributed, as we introduce MLE for linear regression

Simple linear regression model

Suppose we have the simple linear regression (SLR) model

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma_\epsilon^2)$$

such that the $\epsilon_i$ are independent and identically distributed.


We can write this model in the form below and use this to find the MLE

$$y_i \mid x_i \sim N(\beta_0 + \beta_1 x_i, \sigma_\epsilon^2)$$

Side note: Normal distribution

Let $Z$ be a random variable such that $Z \sim N(\mu, \sigma^2)$. Then the probability density function is

$$p(z \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}(z - \mu)^2\right\}$$
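The density formula can be evaluated directly; a minimal sketch (the function name `normal_pdf` is ours):

```python
import math

# Density of N(mu, sigma^2) at z, written directly from the formula
# for the normal distribution on this slide.
def normal_pdf(z, mu, sigma2):
    return (1 / math.sqrt(2 * math.pi * sigma2)) * math.exp(-((z - mu) ** 2) / (2 * sigma2))

print(normal_pdf(0, 0, 1))  # approximately 0.3989, the standard normal density at 0
```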

SLR: Likelihood for $\beta_0, \beta_1, \sigma_\epsilon^2$

The likelihood function for $\beta_0, \beta_1, \sigma_\epsilon^2$ is

$$
\begin{aligned}
L(\beta_0, \beta_1, \sigma_\epsilon^2 \mid x_1, \dots, x_n, y_1, \dots, y_n) &= p(y_1 \mid x_1, \beta_0, \beta_1, \sigma_\epsilon^2) \cdots p(y_n \mid x_n, \beta_0, \beta_1, \sigma_\epsilon^2) \\
&= \prod_{i=1}^n p(y_i \mid x_i, \beta_0, \beta_1, \sigma_\epsilon^2) \\
&= \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma_\epsilon^2}} \exp\left\{-\frac{1}{2\sigma_\epsilon^2}\left(y_i - [\beta_0 + \beta_1 x_i]\right)^2\right\} \\
&= (2\pi\sigma_\epsilon^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma_\epsilon^2}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2\right\}
\end{aligned}
$$

Log-Likelihood for $\beta_0, \beta_1, \sigma_\epsilon^2$

The log-likelihood function for $\beta_0, \beta_1, \sigma_\epsilon^2$ is

$$
\begin{aligned}
\log L(\beta_0, \beta_1, \sigma_\epsilon^2 \mid x_1, \dots, x_n, y_1, \dots, y_n) &= \log\left((2\pi\sigma_\epsilon^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma_\epsilon^2}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2\right\}\right) \\
&= -\frac{n}{2}\log(2\pi\sigma_\epsilon^2) - \frac{1}{2\sigma_\epsilon^2}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2
\end{aligned}
$$

MLE for $\beta_0$

1️⃣ Take the derivative of $\log L$ with respect to $\beta_0$ and set it equal to 0

$$\frac{\partial \log L}{\partial \beta_0} = -\frac{2}{2\sigma_\epsilon^2}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)(-1) = 0$$

MLE for $\beta_0$

2️⃣ Find the $\tilde\beta_0$ that satisfies the equality on the previous slide

After a few steps…

$$
\begin{aligned}
&\Rightarrow \sum_{i=1}^n y_i - n\tilde\beta_0 - \tilde\beta_1\sum_{i=1}^n x_i = 0 \\
&\Rightarrow \sum_{i=1}^n y_i - \tilde\beta_1\sum_{i=1}^n x_i = n\tilde\beta_0 \\
&\Rightarrow \frac{1}{n}\sum_{i=1}^n y_i - \frac{1}{n}\tilde\beta_1\sum_{i=1}^n x_i = \tilde\beta_0
\end{aligned}
$$

MLE for $\beta_0$

3️⃣ We can use the second derivative to show we’ve found the maximum

$$\frac{\partial^2 \log L}{\partial \beta_0^2} = -\frac{n}{\sigma_\epsilon^2} < 0$$


Therefore, we have found the maximum. Thus, the MLE for $\beta_0$ is

$$\tilde\beta_0 = \bar{y} - \tilde\beta_1\bar{x}$$

Note that $\tilde\beta_0$ is equal to $\hat\beta_0$, the least-squares estimate

MLE for $\beta_1$ and $\sigma_\epsilon^2$

We can use a similar process to find the MLEs for $\beta_1$ and $\sigma_\epsilon^2$

$$\tilde\beta_1 = \frac{\sum_{i=1}^n (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^n (x_i - \bar{x})^2}$$

$$\tilde\sigma_\epsilon^2 = \frac{\sum_{i=1}^n (y_i - \tilde\beta_0 - \tilde\beta_1 x_i)^2}{n} = \frac{\sum_{i=1}^n e_i^2}{n}$$

Note: $\tilde\beta_1 = \hat\beta_1$ and $\tilde\sigma_\epsilon^2 \approx \hat\sigma_\epsilon^2$
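A quick numerical check of these formulas (a sketch with simulated data, not data from the slides; all variable names are ours):

```python
import numpy as np

# Simulate data from a simple linear regression with known parameters
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, n)
y = 2 + 3 * x + rng.normal(0, 1.5, n)

# Closed-form MLEs for the slope and intercept
beta1 = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

# Least-squares fit for comparison
X = np.column_stack([np.ones(n), x])
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose([beta0, beta1], beta_ls))  # True: the estimates agree

# The MLE of the error variance divides the SSE by n, while the usual
# least-squares estimator divides by n - 2, so the MLE is slightly smaller
e = y - beta0 - beta1 * x
sigma2_mle = np.sum(e**2) / n
sigma2_ls = np.sum(e**2) / (n - 2)
```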

MLE in matrix form

MLE for linear regression in matrix form

$$L(\beta, \sigma_\epsilon^2 \mid X, y) = \frac{1}{(2\pi)^{n/2}\sigma_\epsilon^n} \exp\left\{-\frac{1}{2\sigma_\epsilon^2}(y - X\beta)^T(y - X\beta)\right\}$$

$$\log L(\beta, \sigma_\epsilon^2 \mid X, y) = -\frac{n}{2}\log(2\pi) - n\log(\sigma_\epsilon) - \frac{1}{2\sigma_\epsilon^2}(y - X\beta)^T(y - X\beta)$$

  1. For a fixed value of $\sigma_\epsilon$, we know that $\log L$ is maximized when what is true about $(y - X\beta)^T(y - X\beta)$?
  2. What does this tell us about the relationship between the MLE and least-squares estimator for $\beta$?
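One way to see question 1 concretely: for fixed $\sigma_\epsilon$, maximizing $\log L$ means minimizing $(y - X\beta)^T(y - X\beta)$, which the normal equations solve. A minimal sketch with simulated data (all names are ours):

```python
import numpy as np

# Simulate a small regression problem
rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(0, 1, n)

# MLE / least-squares solution from the normal equations:
# beta = (X^T X)^{-1} X^T y
beta_mle = np.linalg.solve(X.T @ X, X.T @ y)

# Residual sum of squares (y - X beta)^T (y - X beta)
def rss(b):
    r = y - X @ b
    return r @ r

# Perturbing the solution can only increase the residual sum of squares
print(rss(beta_mle) <= rss(beta_mle + 0.01))  # True
```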

Putting it all together

  • The MLE $\tilde\beta$ is equivalent to the least-squares estimator $\hat\beta$ when the errors are independent and identically distributed normal random variables

  • MLEs have nice properties, so the least-squares estimator $\hat\beta$ inherits all the nice properties of MLEs

  • The MLE $\tilde\sigma_\epsilon^2$ is approximately equal to the least-squares estimator $\hat\sigma_\epsilon^2$. When $n \gg p$, the difference is trivial

References

Casella, George, and Roger Berger. 2024. Statistical Inference. CRC Press.

🔗 STA 221 - Spring 2025

