Maximum likelihood estimation
Announcements
- HW 03 due March 20 at 11:59pm
- Project exploratory data analysis due March 20 at 11:59pm
  - Next milestone: Project presentations in lab March 28
- Statistics experience due April 22
Topics
- Likelihood
- Maximum likelihood estimation (MLE)
- MLE for linear regression
Motivation
- We've discussed how to find the least-squares estimators of $\beta_0$ and $\beta_1$ for the model $Y = \beta_0 + \beta_1 X + \epsilon$.
- Today we will introduce another way to find these estimators: maximum likelihood estimation.
- We will see that the least-squares estimator is equal to the maximum likelihood estimator when certain assumptions hold.
Maximum likelihood estimation
Example: Basketball shots
Suppose a basketball player shoots the ball, such that the probability of making the basket (successfully making the shot) is $p$.
. . .
What is the probability distribution for this random phenomenon?

- Suppose the probability is some particular value of $p$. What is the probability the player makes a single basket, given this value of $p$?
- Now suppose the probability is a different value of $p$. What is the probability the player makes a single basket, given this new value of $p$?
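A single shot can be modeled as a Bernoulli random variable, with $P(Y = y) = p^y(1 - p)^{1 - y}$ for $y \in \{0, 1\}$. As a quick sketch of how the answer changes with the parameter (the specific values of $p$ below are hypothetical, chosen only for illustration):

```python
from scipy.stats import bernoulli

# Hypothetical candidate values of p, for illustration only
for p in [0.2, 0.8]:
    # P(Y = 1): the probability the player makes a single basket
    print(f"p = {p}: P(make the basket) = {bernoulli.pmf(1, p):.2f}")
```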
Shooting the ball three times
Suppose the player shoots the ball three times. The shots are all independent, and the player has the same probability $p$ of making each basket.

Let $Y_i = 1$ if the player makes shot $i$ and $Y_i = 0$ otherwise, for $i = 1, 2, 3$.
. . .
- Suppose the probability is some particular value of $p$. What is the probability of observing the data $(y_1, y_2, y_3)$, given this value of $p$?
- Now suppose the probability is a different value of $p$. What is the probability of observing the data, given this new value of $p$?
Shooting the ball three times
Suppose the player shoots the ball three times. The shots are all independent, and the player has the same probability $p$ of making each basket.

The player shoots the ball three times, and we observe the outcome $(y_1, y_2, y_3)$.
. . .
New question: What value of the parameter $p$ do you think maximizes the probability of observing this data?

We will use a likelihood function to answer this question.
Likelihood
A likelihood function is a measure of how likely we are to observe our data under each possible value of the parameter(s)
Note that this is not the same as the probability function.
Probability function: Fixed parameter value(s) + input possible outcomes

- Given a value of $p$, what is the probability of observing a particular outcome $(y_1, y_2, y_3)$ in three basketball shots?

Likelihood function: Fixed data + input possible parameter values

- Given we've observed the data $(y_1, y_2, y_3)$, what is the most plausible value of $p$?
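One way to see the distinction (a minimal sketch with hypothetical numbers, not values from the slides): the same Bernoulli formula $p^y(1 - p)^{1 - y}$ acts as a probability function when $p$ is fixed and the outcome varies, and as a likelihood function when the outcome is fixed and $p$ varies.

```python
def bernoulli_pmf(y, p):
    """P(Y = y) for a single shot with success probability p."""
    return p**y * (1 - p)**(1 - y)

# Probability function: fix p, vary the outcome y
p_fixed = 0.7  # hypothetical parameter value
print([bernoulli_pmf(y, p_fixed) for y in (0, 1)])   # sums to 1 over outcomes

# Likelihood function: fix the observed outcome, vary p
y_obs = 1  # hypothetical observed outcome
print([bernoulli_pmf(y_obs, p) for p in (0.2, 0.5, 0.8)])  # need not sum to 1
```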
Likelihood: Three basketball shots
The likelihood function for the probability of a basket $p$, given the observed data $(y_1, y_2, y_3)$, is

$$L(p \mid y_1, y_2, y_3) = P(Y_1 = y_1, Y_2 = y_2, Y_3 = y_3 \mid p)$$

. . .

Thus, if the likelihood is larger for one value of $p$ than for another, the first value is more plausible given the observed data.
Likelihood: Three basketball shots
What is the general formula for the likelihood function for $p$ given the observed data $(y_1, y_2, y_3)$?

- How does assuming independence simplify things?
- How does having identically distributed data simplify things?
Likelihood: Three basketball shots
The likelihood function for $p$, given three independent and identically distributed shots, is

$$L(p) = \prod_{i=1}^{3} p^{y_i}(1 - p)^{1 - y_i} = p^{\sum_{i=1}^{3} y_i}(1 - p)^{3 - \sum_{i=1}^{3} y_i}$$

. . .

We want the value of $p$ that maximizes this likelihood function, i.e., the value of $p$ that is most likely given the observed data. The process of finding this value is maximum likelihood estimation.
There are three primary ways to find the maximum likelihood estimator:

- Approximating it using a graph
- Using calculus
- Using numerical approximation
Finding the MLE using graphs
Based on a graph of the likelihood function, what do you think is the approximate value of the MLE of $p$?
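A rough sketch of the graphical approach (the observed outcome of two makes and one miss is hypothetical, chosen only for illustration): evaluate the likelihood on a grid of candidate values of $p$ and read off the approximate maximizer.

```python
import numpy as np

# Hypothetical observed data: made, made, missed
y = np.array([1, 1, 0])

# Evaluate L(p) = p^(sum y) * (1 - p)^(n - sum y) on a grid of candidate values
p_grid = np.linspace(0.01, 0.99, 99)
likelihood = p_grid**y.sum() * (1 - p_grid)**(len(y) - y.sum())

# The grid point with the largest likelihood approximates the MLE
print(p_grid[np.argmax(likelihood)])  # ~0.67, i.e., 2/3 of shots made
```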
Finding the MLE using calculus
- Find the MLE using the first derivative of the likelihood function.
- This can be tricky because of the product rule, so we can maximize the log(Likelihood) instead. The same value maximizes the likelihood and the log(Likelihood), because the log is a strictly increasing function.

Use calculus to find the MLE of $p$ given the observed data from the three shots.
Shooting the ball $n$ times

Suppose the player shoots the ball $n$ times. The shots are all independent, and the player has the same probability $p$ of making each basket.

Suppose the player makes $k$ of the $n$ baskets.

. . .

- What is the formula for the probability distribution to describe this random phenomenon?
- What is the formula for the likelihood function for $p$ given the observed data?
- For what value of $p$ do we maximize the likelihood given the observed data? Use calculus to find the response (a worked derivation follows below).
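A worked version of the calculus under the setup above, writing $k$ for the number of made baskets out of $n$ independent shots (any constant factor, such as a binomial coefficient, does not depend on $p$ and can be dropped):

$$\log L(p) = k \log p + (n - k)\log(1 - p)$$

Setting the first derivative equal to zero,

$$\frac{d}{dp}\log L(p) = \frac{k}{p} - \frac{n - k}{1 - p} = 0 \;\Longrightarrow\; k(1 - p) = (n - k)\,p \;\Longrightarrow\; \hat{p} = \frac{k}{n}$$

and the second derivative is negative everywhere, confirming a maximum:

$$\frac{d^2}{dp^2}\log L(p) = -\frac{k}{p^2} - \frac{n - k}{(1 - p)^2} < 0$$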
MLE in linear regression
Why maximum likelihood estimation?
“Maximum likelihood estimation is, by far, the most popular technique for deriving estimators.” (Casella and Berger 2024, 315)
MLEs have nice statistical properties (more on this next class):

- Consistent
- Efficient
- Asymptotically normal
. . .
If the normality assumption holds, the least-squares estimator is the maximum likelihood estimator for the coefficients $\boldsymbol{\beta}$.
Linear regression
Recall the linear model

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$$

. . .

- We have discussed least-squares estimation to find $\hat{\boldsymbol{\beta}}$ and $\hat{\sigma}^2$.
- We have used the fact that $\hat{\boldsymbol{\beta}}$ is normally distributed when doing hypothesis testing and confidence intervals.
- Now we will discuss how we know $\hat{\boldsymbol{\beta}}$ is normally distributed, as we introduce MLE for linear regression.
Simple linear regression model
Suppose we have the simple linear regression (SLR) model

$$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$

such that $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$.

. . .

We can write this model in the form below and use this to find the MLE:

$$Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2), \quad \text{independently for } i = 1, \dots, n$$
Side note: Normal distribution

Let $Y \sim N(\mu, \sigma^2)$. Then the probability density function of $Y$ is

$$f(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(y - \mu)^2}{2\sigma^2}\right\}$$
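A quick sanity check of this density formula against a library implementation (the evaluation point and parameter values are arbitrary, chosen for illustration):

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(y, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at y, from the formula above."""
    return np.exp(-(y - mu)**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# Arbitrary point and parameters, for illustration
y, mu, sigma2 = 1.5, 0.0, 4.0
print(normal_pdf(y, mu, sigma2))                   # formula above
print(norm.pdf(y, loc=mu, scale=np.sqrt(sigma2)))  # scipy: same value
```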
SLR: Likelihood for $\beta_0$, $\beta_1$, $\sigma^2$

The likelihood function for $\beta_0$, $\beta_1$, $\sigma^2$ given the observed data is

$$L(\beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right\}$$
Log-likelihood for $\beta_0$, $\beta_1$, $\sigma^2$

The log-likelihood function for $\beta_0$, $\beta_1$, $\sigma^2$ is

$$\log L(\beta_0, \beta_1, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2$$
MLE for $\beta_1$

1️⃣ Take the derivative of the log-likelihood with respect to $\beta_1$:

$$\frac{\partial \log L}{\partial \beta_1} = \frac{1}{\sigma^2}\sum_{i=1}^{n} x_i(y_i - \beta_0 - \beta_1 x_i)$$
MLE for $\beta_1$

2️⃣ Find the value of $\beta_1$ that makes the derivative equal to 0:

$$\sum_{i=1}^{n} x_i(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$$

. . .

After a few steps…

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
MLE for $\beta_1$

3️⃣ We can use the second derivative to show we've found the maximum:

$$\frac{\partial^2 \log L}{\partial \beta_1^2} = -\frac{1}{\sigma^2}\sum_{i=1}^{n} x_i^2 < 0$$

. . .

Therefore, we have found the maximum. Thus, the MLE for $\beta_1$ is

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

. . .

Note that this is the same as the least-squares estimator of $\beta_1$.
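A small numerical check of this equivalence (a sketch on simulated data; the true parameter values and sample size are arbitrary): the closed-form least-squares slope matches the slope found by numerically maximizing the normal likelihood.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Simulate from Y = 2 + 3x + eps, eps ~ N(0, 1) (arbitrary illustrative values)
n = 100
x = rng.uniform(0, 10, size=n)
y = 2 + 3 * x + rng.normal(0, 1, size=n)

# Least-squares slope via the closed-form formula
b1_ls = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)

# MLE by numerically minimizing the negative log-likelihood
def neg_log_lik(theta):
    b0, b1, log_s2 = theta
    s2 = np.exp(log_s2)  # parameterize by log(sigma^2) so it stays positive
    resid = y - b0 - b1 * x
    return 0.5 * n * np.log(2 * np.pi * s2) + np.sum(resid**2) / (2 * s2)

b1_mle = minimize(neg_log_lik, x0=[0.0, 0.0, 0.0]).x[1]
print(b1_ls, b1_mle)  # agree up to optimizer tolerance
```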
MLE for $\beta_0$ and $\sigma^2$

We can use a similar process to find the MLEs for $\beta_0$ and $\sigma^2$:

. . .

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$$

. . .

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$

Note: the MLE $\hat{\sigma}^2$ divides by $n$, whereas the estimator we have used for inference divides by $n - 2$.
MLE in matrix form
MLE for linear regression in matrix form
The likelihood for the model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$ with $\boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$ is

$$L(\boldsymbol{\beta}, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{-\frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\right\}$$

. . .

$$\log L(\boldsymbol{\beta}, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$$

. . .

- For a fixed value of $\sigma^2$, we know that $\log L(\boldsymbol{\beta}, \sigma^2)$ is maximized when what is true about $(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$?
- What does this tell us about the relationship between the MLE and least-squares estimator for $\boldsymbol{\beta}$? (A numerical check follows below.)
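A brief numerical sketch (simulated data; the design and parameter values are arbitrary): the closed-form least-squares solution $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ minimizes the residual sum of squares, which is exactly what maximizes the log-likelihood for any fixed $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated design matrix with an intercept column (arbitrary values)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])
beta_true = np.array([2.0, 3.0])
y = X @ beta_true + rng.normal(0, 1, size=n)

# Least-squares / MLE solution in matrix form
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to beta_true

# Any other beta gives a larger residual sum of squares,
# hence a smaller log-likelihood for every fixed sigma^2
rss = lambda b: np.sum((y - X @ b)**2)
print(rss(beta_hat) <= rss(beta_hat + 0.1))  # True
```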
Putting it all together
- The MLE $\hat{\boldsymbol{\beta}}_{MLE}$ is equivalent to the least-squares estimator $\hat{\boldsymbol{\beta}}_{LS}$ when the errors follow independent and identical normal distributions.
- MLEs have nice properties, so this means the least-squares estimator $\hat{\boldsymbol{\beta}}_{LS}$ inherits all the nice properties of MLEs.
- The MLE $\hat{\sigma}^2_{MLE}$ is approximately equal to the least-squares estimator $\hat{\sigma}^2_{LS}$; when $n$ is large, the difference is trivial.