Properties of estimators

Author

Prof. Maria Tackett

Published

Mar 20, 2025

Announcements

HW 03 due TODAY at 11:59pm
Project exploratory data analysis due TODAY at 11:59pm
- Next project milestone: Presentations in March 28 lab
Statistics experience due April 22

Questions from this week’s content?

Topics

Properties of the least squares estimator

Note

This is not a mathematical statistics class. There are semester-long courses that will go into these topics in much more detail; we will barely scratch the surface in this course.

Our goals are to understand

That estimators have properties
A few properties of the least squares estimator and why they are useful

Properties of $\hat{β}$

Motivation

We have discussed how to use least squares and maximum likelihood estimation to find estimators for $β$
How do we know whether our least squares estimator (and MLE) is a “good” estimator?
When we consider what makes an estimator “good”, we’ll look at three criteria:
- Bias
- Variance
- Mean squared error

Bias and variance

Suppose you are throwing darts at a target

Unbiased: Darts distributed around the target
Biased: Darts systematically away from the target
Variance: Darts could be widely spread (high variance) or generally clustered together (low variance)

Bias and variance

Ideal scenario: Darts are clustered around the target (unbiased and low variance)
Worst case scenario: Darts are widely spread out and systematically far from the target (high bias and high variance)
Acceptable scenario: There’s some trade-off between the bias and variance. For example, it may be acceptable for the darts to be clustered around a point that is close to the target (low bias and low variance)

Bias and variance

Each time we take a sample of size $n$ , we can find the least squares estimator (throw dart at target)
Suppose we take many independent samples of size $n$ and find the least squares estimator for each sample (throw many darts at the target). Ideally,
- The estimators are centered at the true parameter (unbiased)
- The estimators are clustered around the true parameter (unbiased with low variance)
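The repeated-sampling idea above can be sketched with a small simulation. This is only an illustration under assumed values: the true intercept and slope (1.0 and 2.0), the error standard deviation, the sample size, and the helper names `ols_slope` and `simulate_slopes` are all hypothetical and do not come from the course materials. It uses simple linear regression with a fixed set of predictor values and normal errors.

```python
import random
import statistics

def ols_slope(x, y):
    """Least squares slope for simple linear regression y = b0 + b1*x + error."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def simulate_slopes(n, reps, beta0=1.0, beta1=2.0, sigma=1.0, seed=221):
    """Draw `reps` samples of size n and return the slope estimate from each one."""
    rng = random.Random(seed)
    x = [i / n for i in range(n)]  # fixed predictor values (an assumed design)
    slopes = []
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
        slopes.append(ols_slope(x, y))  # one "dart"
    return slopes

slopes = simulate_slopes(n=50, reps=2000)
center = statistics.fmean(slopes)   # where the darts are centered
spread = statistics.stdev(slopes)   # how tightly the darts are clustered
print(f"mean of slope estimates: {center:.3f} (assumed true slope: 2.0)")
print(f"standard deviation of slope estimates: {spread:.3f}")
```

A mean of the estimates near the assumed true slope corresponds to "unbiased", and a small standard deviation corresponds to "low variance".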

Properties of $\hat{β}$

Finite sample ( $n$ ) properties

Unbiased estimator
Best Linear Unbiased Estimator (BLUE)

Asymptotic ( $n \to \infty$ ) properties

Consistent estimator
Efficient estimator
Asymptotic normality

Finite sample properties

Unbiased estimator

The bias of an estimator is the difference between the estimator’s expected value and the true value of the parameter

Let $\hat{θ}$ be an estimator of the parameter $θ$ . Then

$\text{Bias}(\hat{θ}) = E(\hat{θ}) - θ$

An estimator is unbiased if the bias is 0 and thus $E (\hat{θ}) = θ$

Expected value of $\hat{β}$

Let’s take a look at the expected value of the least-squares estimator:

$\begin{aligned} E(\hat{β}) &= E[(X^{T} X)^{-1} X^{T} y] \\ &= (X^{T} X)^{-1} X^{T} E[y] \\ &= (X^{T} X)^{-1} X^{T} X β \\ &= β \end{aligned}$
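
The step from the second to the third line relies on $E[y] = Xβ$. A short sketch of why, assuming the model $y = Xβ + ϵ$ with $E[ϵ] = 0$ and treating $X$ as fixed (the same setup stated in the Gauss-Markov Theorem later in these notes):

$\begin{aligned} E[y] &= E[Xβ + ϵ] \\ &= Xβ + E[ϵ] \\ &= Xβ \end{aligned}$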

Expected value of $\hat{β}$

The least squares estimator (and MLE) $\hat{β}$ is an unbiased estimator of $β$

$E (\hat{β}) = β$

Variance of $\hat{β}$

$\begin{aligned} \text{Var}(\hat{β}) &= \text{Var}((X^{T} X)^{-1} X^{T} y) \\ &= [(X^{T} X)^{-1} X^{T}] \text{Var}(y) [(X^{T} X)^{-1} X^{T}]^{T} \\ &= [(X^{T} X)^{-1} X^{T}] σ_{ϵ}^{2} I [X (X^{T} X)^{-1}] \\ &= σ_{ϵ}^{2} [(X^{T} X)^{-1} X^{T} X (X^{T} X)^{-1}] \\ &= σ_{ϵ}^{2} (X^{T} X)^{-1} \end{aligned}$
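
As a worked special case (an illustration, not part of the original derivation), consider simple linear regression with one predictor, so that $X$ has a column of ones and a column of values $x_1, \dots, x_n$. Writing $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^{2}$ (notation introduced here for convenience), the general formula reduces to:

$\begin{aligned} X^{T} X &= \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^{2} \end{bmatrix} \\ (X^{T} X)^{-1} &= \frac{1}{n S_{xx}} \begin{bmatrix} \sum x_i^{2} & -\sum x_i \\ -\sum x_i & n \end{bmatrix} \\ \text{Var}(\hat{β}_1) &= σ_{ϵ}^{2} \frac{n}{n S_{xx}} = \frac{σ_{ϵ}^{2}}{S_{xx}} \end{aligned}$

The inverse uses the determinant $n \sum x_i^{2} - (\sum x_i)^{2} = n S_{xx}$. In this case the slope is estimated more precisely when the error variance is small or the predictor values are spread out.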

We will show that $\hat{β}$ is the “best” estimator (has the lowest variance) among the class of linear unbiased estimators

Gauss-Markov Theorem

The least-squares estimator of $β$ in the model $y = X β + ϵ$ is given by $\hat{β}$ . Given the errors have mean $0$ and variance $σ_{ϵ}^{2} I$ , then $\hat{β}$ is BLUE (best linear unbiased estimator).

“Best” means $\hat{β}$ has the smallest variance among all linear unbiased estimators for $β$ .

Gauss-Markov Theorem Proof

Suppose ${\hat{β}}^{'}$ is another linear unbiased estimator of $β$ that can be expressed as ${\hat{β}}^{'} = Cy$ , such that $\hat{y} = X {\hat{β}}^{'} = XCy$

Let $C = (X^{T} X)^{- 1} X^{T} + B$ for a non-zero matrix $B$ .

What is the dimension of $B$ ?

Gauss-Markov Theorem Proof

${\hat{β}}^{'} = Cy = ((X^{T} X)^{- 1} X^{T} + B) y$

We need to show

${\hat{β}}^{'}$ is unbiased
$\text{Var}({\hat{β}}^{'}) > \text{Var}(\hat{β})$

Gauss-Markov Theorem Proof

$\begin{aligned} E({\hat{β}}^{'}) &= E[((X^{T} X)^{-1} X^{T} + B) y] \\ &= E[((X^{T} X)^{-1} X^{T} + B) (X β + ϵ)] \\ &= E[((X^{T} X)^{-1} X^{T} + B) (X β)] \\ &= ((X^{T} X)^{-1} X^{T} + B) (X β) \\ &= (I + BX) β \end{aligned}$

What assumption(s) of the Gauss-Markov Theorem did we use?
What must be true for ${\hat{β}}^{'}$ to be unbiased?

Gauss-Markov Theorem Proof

$BX$ must be the $0$ matrix (dimension = $(p + 1) \times (p + 1)$ ) in order for ${\hat{β}}^{'}$ to be unbiased
Now we need to find $\text{Var}({\hat{β}}^{'})$ and see how it compares to $\text{Var}(\hat{β})$

Gauss-Markov Theorem Proof

$\begin{aligned} \text{Var}({\hat{β}}^{'}) &= \text{Var}[((X^{T} X)^{-1} X^{T} + B) y] \\ &= ((X^{T} X)^{-1} X^{T} + B) \text{Var}(y) ((X^{T} X)^{-1} X^{T} + B)^{T} \\ &= σ_{ϵ}^{2} [(X^{T} X)^{-1} X^{T} X (X^{T} X)^{-1} + (X^{T} X)^{-1} X^{T} B^{T} + BX (X^{T} X)^{-1} + {BB}^{T}] \\ &= σ_{ϵ}^{2} (X^{T} X)^{-1} + σ_{ϵ}^{2} {BB}^{T} \end{aligned}$
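
The last step drops the two cross terms. A sketch of why, assuming ${\hat{β}}^{'}$ is unbiased so that $BX = 0$:

$\begin{aligned} (X^{T} X)^{-1} X^{T} B^{T} &= (X^{T} X)^{-1} (BX)^{T} = 0 \\ BX (X^{T} X)^{-1} &= 0 \end{aligned}$

Only $σ_{ϵ}^{2} (X^{T} X)^{-1}$ and $σ_{ϵ}^{2} {BB}^{T}$ remain.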

What assumption(s) of the Gauss-Markov Theorem did we use?

Gauss-Markov Theorem Proof

We have

$\text{Var}({\hat{β}}^{'}) = σ_{ϵ}^{2} (X^{T} X)^{-1} + σ_{ϵ}^{2} {BB}^{T}$

We know that $σ_{ϵ}^{2} {BB}^{T} \geq 0$ (that is, it is a positive semi-definite matrix).

When is $σ_{ϵ}^{2} {BB}^{T} = 0$ ?

Therefore, we have shown that $\text{Var}({\hat{β}}^{'}) > \text{Var}(\hat{β})$ and have completed the proof.

Gauss-Markov Theorem

“Best” means $\hat{β}$ has the smallest variance among all linear unbiased estimators for $β$ .
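
To make "smallest variance among linear unbiased estimators" concrete, the sketch below compares the least squares slope with one competing linear unbiased estimator of the slope: the slope of the line through the first and last observations. The competitor, the true values (intercept 1.0, slope 2.0), the design, and the helper names are all assumptions chosen for illustration, and the comparison covers only the slope rather than the full vector $\hat{β}$.

```python
import random
import statistics

def ols_slope(x, y):
    """Least squares slope for simple linear regression."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def endpoint_slope(x, y):
    """Competing estimator: slope through the first and last points.

    It is linear in y and unbiased for the slope, but it ignores
    every observation in between.
    """
    return (y[-1] - y[0]) / (x[-1] - x[0])

rng = random.Random(221)
n, reps = 50, 2000
beta0, beta1, sigma = 1.0, 2.0, 1.0   # hypothetical true values
x = [i / n for i in range(n)]         # fixed predictor values

ols_estimates, endpoint_estimates = [], []
for _ in range(reps):
    y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
    ols_estimates.append(ols_slope(x, y))
    endpoint_estimates.append(endpoint_slope(x, y))

ols_mean = statistics.fmean(ols_estimates)
endpoint_mean = statistics.fmean(endpoint_estimates)
ols_var = statistics.variance(ols_estimates)
endpoint_var = statistics.variance(endpoint_estimates)

print(f"least squares: mean {ols_mean:.3f}, variance {ols_var:.3f}")
print(f"endpoint:      mean {endpoint_mean:.3f}, variance {endpoint_var:.3f}")
```

Both sets of estimates should be centered near the assumed true slope, while the endpoint estimator should show the larger variance, in line with the theorem.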

Properties of $\hat{β}$

Finite sample ( $n$ ) properties

Unbiased estimator ✅
Best Linear Unbiased Estimator (BLUE) ✅

Asymptotic ( $n \to \infty$ ) properties

Consistent estimator
Efficient estimator
Asymptotic normality

Asymptotic properties

Properties from the MLE

Recall that the least-squares estimator $\hat{β}$ is equal to the Maximum Likelihood Estimator $\tilde{β}$
Maximum likelihood estimators have nice statistical properties and the $\hat{β}$ inherits all of these properties
- Consistency
- Efficiency
- Asymptotic normality

Note

We will define the properties here, and you will explore them in much more depth in STA 332: Statistical Inference

Mean squared error

The mean squared error (MSE) is the expected squared difference between the estimator and the parameter.

Let $\hat{θ}$ be an estimator of the parameter $θ$ . Then

$\begin{aligned} \text{MSE}(\hat{θ}) &= E[(\hat{θ} - θ)^{2}] \\ &= E({\hat{θ}}^{2} - 2 \hat{θ} θ + θ^{2}) \\ &= E({\hat{θ}}^{2}) - 2 θ E(\hat{θ}) + θ^{2} \\ &= \underbrace{E({\hat{θ}}^{2}) - E(\hat{θ})^{2}}_{\text{Var}(\hat{θ})} + \underbrace{E(\hat{θ})^{2} - 2 θ E(\hat{θ}) + θ^{2}}_{\text{Bias}(\hat{θ})^{2}} \end{aligned}$

Mean squared error

$\text{MSE}(\hat{θ}) = \text{Var}(\hat{θ}) + \text{Bias}(\hat{θ})^{2}$

The least-squares estimator $\hat{β}$ is unbiased, so $\text{MSE}(\hat{β}) = \text{Var}(\hat{β})$
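
The decomposition can be checked numerically. The sketch below uses a deliberately biased estimator, a shrunken sample mean $0.9 \bar{x}$, purely as a hypothetical example; the true parameter value, sample size, and error standard deviation are likewise assumed. The moments are computed over simulated estimates, so they approximate the theoretical quantities, but the identity itself holds exactly for the simulated values.

```python
import random
import statistics

rng = random.Random(221)
theta = 5.0           # hypothetical true parameter (a population mean)
n, reps = 20, 5000    # assumed sample size and number of simulated samples

# Estimator under study: 0.9 * sample mean (biased on purpose)
estimates = []
for _ in range(reps):
    sample = [rng.gauss(theta, 2.0) for _ in range(n)]
    estimates.append(0.9 * statistics.fmean(sample))

mean_est = statistics.fmean(estimates)
mse = statistics.fmean([(e - theta) ** 2 for e in estimates])
variance = statistics.fmean([(e - mean_est) ** 2 for e in estimates])
bias = mean_est - theta

print(f"MSE:               {mse:.4f}")
print(f"Variance + Bias^2: {variance + bias ** 2:.4f}")
print(f"Bias:              {bias:.4f}")
```

For an unbiased estimator such as $\hat{β}$, the bias term drops out and the two printed quantities would both reduce to the variance.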

Consistency

An estimator $\hat{θ}$ is a consistent estimator of a parameter $θ$ if it converges in probability to $θ$ . That is, given a sequence of estimators ${\hat{θ}}_{1}, {\hat{θ}}_{2}, \dots$ , for every $ϵ > 0$ ,

$\lim_{n \to \infty} P(| {\hat{θ}}_{n} - θ | \geq ϵ) = 0$

This means that as the sample size goes to $\infty$ (and thus the sample information gets better and better), the estimator will be arbitrarily close to the parameter with high probability.

Why is this a useful property of an estimator?

Consistency

Important

Theorem

An estimator $\hat{θ}$ is a consistent estimator of the parameter $θ$ if the sequence of estimators ${\hat{θ}}_{1}, {\hat{θ}}_{2}, \dots$ satisfies

$\lim_{n \to \infty} \text{Var}({\hat{θ}}_{n}) = 0$
$\lim_{n \to \infty} \text{Bias}({\hat{θ}}_{n}) = 0$
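
One way to see why these two conditions are enough is Markov's inequality applied to $({\hat{θ}}_{n} - θ)^{2}$, combined with the decomposition of the mean squared error into variance plus squared bias. A sketch, for any $ϵ > 0$:

$\begin{aligned} P(| {\hat{θ}}_{n} - θ | \geq ϵ) &= P(({\hat{θ}}_{n} - θ)^{2} \geq ϵ^{2}) \\ &\leq \frac{E[({\hat{θ}}_{n} - θ)^{2}]}{ϵ^{2}} \\ &= \frac{\text{Var}({\hat{θ}}_{n}) + \text{Bias}({\hat{θ}}_{n})^{2}}{ϵ^{2}} \end{aligned}$

If the variance and the bias both go to 0, the bound goes to 0, which matches the definition of consistency.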

Consistency of $\hat{β}$

$\text{Bias}(\hat{β}) = 0$ , so $\lim_{n \to \infty} \text{Bias}(\hat{β}) = 0$

Now we need to show that $\lim_{n \to \infty} \text{Var}(\hat{β}) = 0$

What is $\text{Var}(\hat{β})$ ?
Show $\text{Var}(\hat{β}) \to 0$ as $n \to \infty$ .

Therefore $\hat{β}$ is a consistent estimator.
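
Consistency can also be illustrated by simulation. The sketch below estimates $P(| \hat{β}_1 - β_1 | \geq ϵ)$ for the least squares slope at a few sample sizes. Everything specific here is an assumption made for illustration: the true values, the tolerance `eps = 0.5`, the helper names, and the design, in which the predictor values are spread evenly over $[0, 1)$ so that their total spread grows with $n$.

```python
import random
import statistics

def ols_slope(x, y):
    """Least squares slope for simple linear regression."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def exceed_fraction(n, reps=500, eps=0.5, beta0=1.0, beta1=2.0,
                    sigma=1.0, seed=221):
    """Fraction of simulated samples whose slope estimate misses beta1 by eps or more."""
    rng = random.Random(seed)
    x = [i / n for i in range(n)]  # assumed fixed design on [0, 1)
    misses = 0
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
        if abs(ols_slope(x, y) - beta1) >= eps:
            misses += 1
    return misses / reps

fractions = {n: exceed_fraction(n) for n in (10, 100, 1000)}
for n, frac in fractions.items():
    print(f"n = {n:4d}: estimated P(|slope estimate - slope| >= 0.5) = {frac:.3f}")
```

The estimated probability should shrink toward 0 as $n$ grows, which is the behavior the definition of consistency describes.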

Efficiency

An estimator is efficient if it has the smallest variance among a class of estimators as $n \to \infty$
By the Gauss-Markov Theorem, we have shown that the least-squares estimator $\hat{β}$ is the most efficient among linear unbiased estimators.
Maximum Likelihood Estimators are the most efficient among all unbiased estimators.
Therefore, $\hat{β}$ is the most efficient among all unbiased estimators of $β$

Note

Proof of this in a later statistics class.

Asymptotic normality

Maximum Likelihood Estimators are asymptotically normal, meaning the distribution of an MLE approaches a normal distribution as $n \to \infty$
Therefore, we know the distribution of $\hat{β}$ is approximately normal when $n$ is large, regardless of the distribution of the underlying data

Note

Proof of this in a later statistics class.
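
Although the proof is deferred, the large-sample behavior can be explored with a simulation. The sketch below uses skewed, non-normal errors (an exponential distribution shifted to have mean 0 and variance 1) and checks how often the standardized slope estimate falls within $\pm 1.96$, which would be about 95% of the time under a normal distribution. The error distribution, true values, design, and sample size are all assumptions chosen for illustration. Note that with these errors the least squares estimator is not the MLE, so the approximate normality seen here reflects a central limit effect rather than the MLE property itself.

```python
import math
import random
import statistics

def ols_slope(x, y):
    """Least squares slope for simple linear regression."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(221)
n, reps = 200, 2000
beta0, beta1 = 1.0, 2.0               # hypothetical true values
x = [i / n for i in range(n)]         # assumed fixed design
x_bar = statistics.fmean(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)
se = math.sqrt(1.0 / sxx)             # true error variance is 1 for these errors

z_scores = []
for _ in range(reps):
    # Skewed errors: Exponential(1) shifted to have mean 0
    y = [beta0 + beta1 * xi + (rng.expovariate(1.0) - 1.0) for xi in x]
    z_scores.append((ols_slope(x, y) - beta1) / se)

within = sum(abs(z) <= 1.96 for z in z_scores) / reps
print(f"share of standardized estimates within +/- 1.96: {within:.3f}")
```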

Recap

Finite sample ( $n$ ) properties

Unbiased estimator ✅
Best Linear Unbiased Estimator (BLUE) ✅

Asymptotic ( $n \to \infty$ ) properties

Consistent estimator ✅
Efficient estimator ✅
Asymptotic normality ✅
