Properties of estimators

Prof. Maria Tackett

Mar 20, 2025

Announcements

  • HW 03 due TODAY at 11:59pm

  • Project exploratory data analysis due TODAY at 11:59pm

    • Next project milestone: presentations in the March 28 lab

  • Statistics experience due April 22

Questions from this week’s content?

Topics

  • Properties of the least squares estimator

Note

This is not a mathematical statistics class. There are semester-long courses that will go into these topics in much more detail; we will barely scratch the surface in this course.

Our goals are to understand

  • Estimators have properties

  • A few properties of the least squares estimator and why they are useful

Properties of β^

Motivation

  • We have discussed how to use least squares and maximum likelihood estimation to find estimators for β

  • How do we know whether our least squares estimator (and MLE) is a “good” estimator?

  • When we consider what makes an estimator “good”, we’ll look at three criteria:

    • Bias
    • Variance
    • Mean squared error

Bias and variance

Suppose you are throwing darts at a target

[Figure: dartboard diagrams illustrating the combinations of high/low bias and high/low variance. Image source: Analytics Vidhya]

  • Unbiased: Darts distributed around the target

  • Biased: Darts systematically away from the target

  • Variance: Darts could be widely spread (high variance) or generally clustered together (low variance)

Bias and variance

  • Ideal scenario: Darts are clustered around the target (unbiased and low variance)

  • Worst case scenario: Darts are widely spread out and systematically far from the target (high bias and high variance)

  • Acceptable scenario: There’s some trade-off between the bias and variance. For example, it may be acceptable for the darts to be clustered around a point that is close to the target (low bias and low variance)

Bias and variance

  • Each time we take a sample of size n, we can find the least squares estimator (throw a dart at the target)

  • Suppose we take many independent samples of size n and find the least squares estimator for each sample (throw many darts at the target). Ideally,

    • The estimators are centered at the true parameter (unbiased)

    • The estimators are clustered around the true parameter (unbiased with low variance)

Properties of β^

Finite sample (n) properties

  • Unbiased estimator

  • Best Linear Unbiased Estimator (BLUE)


Asymptotic (n → ∞) properties

  • Consistent estimator

  • Efficient estimator

  • Asymptotic normality

Finite sample properties

Unbiased estimator

The bias of an estimator is the difference between the estimator’s expected value and the true value of the parameter

Let $\hat{\theta}$ be an estimator of the parameter $\theta$. Then

$$\text{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$$

An estimator is unbiased if the bias is 0 and thus $E(\hat{\theta}) = \theta$
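
As a concrete illustration (a minimal numpy sketch, not from the slides), we can estimate bias by Monte Carlo: the variance estimator that divides by $n$ is biased, while dividing by $n - 1$ is not. The sample size and true variance below are arbitrary choices.

```python
# A minimal sketch: estimating bias by Monte Carlo.
# np.var divides by n (biased for sigma^2); ddof=1 divides by n - 1 (unbiased).
import numpy as np

rng = np.random.default_rng(seed=1)
n, sigma2, reps = 10, 4.0, 100_000   # arbitrary illustrative choices

biased, unbiased = [], []
for _ in range(reps):
    y = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=n)
    biased.append(np.var(y))             # divides by n
    unbiased.append(np.var(y, ddof=1))   # divides by n - 1

print(np.mean(biased) - sigma2)    # approx. -sigma2 / n = -0.4
print(np.mean(unbiased) - sigma2)  # approx. 0
```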

Expected value of β^

Let's take a look at the expected value of the least-squares estimator, recalling that $X$ is treated as fixed and $E[y] = X\beta$:

$$
\begin{aligned}
E(\hat{\beta}) &= E[(X^TX)^{-1}X^Ty] \\
&= (X^TX)^{-1}X^T E[y] \\
&= (X^TX)^{-1}X^TX\beta \\
&= \beta
\end{aligned}
$$

Expected value of β^

The least squares estimator (and MLE) $\hat{\beta}$ is an unbiased estimator of $\beta$:

$$E(\hat{\beta}) = \beta$$
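
A quick simulation sketch of this result (the values of $\beta$, the error SD, and the design matrix are assumptions chosen only for illustration): averaging the least squares estimates over many simulated samples recovers $\beta$.

```python
# Sketch: empirical check that E(beta_hat) = beta.
import numpy as np

rng = np.random.default_rng(seed=2)
n = 50
beta = np.array([1.0, 2.0, -0.5])                            # "true" coefficients
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # fixed design

estimates = []
for _ in range(20_000):
    y = X @ beta + rng.normal(scale=1.0, size=n)             # y = X beta + eps
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)         # least squares fit
    estimates.append(beta_hat)

print(np.mean(estimates, axis=0))  # approx. [1.0, 2.0, -0.5]
```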

Variance of β^

$$
\begin{aligned}
\text{Var}(\hat{\beta}) &= \text{Var}\left((X^TX)^{-1}X^Ty\right) \\
&= \left[(X^TX)^{-1}X^T\right]\text{Var}(y)\left[(X^TX)^{-1}X^T\right]^T \\
&= \left[(X^TX)^{-1}X^T\right]\sigma^2_\epsilon I\left[X(X^TX)^{-1}\right] \\
&= \sigma^2_\epsilon\left[(X^TX)^{-1}X^TX(X^TX)^{-1}\right] \\
&= \sigma^2_\epsilon (X^TX)^{-1}
\end{aligned}
$$
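
The same kind of simulation can check the variance formula (again a sketch with an arbitrary design and error SD): the empirical covariance of the simulated estimates should match $\sigma^2_\epsilon (X^TX)^{-1}$.

```python
# Sketch: compare the empirical covariance of beta_hat across simulations
# to the theoretical sigma_eps^2 (X^T X)^{-1}. All constants are arbitrary.
import numpy as np

rng = np.random.default_rng(seed=3)
n, sigma_eps = 50, 1.5
beta = np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # fixed design

estimates = np.array([
    np.linalg.solve(X.T @ X, X.T @ (X @ beta + rng.normal(scale=sigma_eps, size=n)))
    for _ in range(50_000)
])

print(np.cov(estimates.T))                    # empirical Var(beta_hat)
print(sigma_eps**2 * np.linalg.inv(X.T @ X))  # theoretical Var(beta_hat)
```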

We will show that β^ is the “best” estimator (has the lowest variance) among the class of linear unbiased estimators




Gauss-Markov Theorem

The least-squares estimator of $\beta$ in the model $y = X\beta + \epsilon$ is given by $\hat{\beta}$. Given the errors have mean $0$ and variance $\sigma^2_\epsilon I$, then $\hat{\beta}$ is BLUE (best linear unbiased estimator).

“Best” means $\hat{\beta}$ has the smallest variance among all linear unbiased estimators for $\beta$.

Gauss-Markov Theorem Proof

Suppose $\hat{\beta}'$ is another linear unbiased estimator of $\beta$ that can be expressed as $\hat{\beta}' = Cy$, such that $\hat{y} = X\hat{\beta}' = XCy$


Let $C = (X^TX)^{-1}X^T + B$ for a non-zero matrix $B$.


What is the dimension of B?

Gauss-Markov Theorem Proof

$$\hat{\beta}' = Cy = \left((X^TX)^{-1}X^T + B\right)y$$

We need to show

  • $\hat{\beta}'$ is unbiased

  • $\text{Var}(\hat{\beta}') > \text{Var}(\hat{\beta})$

Gauss-Markov Theorem Proof

$$
\begin{aligned}
E(\hat{\beta}') &= E\left[\left((X^TX)^{-1}X^T + B\right)y\right] \\
&= E\left[\left((X^TX)^{-1}X^T + B\right)(X\beta + \epsilon)\right] \\
&= E\left[\left((X^TX)^{-1}X^T + B\right)(X\beta)\right] \\
&= \left((X^TX)^{-1}X^T + B\right)(X\beta) \\
&= (I + BX)\beta
\end{aligned}
$$

  • What assumption(s) of the Gauss-Markov Theorem did we use?

  • What must be true for β^′ to be unbiased?

Gauss-Markov Theorem Proof

  • $BX$ must be the $0$ matrix (dimension $(p+1) \times (p+1)$) in order for $\hat{\beta}'$ to be unbiased

  • Now we need to find $\text{Var}(\hat{\beta}')$ and see how it compares to $\text{Var}(\hat{\beta})$

Gauss-Markov Theorem Proof

$$
\begin{aligned}
\text{Var}(\hat{\beta}') &= \text{Var}\left[\left((X^TX)^{-1}X^T + B\right)y\right] \\
&= \left((X^TX)^{-1}X^T + B\right)\text{Var}(y)\left((X^TX)^{-1}X^T + B\right)^T \\
&= \sigma^2_\epsilon\left[(X^TX)^{-1}X^TX(X^TX)^{-1} + (X^TX)^{-1}X^TB^T + BX(X^TX)^{-1} + BB^T\right] \\
&= \sigma^2_\epsilon (X^TX)^{-1} + \sigma^2_\epsilon BB^T
\end{aligned}
$$

What assumption(s) of the Gauss-Markov Theorem did we use?

Gauss-Markov Theorem Proof

We have

$$\text{Var}(\hat{\beta}') = \sigma^2_\epsilon (X^TX)^{-1} + \sigma^2_\epsilon BB^T$$

We know that $\sigma^2_\epsilon BB^T \geq 0$.


When is $\sigma^2_\epsilon BB^T = 0$?

Therefore, we have shown that $\text{Var}(\hat{\beta}') > \text{Var}(\hat{\beta})$ and have completed the proof.
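
A numeric sanity check of this conclusion (a sketch; the matrix sizes and seed are arbitrary): any $B$ with $BX = 0$ can be built by projecting a random matrix off the column space of $X$, and the resulting estimator's variance dominates that of least squares coefficient-by-coefficient.

```python
# Sketch: build an alternative linear unbiased estimator C = (X^T X)^{-1} X^T + B
# with BX = 0 and compare its variance to that of least squares.
import numpy as np

rng = np.random.default_rng(seed=4)
n, p1 = 40, 3                                                # p1 = p + 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, p1 - 1))])

H = X @ np.linalg.solve(X.T @ X, X.T)                        # hat matrix
B = rng.normal(size=(p1, n)) @ (np.eye(n) - H)               # guarantees BX = 0
print(np.allclose(B @ X, 0))                                 # True: unbiasedness holds

var_ols = np.linalg.inv(X.T @ X)                             # Var(beta_hat) / sigma_eps^2
var_alt = var_ols + B @ B.T                                  # Var(beta_hat') / sigma_eps^2
print(np.diag(var_alt) >= np.diag(var_ols))                  # True for every coefficient
```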




Gauss-Markov Theorem

The least-squares estimator of $\beta$ in the model $y = X\beta + \epsilon$ is given by $\hat{\beta}$. Given the errors have mean $0$ and variance $\sigma^2_\epsilon I$, then $\hat{\beta}$ is BLUE (best linear unbiased estimator).

“Best” means $\hat{\beta}$ has the smallest variance among all linear unbiased estimators for $\beta$.

Properties of β^

Finite sample (n) properties

  • Unbiased estimator ✅

  • Best Linear Unbiased Estimator (BLUE) ✅


Asymptotic (n → ∞) properties

  • Consistent estimator

  • Efficient estimator

  • Asymptotic normality

Asymptotic properties

Properties from the MLE

  • Recall that the least-squares estimator $\hat{\beta}$ is equal to the maximum likelihood estimator $\tilde{\beta}$

  • Maximum likelihood estimators have nice statistical properties, and $\hat{\beta}$ inherits all of these properties:

    • Consistency
    • Efficiency
    • Asymptotic normality

Note

We will define the properties here, and you will explore them in much more depth in STA 332: Statistical Inference

Mean squared error

The mean squared error (MSE) is the expected squared difference between the estimator and the parameter.

Let $\hat{\theta}$ be an estimator of the parameter $\theta$. Then

$$
\begin{aligned}
\text{MSE}(\hat{\theta}) &= E[(\hat{\theta} - \theta)^2] \\
&= E(\hat{\theta}^2 - 2\hat{\theta}\theta + \theta^2) \\
&= E(\hat{\theta}^2) - 2\theta E(\hat{\theta}) + \theta^2 \\
&= \underbrace{E(\hat{\theta}^2) - E(\hat{\theta})^2}_{\text{Var}(\hat{\theta})} + \underbrace{E(\hat{\theta})^2 - 2\theta E(\hat{\theta}) + \theta^2}_{\text{Bias}(\hat{\theta})^2}
\end{aligned}
$$

Mean squared error

$$\text{MSE}(\hat{\theta}) = \text{Var}(\hat{\theta}) + \text{Bias}(\hat{\theta})^2$$


The least-squares estimator $\hat{\beta}$ is unbiased, so $\text{MSE}(\hat{\beta}) = \text{Var}(\hat{\beta})$
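
A short sketch verifying the decomposition numerically for a deliberately biased estimator, $\hat{\theta} = 0.9\,\bar{y}$ of a mean $\theta$ (all constants are arbitrary illustrative choices):

```python
# Sketch: check MSE = Var + Bias^2 for the biased estimator 0.9 * ybar.
import numpy as np

rng = np.random.default_rng(seed=5)
theta, n, reps = 5.0, 20, 200_000
est = 0.9 * rng.normal(loc=theta, scale=2.0, size=(reps, n)).mean(axis=1)

mse = np.mean((est - theta) ** 2)
var_plus_bias2 = np.var(est) + (np.mean(est) - theta) ** 2
print(mse, var_plus_bias2)  # the two agree up to simulation error
```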

Consistency

An estimator $\hat{\theta}$ is a consistent estimator of a parameter $\theta$ if it converges in probability to $\theta$. Given a sequence of estimators $\hat{\theta}_1, \hat{\theta}_2, \ldots$, for every $\epsilon > 0$,

$$\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| \geq \epsilon) = 0$$

This means that as the sample size goes to ∞ (and thus the sample information gets better and better), the estimator will be arbitrarily close to the parameter with high probability.

Why is this a useful property of an estimator?
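
A small sketch of the definition using the sample mean, which is consistent for $\mu$ (the constants are arbitrary): the estimated probability of missing by at least $\epsilon$ shrinks as $n$ grows.

```python
# Sketch: the estimated probability P(|ybar_n - mu| >= eps) heads toward 0.
import numpy as np

rng = np.random.default_rng(seed=6)
mu, eps, reps = 3.0, 0.1, 2_000
for n in [10, 100, 1_000, 10_000]:
    ybar = rng.normal(loc=mu, scale=1.0, size=(reps, n)).mean(axis=1)
    print(n, np.mean(np.abs(ybar - mu) >= eps))  # shrinks as n grows
```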

Consistency



Important

Theorem

An estimator $\hat{\theta}$ is a consistent estimator of the parameter $\theta$ if the sequence of estimators $\hat{\theta}_1, \hat{\theta}_2, \ldots$ satisfies

  • $\lim_{n \to \infty} \text{Var}(\hat{\theta}_n) = 0$

  • $\lim_{n \to \infty} \text{Bias}(\hat{\theta}_n) = 0$

Consistency of β^

$\text{Bias}(\hat{\beta}) = 0$, so $\lim_{n \to \infty} \text{Bias}(\hat{\beta}) = 0$


Now we need to show that $\lim_{n \to \infty} \text{Var}(\hat{\beta}) = 0$

  • What is $\text{Var}(\hat{\beta})$?

  • Show $\text{Var}(\hat{\beta}) \to 0$ as $n \to \infty$.

Therefore β^ is a consistent estimator.
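
A sketch of the variance piece (assuming the predictors keep varying, so that $X^TX$ grows with $n$; the design and error SD below are arbitrary): the diagonal of $\sigma^2_\epsilon (X^TX)^{-1}$ shrinks toward $0$ as $n$ grows.

```python
# Sketch: Var(beta_hat) = sigma_eps^2 (X^T X)^{-1} shrinks as n grows.
import numpy as np

rng = np.random.default_rng(seed=7)
sigma_eps = 1.0
for n in [50, 500, 5_000, 50_000]:
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    var_beta_hat = sigma_eps**2 * np.linalg.inv(X.T @ X)
    print(n, np.diag(var_beta_hat))  # both entries shrink roughly like 1/n
```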

Efficiency

  • An estimator is efficient if it has the smallest variance among a class of estimators as $n \to \infty$

  • By the Gauss-Markov Theorem, we have shown that the least-squares estimator β^ is the most efficient among linear unbiased estimators.

  • Maximum likelihood estimators are (asymptotically) the most efficient among all unbiased estimators.

  • Therefore, β^ is the most efficient among all unbiased estimators of β

Note

Proof of this in a later statistics class.

Asymptotic normality

  • Maximum likelihood estimators are asymptotically normal, meaning the distribution of an MLE approaches a normal distribution as n → ∞

  • Therefore, we know the distribution of $\hat{\beta}$ is approximately normal when $n$ is large, regardless of the distribution of the underlying data
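
A sketch of this (the errors are deliberately non-normal: centered exponential with variance 1; all other constants are arbitrary): standardized slope estimates behave like a standard normal even though the errors are skewed.

```python
# Sketch: asymptotic normality of the slope estimate with skewed errors.
import numpy as np

rng = np.random.default_rng(seed=8)
n = 500
beta = np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
sd_slope = np.sqrt(np.linalg.inv(X.T @ X)[1, 1])  # theoretical SD of the slope

z = []
for _ in range(20_000):
    y = X @ beta + (rng.exponential(scale=1.0, size=n) - 1.0)  # centered, skewed
    slope = np.linalg.solve(X.T @ X, X.T @ y)[1]
    z.append((slope - beta[1]) / sd_slope)

# Quantiles of z should be close to standard normal quantiles
print(np.quantile(z, [0.05, 0.95]))    # approx. [-1.645, 1.645]
print(np.quantile(z, [0.025, 0.975]))  # approx. [-1.96, 1.96]
```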

Note

Proof of this in a later statistics class.

Recap

Finite sample (n) properties

  • Unbiased estimator ✅

  • Best Linear Unbiased Estimator (BLUE) ✅


Asymptotic (n → ∞) properties

  • Consistent estimator ✅

  • Efficient estimator ✅

  • Asymptotic normality ✅

Questions from this week’s content?
