Math rules
This page collects mathematical rules we’ll use in this course that may go beyond what is covered in an introductory linear algebra course.
Matrix calculus
Definition of gradient
Let \(\mathbf{x} = \begin{bmatrix}x_1 \\ x_2 \\ \vdots \\ x_k\end{bmatrix}\) be a \(k \times 1\) vector and \(f(\mathbf{x})\) be a function of \(\mathbf{x}\).
Then \(\nabla_\mathbf{x}f\), the gradient of \(f\) with respect to \(\mathbf{x}\), is
\[ \nabla_\mathbf{x}f = \begin{bmatrix}\frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_k}\end{bmatrix} \]
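As a small worked example, if \(k = 2\) and \(f(\mathbf{x}) = x_1^2 + 3x_2\), then
\[ \nabla_\mathbf{x}f = \begin{bmatrix}\frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2}\end{bmatrix} = \begin{bmatrix}2x_1 \\ 3\end{bmatrix} \]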
Gradient of \(\mathbf{x}^\mathsf{T}\mathbf{z}\)
Let \(\mathbf{x}\) and \(\mathbf{z}\) be \(k \times 1\) vectors, such that \(\mathbf{z}\) is not a function of \(\mathbf{x}\).
The gradient of \(\mathbf{x}^\mathsf{T}\mathbf{z}\) with respect to \(\mathbf{x}\) is
\[ \nabla_\mathbf{x} \hspace{1mm} \mathbf{x}^\mathsf{T}\mathbf{z} = \mathbf{z} \]
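To see why, write \(\mathbf{x}^\mathsf{T}\mathbf{z} = \sum_{i=1}^{k}x_iz_i\). Then \(\frac{\partial}{\partial x_i}\mathbf{x}^\mathsf{T}\mathbf{z} = z_i\) for each \(i\), and stacking these partial derivatives into a vector gives \(\mathbf{z}\).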
Gradient of \(\mathbf{x}^\mathsf{T}\mathbf{A}\mathbf{x}\)
Let \(\mathbf{x}\) be a \(k \times 1\) vector and \(\mathbf{A}\) be a \(k \times k\) matrix, such that \(\mathbf{A}\) is not a function of \(\mathbf{x}\).
Then the gradient of \(\mathbf{x}^\mathsf{T}\mathbf{A}\mathbf{x}\) with respect to \(\mathbf{x}\) is
\[ \nabla_\mathbf{x} \hspace{1mm} \mathbf{x}^\mathsf{T}\mathbf{A}\mathbf{x} = (\mathbf{A}\mathbf{x} + \mathbf{A}^\mathsf{T} \mathbf{x}) = (\mathbf{A} + \mathbf{A}^\mathsf{T})\mathbf{x} \]
If \(\mathbf{A}\) is symmetric, then
\[ (\mathbf{A} + \mathbf{A}^\mathsf{T})\mathbf{x} = 2\mathbf{A}\mathbf{x} \]
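As a quick check with \(k = 2\), take the symmetric matrix \(\mathbf{A} = \begin{bmatrix}1 & 2 \\ 2 & 5\end{bmatrix}\), so that \(\mathbf{x}^\mathsf{T}\mathbf{A}\mathbf{x} = x_1^2 + 4x_1x_2 + 5x_2^2\). Differentiating directly,
\[ \nabla_\mathbf{x} \hspace{1mm} \mathbf{x}^\mathsf{T}\mathbf{A}\mathbf{x} = \begin{bmatrix}2x_1 + 4x_2 \\ 4x_1 + 10x_2\end{bmatrix} = 2\mathbf{A}\mathbf{x} \]
matching the rule above.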
Hessian matrix
The Hessian matrix, \(\nabla_\mathbf{x}^2f\), is the \(k \times k\) matrix of second-order partial derivatives
\[ \nabla_{\mathbf{x}}^2f = \begin{bmatrix} \frac{\partial^2f}{\partial x_1^2} & \frac{\partial^2f}{\partial x_1 \partial x_2} & \dots & \frac{\partial^2f}{\partial x_1\partial x_k} \\ \frac{\partial^2f}{\partial x_2 \partial x_1} & \frac{\partial^2f}{\partial x_2^2} & \dots & \frac{\partial^2f}{\partial x_2 \partial x_k} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2f}{\partial x_k\partial x_1} & \frac{\partial^2f}{\partial x_k\partial x_2} & \dots & \frac{\partial^2f}{\partial x_k^2} \end{bmatrix} \]
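For example, if \(f(\mathbf{x}) = x_1^2x_2\) with \(k = 2\), then
\[ \nabla_\mathbf{x}^2f = \begin{bmatrix}2x_2 & 2x_1 \\ 2x_1 & 0\end{bmatrix} \]
When the second derivatives are continuous, the order of differentiation does not matter, so the Hessian is symmetric.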
Expected value
Expected value of random variable \(X\)
The expected value of a random variable \(X\) is a weighted average: the mean of the possible values \(X\) can take, with each value weighted by the probability of that outcome.
Let \(f_X(x)\) be the probability distribution of \(X\): the probability density function if \(X\) is continuous, or the probability mass function if \(X\) is discrete. If \(X\) is continuous, then
\[ E(X) = \int_{-\infty}^{\infty}x f_X(x)\,dx \]
If \(X\) is discrete, then
\[ E(X) = \sum_{x}xf_X(x) = \sum_{x}xP(X = x) \]
where the sum is over all possible values \(x\) of \(X\).
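For example, if \(X\) is the result of rolling a fair six-sided die, then \(P(X = x) = \frac{1}{6}\) for \(x = 1, \dots, 6\), so
\[ E(X) = \sum_{x=1}^{6}x \cdot \frac{1}{6} = \frac{21}{6} = 3.5 \]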
Expected value of vector \(\mathbf{z}\)
Let \(\mathbf{z} = \begin{bmatrix}z_1 \\ \vdots \\z_p\end{bmatrix}\) be a \(p \times 1\) vector of random variables.
Then \(E(\mathbf{z}) = E\begin{bmatrix}z_1 \\ \vdots \\ z_p\end{bmatrix} = \begin{bmatrix}E(z_1) \\ \vdots \\ E(z_p)\end{bmatrix}\)
Expected value of vector \(\mathbf{Az}\)
Let \(\mathbf{A}\) be an \(n \times p\) matrix of constants and \(\mathbf{z}\) a \(p \times 1\) vector of random variables. Then
\[ E(\mathbf{Az}) = \mathbf{A}E(\mathbf{z}) \]
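This is linearity of expectation applied entry by entry: the \(i\)th entry of \(\mathbf{Az}\) is \(\sum_{j=1}^{p}a_{ij}z_j\), and
\[ E\Big(\sum_{j=1}^{p}a_{ij}z_j\Big) = \sum_{j=1}^{p}a_{ij}E(z_j) \]
which is the \(i\)th entry of \(\mathbf{A}E(\mathbf{z})\).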
Expected value of \(\mathbf{Az} + \mathbf{C}\)
Let \(\mathbf{A}\) be an \(n \times p\) matrix of constants, \(\mathbf{C}\) an \(n \times 1\) vector of constants, and \(\mathbf{z}\) a \(p \times 1\) vector of random variables. Then
\[ E(\mathbf{Az} + \mathbf{C}) = E(\mathbf{Az}) + E(\mathbf{C}) = \mathbf{A}E(\mathbf{z}) + \mathbf{C} \]
Expected value of \(\mathbf{AXA}^\mathsf{T}\)
Let \(\mathbf{A}\) be an \(n\times p\) matrix of constants and \(\mathbf{X}\) a \(p \times p\) matrix of random variables. Then
\[ E(\mathbf{AXA}^\mathsf{T}) = \mathbf{A}E(\mathbf{X})\mathbf{A}^\mathsf{T} \]
Variance
Variance of random variable \(X\)
The variance of a random variable \(X\) is a measure of the spread of a distribution about its mean.
\[ Var(X) = E[(X - E(X))^2] = E(X^2) - [E(X)]^2 \]
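Writing \(\mu = E(X)\) for short, the second equality follows by expanding the square and applying linearity of expectation:
\[ E[(X - \mu)^2] = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2 \]
since \(\mu\) is a constant.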
Variance of vector \(\mathbf{z}\)
Let \(\mathbf{z} = \begin{bmatrix}z_1 \\ \vdots \\z_p\end{bmatrix}\) be a \(p \times 1\) vector of random variables. Then
\[ Var(\mathbf{z}) = E[(\mathbf{z} - E(\mathbf{z}))(\mathbf{z} - E(\mathbf{z}))^\mathsf{T}] \]
This produces the variance-covariance matrix
\(Var(\mathbf{z}) = \begin{bmatrix}Var(z_1) & Cov(z_1, z_2) & \dots & Cov(z_1, z_p)\\ Cov(z_2, z_1) & Var(z_2) & \dots & Cov(z_2, z_p) \\ \vdots & \vdots & \ddots & \vdots \\ Cov(z_p, z_1) & Cov(z_p, z_2) & \dots & Var(z_p)\end{bmatrix}\)
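Here \(Cov(z_i, z_j) = E[(z_i - E(z_i))(z_j - E(z_j))]\), so the diagonal entries are the variances of the individual random variables, and since \(Cov(z_i, z_j) = Cov(z_j, z_i)\), the matrix is symmetric.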
Variance of \(\mathbf{Az}\)
Let \(\mathbf{A}\) be an \(n \times p\) matrix of constants and \(\mathbf{z}\) a \(p \times 1\) vector of random variables. Then
\[ \begin{aligned} Var(\mathbf{Az}) &= E[(\mathbf{Az} - E(\mathbf{Az}))(\mathbf{Az} - E(\mathbf{Az}))^\mathsf{T}] \\ &= E[\mathbf{A}(\mathbf{z} - E(\mathbf{z}))(\mathbf{z} - E(\mathbf{z}))^\mathsf{T}\mathbf{A}^\mathsf{T}] \\ &= \mathbf{A}E[(\mathbf{z} - E(\mathbf{z}))(\mathbf{z} - E(\mathbf{z}))^\mathsf{T}]\mathbf{A}^\mathsf{T} \\ &= \mathbf{A}Var(\mathbf{z})\mathbf{A}^\mathsf{T} \end{aligned} \]
Probability distributions
Multivariate normal distribution
Let \(\mathbf{z}\) be a \(p \times 1\) vector of random variables that follows a multivariate normal distribution with mean \(\boldsymbol{\mu}\) and variance-covariance matrix \(\boldsymbol{\Sigma}\). Then the probability density function of \(\mathbf{z}\) is
\[f(\mathbf{z}) = \frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}}\exp\Big\{-\frac{1}{2}(\mathbf{z} - \boldsymbol{\mu})^\mathsf{T}\boldsymbol{\Sigma}^{-1}(\mathbf{z}- \boldsymbol{\mu})\Big\}\]
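As a sanity check, when \(p = 1\), so that \(\boldsymbol{\mu} = \mu\) and \(\boldsymbol{\Sigma} = \sigma^2\), this reduces to the familiar univariate normal density
\[ f(z) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\Big\{-\frac{(z - \mu)^2}{2\sigma^2}\Big\} \]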
Linear transformation of normal random variable
Suppose \(\mathbf{z}\) is a multivariate normal random variable with mean \(\boldsymbol{\mu}\) and variance-covariance matrix \(\boldsymbol{\Sigma}\), and let \(\mathbf{A}\) be a matrix of constants and \(\mathbf{B}\) a vector of constants of conformable dimensions. Then a linear transformation of \(\mathbf{z}\) is also multivariate normal, such that
\[ \mathbf{A}\mathbf{z} + \mathbf{B} \sim N(\mathbf{A}\boldsymbol{\mu} + \mathbf{B}, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^\mathsf{T}) \]
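For example, in the univariate case \(z \sim N(\mu, \sigma^2)\), taking \(A = \frac{1}{\sigma}\) and \(B = -\frac{\mu}{\sigma}\) recovers the standardization result
\[ \frac{z - \mu}{\sigma} \sim N(0, 1) \]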