STA 721: Lecture 7
Duke University
distributions of \(\hat{\boldsymbol{\beta}}\), \(\hat{\mathbf{Y}}\), \(\hat{\boldsymbol{\epsilon}}\) under normality
Unbiased Estimation of \(\sigma^2\)
sampling distribution of \({\hat{\sigma}}^2\)
independence
Readings:
Under the linear model \(\mathbf{Y}= \mathbf{X}\boldsymbol{\beta}+ \boldsymbol{\epsilon}\), \(\textsf{E}[\boldsymbol{\epsilon}] = \mathbf{0}_n\) and \(\textsf{Cov}[\boldsymbol{\epsilon}] = \sigma^2 \mathbf{I}_n\), we derived the means and covariances of \(\hat{\boldsymbol{\beta}}\), \(\hat{\mathbf{Y}}\), and \(\hat{\boldsymbol{\epsilon}}\); adding the assumption of normal errors gives their full sampling distributions.
For a \(d\) dimensional multivariate normal random vector, we write \(\mathbf{Y}\sim \textsf{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})\)
\(\textsf{E}[\mathbf{Y}] = \boldsymbol{\mu}\): \(d\) dimensional vector with means \(E[Y_j]\)
\(\textsf{Cov}[\mathbf{Y}] = \boldsymbol{\Sigma}\): \(d \times d\) matrix with diagonal elements that are the variances of \(Y_j\) and off diagonal elements that are the covariances \(\textsf{E}[(Y_j - \mu_j)(Y_k - \mu_k)]\)
If \(\boldsymbol{\Sigma}\) is positive definite (\(\mathbf{x}'\boldsymbol{\Sigma}\mathbf{x}> 0\) for any \(\mathbf{x}\ne 0\) in \(\mathbb{R}^d\)) then \(\mathbf{Y}\) has a density\(^\dagger\) \[p(\mathbf{Y}) = (2 \pi)^{-d/2} |\boldsymbol{\Sigma}|^{-1/2} \exp(-\frac{1}{2}(\mathbf{Y}- \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{Y}- \boldsymbol{\mu}))\]
\(\dagger\) density with respect to Lebesgue measure on \(\mathbb{R}^d\)
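As a quick numerical sanity check of the density formula (a minimal sketch; the dimension, mean, covariance, and evaluation point below are arbitrary illustrative choices), we can compare a hand-coded evaluation against `scipy.stats.multivariate_normal`:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
d = 3
mu = np.array([1.0, -0.5, 2.0])
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)          # a positive definite covariance
y = rng.standard_normal(d)

# density from the formula on the slide
quad = (y - mu) @ np.linalg.solve(Sigma, y - mu)
p_formula = (2 * np.pi) ** (-d / 2) * np.linalg.det(Sigma) ** (-0.5) * np.exp(-0.5 * quad)

# density from scipy
p_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(y)
print(np.isclose(p_formula, p_scipy))    # True
```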
If \(\mathbf{Y}\sim \textsf{N}_n(\boldsymbol{\mu}, \boldsymbol{\Sigma})\) then for \(\mathbf{A}\) \(m \times n\) \[\mathbf{A}\mathbf{Y}\sim \textsf{N}_m(\mathbf{A}\boldsymbol{\mu}, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T)\]
\(\hat{\boldsymbol{\beta}}= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}\sim \textsf{N}(\boldsymbol{\beta}, \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1})\)
\(\hat{\mathbf{Y}}= \mathbf{P}_\mathbf{X}\mathbf{Y}\sim \textsf{N}(\mathbf{X}\boldsymbol{\beta}, \sigma^2 \mathbf{P}_\mathbf{X})\)
\(\hat{\boldsymbol{\epsilon}}= (\mathbf{I}_n - \mathbf{P}_\mathbf{X})\mathbf{Y}\sim \textsf{N}(\mathbf{0}, \sigma^2 (\mathbf{I}_n - \mathbf{P}_\mathbf{X}))\)
\(\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T\) does not have to be positive definite!
If the covariance is singular then there is no density (on \(\mathbb{R}^n\)), but claim that \(\mathbf{Y}\) still has a multivariate normal distribution!
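A small simulation illustrates the claims above (a sketch; the design matrix, \(\boldsymbol{\beta}\), \(\sigma\), and sample sizes are arbitrary choices): the empirical covariance of \(\hat{\boldsymbol{\beta}}\) across replicated data sets approaches \(\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}\), and \(\mathbf{I}_n - \mathbf{P}_\mathbf{X}\) has rank \(n - p\), so the covariance of \(\hat{\boldsymbol{\epsilon}}\) is singular.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma = 50, 3, 2.0
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta = np.array([1.0, -2.0, 0.5])
XtX_inv = np.linalg.inv(X.T @ X)

# Monte Carlo replications of beta-hat under normal errors
B = np.array([
    XtX_inv @ X.T @ (X @ beta + sigma * rng.standard_normal(n))
    for _ in range(20000)
])
print(np.cov(B.T))                  # approx sigma^2 (X'X)^{-1}
print(sigma**2 * XtX_inv)

# covariance of the residuals is singular: rank(I - P_X) = n - p
P = X @ XtX_inv @ X.T
print(np.linalg.matrix_rank(np.eye(n) - P))   # n - p = 47
```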
Recall we found the MLE of \(\sigma^2\) \[{\hat{\sigma}}^2= \frac{\hat{\boldsymbol{\epsilon}}^T\hat{\boldsymbol{\epsilon}}} {n}\]
let \(\textsf{RSS}= \| \hat{\boldsymbol{\epsilon}}\|^2 = \hat{\boldsymbol{\epsilon}}^T\hat{\boldsymbol{\epsilon}}\)
then \[\begin{align*} \| \hat{\boldsymbol{\epsilon}}\|^2 & = \hat{\boldsymbol{\epsilon}}^T\hat{\boldsymbol{\epsilon}}\\ & = \boldsymbol{\epsilon}^T(\mathbf{I}_n - \mathbf{P}_\mathbf{X})^T (\mathbf{I}_n - \mathbf{P}_\mathbf{X}) \boldsymbol{\epsilon}\quad \text{since } \hat{\boldsymbol{\epsilon}}= (\mathbf{I}_n - \mathbf{P}_\mathbf{X})\mathbf{Y}= (\mathbf{I}_n - \mathbf{P}_\mathbf{X})\boldsymbol{\epsilon}\\ & = \boldsymbol{\epsilon}^T(\mathbf{I}_n - \mathbf{P}_\mathbf{X}) \boldsymbol{\epsilon}\\ & = \boldsymbol{\epsilon}^T \textsf{N}\textsf{N}^T \boldsymbol{\epsilon}\\ & = \mathbf{e}^T\mathbf{e} \end{align*}\]
\(\textsf{N}\) is the \(n \times (n-p)\) matrix whose columns are the eigenvectors from the spectral decomposition of \((\mathbf{I}_n - \mathbf{P}_\mathbf{X})\) associated with the non-zero eigenvalues (all equal to 1, since \(\mathbf{I}_n - \mathbf{P}_\mathbf{X}\) is a projection of rank \(n - p\)).
Since \(\boldsymbol{\epsilon}\sim \textsf{N}(\mathbf{0}_n, \sigma^2 \mathbf{I}_n)\) and \(\textsf{N}\in \mathbb{R}^{n \times (n - p)}\), \[\textsf{N}^T \boldsymbol{\epsilon}= \mathbf{e}\sim \textsf{N}(\mathbf{0}_{n - p}, \sigma^2\textsf{N}^T\textsf{N}) = \textsf{N}(\mathbf{0}_{n - p}, \sigma^2\mathbf{I}_{n - p} )\]
\[\begin{align*} \textsf{RSS}& = \sum_{i = 1}^{n-p} e_i^2 \\ & \mathrel{\mathop{=}\limits^{\rm D}}\sum_{i = 1}^{n-p} (\sigma z_i)^2 \quad \text{ where } \mathbf{Z}\sim \textsf{N}(\mathbf{0}_{n-p}, \mathbf{I}_{n-p}) \\ & = \sigma^2 \sum_{i = 1}^{n-p} z_i^2 \\ &\mathrel{\mathop{=}\limits^{\rm D}}\sigma^2 \chi^2_{n-p} \end{align*}\]
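To see the role of \(\textsf{N}\) concretely, here is a numerical sketch (the design matrix and errors are simulated arbitrarily) that builds \(\textsf{N}\) from the eigenvectors of \(\mathbf{I}_n - \mathbf{P}_\mathbf{X}\) with eigenvalue 1 and checks that \(\|\hat{\boldsymbol{\epsilon}}\|^2 = \|\textsf{N}^T\boldsymbol{\epsilon}\|^2\) and \(\textsf{N}^T\textsf{N}= \mathbf{I}_{n-p}\):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 4
X = rng.standard_normal((n, p))
P = X @ np.linalg.inv(X.T @ X) @ X.T
eps = rng.standard_normal(n)

# spectral decomposition of the projection I - P_X;
# its eigenvalues are 1 (multiplicity n - p) and 0 (multiplicity p)
vals, vecs = np.linalg.eigh(np.eye(n) - P)
N = vecs[:, vals > 0.5]          # eigenvectors with eigenvalue 1
print(N.shape)                   # (n, n - p)

eps_hat = (np.eye(n) - P) @ eps
print(np.allclose(eps_hat @ eps_hat, (N.T @ eps) @ (N.T @ eps)))  # True
print(np.allclose(N.T @ N, np.eye(n - p)))                        # True
```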
Background Theory: If \(\mathbf{Z}\sim \textsf{N}_d(\mathbf{0}_d, \mathbf{I}_d)\), then \(\mathbf{Z}^T\mathbf{Z}\sim \chi^2_{d}\)
Expected value of a \(\chi^2_d\) random variable is \(d\) (the degrees of freedom)
\(\textsf{E}[\textsf{RSS}] = \textsf{E}[\sigma^2 \chi^2_{n-p}] = \sigma^2 (n-p)\)
the expected value of the MLE is \[\textsf{E}[{\hat{\sigma}}^2] = \textsf{E}[\textsf{RSS}]/n = \sigma^2 \frac{(n-p)}{n}\] so the MLE is biased (downward)
an unbiased estimator of \(\sigma^2\) is \(s^2 = \textsf{RSS}/(n-p)\)
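A short Monte Carlo check of these facts (a sketch; the design, \(\boldsymbol{\beta}\), and \(\sigma\) are arbitrary): the mean of \(\textsf{RSS}\) should be close to \(\sigma^2(n-p)\), so \(s^2\) is approximately unbiased while the MLE is biased downward, and the quantiles of \(\textsf{RSS}/\sigma^2\) should match those of \(\chi^2_{n-p}\).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p, sigma = 40, 5, 1.5
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
P = X @ np.linalg.inv(X.T @ X) @ X.T

rss = np.empty(20000)
for i in range(rss.size):
    y = X @ beta + sigma * rng.standard_normal(n)
    e_hat = y - P @ y
    rss[i] = e_hat @ e_hat

print(rss.mean(), sigma**2 * (n - p))            # close: E[RSS] = sigma^2 (n - p)
print((rss / n).mean(), (rss / (n - p)).mean())  # MLE biased low, s^2 approx unbiased

# compare quantiles of RSS / sigma^2 to a chi-square with n - p df
print(np.quantile(rss / sigma**2, [0.25, 0.5, 0.75]))
print(stats.chi2.ppf([0.25, 0.5, 0.75], df=n - p))
```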
Note: we can find the expectation of \({\hat{\sigma}}^2\) or \(s^2\) from the covariance of \(\boldsymbol{\epsilon}\) alone, without assuming normality, by exploiting properties of the trace (sketched below).
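A sketch of that trace argument, using only \(\textsf{E}[\boldsymbol{\epsilon}] = \mathbf{0}_n\) and \(\textsf{Cov}[\boldsymbol{\epsilon}] = \sigma^2\mathbf{I}_n\):
\[
\textsf{E}[\textsf{RSS}]
= \textsf{E}[\boldsymbol{\epsilon}^T(\mathbf{I}_n - \mathbf{P}_\mathbf{X})\boldsymbol{\epsilon}]
= \textsf{E}[\textsf{tr}\{(\mathbf{I}_n - \mathbf{P}_\mathbf{X})\boldsymbol{\epsilon}\boldsymbol{\epsilon}^T\}]
= \textsf{tr}\{(\mathbf{I}_n - \mathbf{P}_\mathbf{X})\,\sigma^2\mathbf{I}_n\}
= \sigma^2(n - p)
\]
since \(\textsf{tr}(\mathbf{I}_n - \mathbf{P}_\mathbf{X}) = n - p\).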
\(\hat{\boldsymbol{\beta}}\sim \textsf{N}\left(\boldsymbol{\beta}, \sigma^2( \mathbf{X}^T\mathbf{X})^{-1}\right)\)
do not know \(\sigma^2\)
Need a distribution that does not depend on unknown parameters for deriving confidence intervals and hypothesis tests for \(\boldsymbol{\beta}\).
what if we plug in \(s^2\) or \({\hat{\sigma}}^2\) for \(\sigma^2\)?
won’t be multivariate normal
need to reflect uncertainty in estimating \(\sigma^2\)
first show that \(\hat{\boldsymbol{\beta}}\) and \(s^2\) are independent
If the distribution of \(\mathbf{Y}\) is normal, then \(\hat{\boldsymbol{\beta}}\) and \(s^2\) are statistically independent.
The derivation of this result has three steps:
Step 1:
\[\begin{align*} \textsf{Cov}[\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\epsilon}}] & = \textsf{E}[(\hat{\boldsymbol{\beta}}- \boldsymbol{\beta}) \hat{\boldsymbol{\epsilon}}^T] \\ & = \textsf{E}[(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\epsilon}\boldsymbol{\epsilon}^T (\mathbf{I}- \mathbf{P}_\mathbf{X})] \\ & = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T (\mathbf{I}- \mathbf{P}_\mathbf{X}) \\ & = \mathbf{0} \end{align*}\]
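The last equality holds because \(\mathbf{X}^T(\mathbf{I}- \mathbf{P}_\mathbf{X}) = \mathbf{X}^T - \mathbf{X}^T\mathbf{P}_\mathbf{X}= \mathbf{0}\). A quick numerical sketch (with an arbitrary simulated design) confirms it:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 25, 3
X = rng.standard_normal((n, p))
XtX_inv = np.linalg.inv(X.T @ X)
P = X @ XtX_inv @ X.T

# X^T (I - P_X) = 0, hence Cov[beta-hat, eps-hat] = 0
print(np.allclose(XtX_inv @ X.T @ (np.eye(n) - P), 0))   # True
```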
Step 2: \(\hat{\boldsymbol{\beta}}\) and \(\hat{\boldsymbol{\epsilon}}\) are independent
Easy direction: independence implies zero covariance
\(\textsf{Cov}[\mathbf{W}_1, \mathbf{W}_2] = \textsf{E}[(\mathbf{W}_1 - \boldsymbol{\mu}_1)(\mathbf{W}_2 - \boldsymbol{\mu}_2)^T]\)
since they are independent \[\begin{align*} \textsf{Cov}[\mathbf{W}_1, \mathbf{W}_2] & = \textsf{E}[(\mathbf{W}_1 - \boldsymbol{\mu}_1)] \textsf{E}[(\mathbf{W}_2 - \boldsymbol{\mu}_2)^T] \\ & = \mathbf{0}\mathbf{0}^T \\ & = \mathbf{0} \end{align*}\]
so \(\mathbf{W}_1\) and \(\mathbf{W}_2\) are uncorrelated
Harder direction: assume \(\boldsymbol{\Sigma}_{12} = \mathbf{0}\) (zero covariance):
Choose \[\mathbf{A}= \left[ \begin{array}{ll} \mathbf{A}_1 & \mathbf{0}\\ \mathbf{0}& \mathbf{A}_2 \end{array} \right]\] such that \(\mathbf{A}_1 \mathbf{A}_1^T = \boldsymbol{\Sigma}_{11}\), \(\mathbf{A}_2 \mathbf{A}_2^T = \boldsymbol{\Sigma}_{22}\)
Partition
\[
\mathbf{Z}= \left[ \begin{array}{c} \mathbf{Z}_1 \\ \mathbf{Z}_2 \end{array} \right]
\sim \textsf{N}\left(
\left[ \begin{array}{c} \mathbf{0}_1 \\ \mathbf{0}_2 \end{array} \right],
\left[ \begin{array}{cc} \mathbf{I}_1 & \mathbf{0}\\ \mathbf{0}& \mathbf{I}_2 \end{array} \right]
\right)
\quad \text{ and } \quad
\boldsymbol{\mu}= \left[ \begin{array}{c} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{array} \right]
\]
then \(\mathbf{W}\mathrel{\mathop{=}\limits^{\rm D}}\mathbf{A}\mathbf{Z}+ \boldsymbol{\mu}\sim \textsf{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})\)
\[\begin{align*} \left[ \begin{array}{c} \mathbf{W}_1 \\ \mathbf{W}_2 \end{array} \right] & \mathrel{\mathop{=}\limits^{\rm D}} \left[ \begin{array}{cc} \mathbf{A}_1 & \mathbf{0}\\ \mathbf{0}& \mathbf{A}_2 \end{array} \right] \left[ \begin{array}{c} \mathbf{Z}_1 \\ \mathbf{Z}_2 \end{array} \right] + \left[ \begin{array}{c} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{array} \right] \\ & = \left[ \begin{array}{c} \mathbf{A}_1\mathbf{Z}_1 + \boldsymbol{\mu}_1 \\ \mathbf{A}_2\mathbf{Z}_2 +\boldsymbol{\mu}_2 \end{array} \right] \end{align*}\]
Since \(\mathbf{Z}_1\) and \(\mathbf{Z}_2\) are independent, \(\mathbf{W}_1 = \mathbf{A}_1\mathbf{Z}_1 + \boldsymbol{\mu}_1\) and \(\mathbf{W}_2 = \mathbf{A}_2\mathbf{Z}_2 + \boldsymbol{\mu}_2\) are functions of independent random vectors and hence independent: for the multivariate normal, zero covariance implies independence!
\[ \left[ \begin{array}{c} \mathbf{W}_1 \\ \mathbf{W}_2 \end{array} \right] = \left[ \begin{array}{c} \mathbf{A}\\ \mathbf{B} \end{array} \right] \mathbf{Y}= \left[ \begin{array}{c} \mathbf{A}\mathbf{Y}\\ \mathbf{B}\mathbf{Y} \end{array} \right] \]
\(\textsf{Cov}(\mathbf{W}_1, \mathbf{W}_2) = \textsf{Cov}(\mathbf{A}\mathbf{Y}, \mathbf{B}\mathbf{Y}) = \sigma^2 \mathbf{A}\mathbf{B}^T\) (since \(\textsf{Cov}[\mathbf{Y}] = \sigma^2\mathbf{I}_n\) under the model)
\(\mathbf{A}\mathbf{Y}\) and \(\mathbf{B}\mathbf{Y}\) are independent if \(\mathbf{A}\mathbf{B}^T = \mathbf{0}\)
Show \(\hat{\boldsymbol{\beta}}\) and \(\textsf{RSS}\) are independent
\(\hat{\boldsymbol{\beta}}= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}\) and \(\hat{\boldsymbol{\epsilon}}= (\mathbf{I}- \mathbf{P}_\mathbf{X})\mathbf{Y}\) are independent, since \((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T (\mathbf{I}- \mathbf{P}_\mathbf{X})^T = \mathbf{0}\)
functions of independent random variables are independent so \(\hat{\boldsymbol{\beta}}\) and \(\textsf{RSS}= \hat{\boldsymbol{\epsilon}}^T\hat{\boldsymbol{\epsilon}}\) are independent
so \(\hat{\boldsymbol{\beta}}\) and \(s^2 = \textsf{RSS}/(n-p)\) are independent
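As an empirical illustration (a sketch; the design and parameters are arbitrary), the sample correlation between each coordinate of \(\hat{\boldsymbol{\beta}}\) and \(s^2\) across simulated data sets should be near zero:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, sigma = 30, 3, 1.0
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta = np.array([0.5, 1.0, -1.0])
XtX_inv = np.linalg.inv(X.T @ X)
P = X @ XtX_inv @ X.T

beta_hat = np.empty((20000, p))
s2 = np.empty(20000)
for i in range(20000):
    y = X @ beta + sigma * rng.standard_normal(n)
    beta_hat[i] = XtX_inv @ X.T @ y
    e = y - P @ y
    s2[i] = e @ e / (n - p)

# correlations between each beta_hat_j and s^2: all near zero
print([np.corrcoef(beta_hat[:, j], s2)[0, 1] for j in range(p)])
```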
This result will be critical for creating confidence regions and intervals for \(\boldsymbol{\beta}\) and for linear combinations \(\boldsymbol{\lambda}^T \boldsymbol{\beta}\), as well as for testing hypotheses.
shrinkage estimators
Bayes and penalized loss functions