STA 721: Lecture 4
Duke University
Readings:

- Christensen, Chapters 1-2 and Appendix B
- Seber & Lee, Chapter 3
Model: \(\mathbf{Y}= \boldsymbol{\mu}+ \boldsymbol{\epsilon}\)
Minimal Assumptions: \(\boldsymbol{\mu}= \mathbf{X}\boldsymbol{\beta}\) with \(\mathbf{X}\) full column rank and \(\textsf{E}[\boldsymbol{\epsilon}] = \mathbf{0}_n\); no assumption yet on \(\textsf{Cov}[\boldsymbol{\epsilon}]\)
An estimator \(\tilde{\boldsymbol{\beta}}\) is a Linear Unbiased Estimator (LUE) of \(\boldsymbol{\beta}\) if \(\tilde{\boldsymbol{\beta}}= \mathbf{A}\mathbf{Y}\) for some \(\mathbf{A}\in \mathbb{R}^{p \times n}\) and \(\textsf{E}[\tilde{\boldsymbol{\beta}}] = \boldsymbol{\beta}\) for all \(\boldsymbol{\beta}\in \mathbb{R}^p\)
The class of linear unbiased estimators is the same for every model with parameter space \(\boldsymbol{\beta}\in \mathbb{R}^p\) and \(P \in \cal{P}\), for any collection \(\cal{P}\) of mean-zero distributions over \(\mathbb{R}^n\).
Consider another linear estimator \(\tilde{\boldsymbol{\beta}}= \mathbf{A}\mathbf{Y}\)
Difference between \(\tilde{\boldsymbol{\beta}}\) and \(\hat{\boldsymbol{\beta}}\) (OLS/MLE): \[\begin{align*} \mathbf{\delta}= \tilde{\boldsymbol{\beta}}- \hat{\boldsymbol{\beta}}& = \left(\mathbf{A}- (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \right)\mathbf{Y}\\ & \equiv \mathbf{H}^T \mathbf{Y} \end{align*}\]
Since both \(\tilde{\boldsymbol{\beta}}\) and \(\hat{\boldsymbol{\beta}}\) are unbiased, \(\textsf{E}[\mathbf{\delta}] = \mathbf{0}_p \quad \forall \boldsymbol{\beta}\in \mathbb{R}^p\) \[\mathbf{0}_p = \textsf{E}[\mathbf{H}^T \mathbf{Y}] = \mathbf{H}^T \mathbf{X}\boldsymbol{\beta}\quad \forall \boldsymbol{\beta}\in \mathbb{R}^p\]
Since this holds for all \(\boldsymbol{\beta}\), \(\mathbf{H}^T\mathbf{X}= \mathbf{0}\), equivalently \(\mathbf{X}^T \mathbf{H}= \mathbf{0}\), so each column of \(\mathbf{H}\) is in \(\boldsymbol{{\cal M}}^\perp \equiv \boldsymbol{{\cal N}}\)
Since each column of \(\mathbf{H}\) is in \(\boldsymbol{{\cal N}}\), there exists a \(\mathbf{G}\in \mathbb{R}^{p \times (n-p)}\) such that \(\mathbf{H}= \textsf{N}\mathbf{G}^T\), where the columns of \(\textsf{N}\in \mathbb{R}^{n \times (n-p)}\) form a basis for \(\boldsymbol{{\cal N}}\)
Rewriting \(\mathbf{\delta}= \tilde{\boldsymbol{\beta}}- \hat{\boldsymbol{\beta}}\): \[\begin{align*} \tilde{\boldsymbol{\beta}}& = \hat{\boldsymbol{\beta}}+ \mathbf{\delta}\\ & = \hat{\boldsymbol{\beta}}+ \mathbf{H}^T\mathbf{Y}\\ & = \hat{\boldsymbol{\beta}}+ \mathbf{G}\textsf{N}^T\mathbf{Y} \end{align*}\]
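A minimal numerical sketch of this construction (Python/NumPy, not from the lecture; dimensions and seed are arbitrary choices): build an orthonormal basis \(\textsf{N}\) for \(\boldsymbol{{\cal N}}\), pick an arbitrary \(\mathbf{G}\), and check that the coefficient matrix \(\mathbf{A}\) of the resulting \(\tilde{\boldsymbol{\beta}}\) satisfies \(\mathbf{A}\mathbf{X}= \mathbf{I}_p\), so \(\tilde{\boldsymbol{\beta}}\) is unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))                          # full column rank (w.p. 1)

X_mp = np.linalg.solve(X.T @ X, X.T)                 # (X^T X)^{-1} X^T
N = np.linalg.svd(X, full_matrices=True)[0][:, p:]   # orthonormal basis for N = M-perp
G = rng.normal(size=(p, n - p))                      # any G gives a different LUE

A = X_mp + G @ N.T                                   # coefficient matrix of beta_tilde
print(np.allclose(A @ X, np.eye(p)))                 # True: A X = I_p, so E[A Y] = beta
```

Any choice of \(\mathbf{G}\) passes this check, which is one way to see the next point.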
Summary of previous results: every LUE of \(\boldsymbol{\beta}\) has the form \(\tilde{\boldsymbol{\beta}}= \hat{\boldsymbol{\beta}}+ \mathbf{G}\textsf{N}^T\mathbf{Y}\) for some \(\mathbf{G}\in \mathbb{R}^{p \times (n-p)}\), and every choice of \(\mathbf{G}\) yields a LUE
an infinite number of LUEs!
Let \(\tilde{\boldsymbol{\beta}}= \mathbf{A}\mathbf{Y}\) be a LUE in the statistical linear model \(\mathbf{Y}= \mathbf{X}\boldsymbol{\beta}+ \boldsymbol{\epsilon}\) with \(\mathbf{X}\) of full column rank \(p\) \[\begin{align*} \textsf{E}[\tilde{\boldsymbol{\beta}}] & = \textsf{E}[\mathbf{A}\mathbf{Y}] \\ & = \mathbf{A}\textsf{E}[\mathbf{Y}] \\ & = \mathbf{A}\mathbf{X}\boldsymbol{\beta}\quad \forall \boldsymbol{\beta}\in \mathbb{R}^p \end{align*}\] so unbiasedness for every \(\boldsymbol{\beta}\) requires \(\mathbf{A}\mathbf{X}= \mathbf{I}_p\)
the distribution of values of any unbiased estimator is centered around \(\boldsymbol{\beta}\)
out of the infinite number of LUEs is there one that is more concentrated around \(\boldsymbol{\beta}\)?
is there an unbiased estimator that has a lower variance than all other unbiased estimators?
Recall variance-covariance matrix of a random vector \(\mathbf{Z}\) with mean \(\boldsymbol{\theta}\) \[\begin{align*} \textsf{Cov}[\mathbf{Z}] & \equiv \textsf{E}[(\mathbf{Z}- \boldsymbol{\theta})(\mathbf{Z}- \boldsymbol{\theta})^T] \\ \textsf{Cov}[\mathbf{Z}]_{ij} & = \textsf{E}[(z_i - \theta_i)(z_j - \theta_j)] \end{align*}\]
Lemma
Let \(\mathbf{A}\in \mathbb{R}^{q \times p}\) and \(\mathbf{b}\in \mathbb{R}^q\) with \(\mathbf{Z}\) a random vector in \(\mathbb{R}^p\); then \[\textsf{Cov}[\mathbf{A}\mathbf{Z}+ \mathbf{b}] = \mathbf{A}\textsf{Cov}[\mathbf{Z}] \mathbf{A}^T \ge 0\] where \(\ge 0\) means positive semi-definite
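As a quick illustration (a sketch, not part of the slides), the lemma can be checked by simulation: draw many copies of \(\mathbf{Z}\), transform them, and compare the empirical covariance of \(\mathbf{A}\mathbf{Z}+ \mathbf{b}\) with \(\mathbf{A}\textsf{Cov}[\mathbf{Z}]\mathbf{A}^T\). The dimensions and the particular \(\boldsymbol{\Sigma}\) below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
q, p, reps = 2, 3, 200_000
A = rng.normal(size=(q, p))
b = rng.normal(size=q)
L = rng.normal(size=(p, p))
Sigma = L @ L.T                                      # Cov[Z]

Z = rng.multivariate_normal(np.zeros(p), Sigma, size=reps)   # reps draws of Z (rows)
W = Z @ A.T + b                                      # each row is A z + b
print(np.round(np.cov(W.T), 2))                      # empirical Cov[A Z + b]
print(np.round(A @ Sigma @ A.T, 2))                  # A Cov[Z] A^T
```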
Let’s look at the variance of any LUE under assumption \(\textsf{Cov}[\boldsymbol{\epsilon}] = \sigma^2 \mathbf{I}_n\)
for \(\hat{\boldsymbol{\beta}}= (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T\mathbf{Y}= \boldsymbol{\beta}+ (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T\boldsymbol{\epsilon}\) \[\begin{align*} \textsf{Cov}[\hat{\boldsymbol{\beta}}] & = \textsf{Cov}[\boldsymbol{\beta}+ (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T\boldsymbol{\epsilon}] \\ & = (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T\textsf{Cov}[\boldsymbol{\epsilon}] \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} \\ & = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} \\ & = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1} \end{align*}\]
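The same result can be seen empirically; the sketch below (with assumed values of \(n\), \(p\), \(\sigma\), and \(\boldsymbol{\beta}\)) compares the sample covariance of the OLS estimates across simulated data sets with \(\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}\).

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma, reps = 30, 2, 1.5, 50_000
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0])
XtX_inv = np.linalg.inv(X.T @ X)

Y = X @ beta + sigma * rng.normal(size=(reps, n))    # reps simulated data sets (rows)
betas = Y @ (XtX_inv @ X.T).T                        # OLS estimate for each row of Y

print(np.round(np.cov(betas.T), 3))                  # empirical covariance
print(np.round(sigma**2 * XtX_inv, 3))               # theoretical covariance
```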
Covariance is increasing in \(\sigma^2\) and generally decreasing in \(n\)
Rewrite \(\mathbf{X}^T\mathbf{X}\) as \(\mathbf{X}^T\mathbf{X}= \sum_{i=1}^n \mathbf{x}_i \mathbf{x}_i^T\) (a sum of \(n\) outer-products)
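A one-line check of this identity (illustrative only, arbitrary dimensions):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 3))
outer_sum = sum(np.outer(x, x) for x in X)           # sum over rows of x_i x_i^T
print(np.allclose(outer_sum, X.T @ X))               # True
```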
for \(\tilde{\boldsymbol{\beta}}= \left((\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T + \mathbf{H}^T \right)\mathbf{Y}= \boldsymbol{\beta}+ \left((\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T + \mathbf{H}^T \right)\boldsymbol{\epsilon}\)
recall \(\mathbf{X}_{MP}^- \equiv (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T\) \[\begin{align*} \textsf{Cov}[\tilde{\boldsymbol{\beta}}] & = \textsf{Cov}[\left(\mathbf{X}_{MP}^- + \mathbf{H}^T \right)\boldsymbol{\epsilon}] \\ & = \sigma^2 \left(\mathbf{X}_{MP}^- + \mathbf{H}^T \right)\left(\mathbf{X}_{MP}^- + \mathbf{H}^T \right)^T \\ & = \sigma^2\left( \mathbf{X}_{MP}^-(\mathbf{X}_{MP}^-)^T + \mathbf{X}_{MP}^-\mathbf{H}+ \mathbf{H}^T (\mathbf{X}_{MP}^-)^T + \mathbf{H}^T \mathbf{H}\right) \\ & = \sigma^2\left( (\mathbf{X}^T\mathbf{X})^{-1} + \mathbf{H}^T \mathbf{H}\right) \end{align*}\]
Cross-product term \(\mathbf{H}^T(\mathbf{X}_{MP}^-)^T = \mathbf{H}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} = \mathbf{0}\)
Therefore \(\textsf{Cov}[\tilde{\boldsymbol{\beta}}] = \textsf{Cov}[\hat{\boldsymbol{\beta}}] + \sigma^2 \mathbf{H}^T\mathbf{H}\)
the sum of a positive definite matrix and a positive semi-definite matrix
Is \(\textsf{Cov}[\tilde{\boldsymbol{\beta}}] \ge \textsf{Cov}[\hat{\boldsymbol{\beta}}]\) in some sense? Yes: the difference \(\sigma^2 \mathbf{H}^T\mathbf{H}\) is positive semi-definite, so \(\mathbf{v}^T\textsf{Cov}[\tilde{\boldsymbol{\beta}}]\mathbf{v}\ge \mathbf{v}^T\textsf{Cov}[\hat{\boldsymbol{\beta}}]\mathbf{v}\) for every \(\mathbf{v}\in \mathbb{R}^p\)
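A numerical sketch of this ordering (arbitrary \(n\), \(p\), \(\sigma\), and \(\mathbf{G}\), not from the slides): form \(\mathbf{H}= \textsf{N}\mathbf{G}^T\) and verify that the eigenvalues of \(\textsf{Cov}[\tilde{\boldsymbol{\beta}}] - \textsf{Cov}[\hat{\boldsymbol{\beta}}]\) are nonnegative.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma = 15, 3, 2.0
X = rng.normal(size=(n, p))
N = np.linalg.svd(X, full_matrices=True)[0][:, p:]   # orthonormal basis for M-perp
H = N @ rng.normal(size=(n - p, p))                  # H = N G^T, columns in M-perp

XtX_inv = np.linalg.inv(X.T @ X)
cov_hat = sigma**2 * XtX_inv                         # Cov[beta_hat]
cov_tilde = sigma**2 * (XtX_inv + H.T @ H)           # Cov[beta_tilde]
print(np.linalg.eigvalsh(cov_tilde - cov_hat).min()) # >= 0 up to roundoff
```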
For a LUE \(\mathbf{A}\mathbf{Y}\) of \(\boldsymbol{\mu}\), write \(\mathbf{A}= \mathbf{P}+ \mathbf{H}^T\) so \(\mathbf{H}^T = \mathbf{A}- \mathbf{P}\), where \(\mathbf{P}\) is the orthogonal projection onto \(\boldsymbol{{\cal M}}\)
since \(\mathbf{A}\boldsymbol{\mu}= \boldsymbol{\mu}\), \(\mathbf{H}^T\boldsymbol{\mu}= \mathbf{0}_n\) for \(\boldsymbol{\mu}\in \boldsymbol{{\cal M}}\) and \(\mathbf{H}^T \mathbf{P}= \mathbf{P}\mathbf{H}= \mathbf{0}\) (columns of \(\mathbf{H}\) are in \(\boldsymbol{{\cal M}}^\perp\)) \[\begin{align*} \textsf{E}[\|\mathbf{A}\mathbf{Y}- \boldsymbol{\mu}\|^2] & = \textsf{E}[\|\mathbf{P}(\mathbf{Y}- \boldsymbol{\mu}) + \mathbf{H}^T(\mathbf{Y}- \boldsymbol{\mu})\|^2] \\ & = \textsf{E}[\|\mathbf{P}(\mathbf{Y}- \boldsymbol{\mu})\|^2] + \underbrace{\textsf{E}[\|\mathbf{H}^T(\mathbf{Y}- \boldsymbol{\mu})\|^2]}_{\ge 0} + \underbrace{\text{cross-product}}_{= 0} \\ & \ge \textsf{E}[\|\mathbf{P}(\mathbf{Y}- \boldsymbol{\mu})\|^2] \end{align*}\]
Cross-product is \(2\textsf{E}[(\mathbf{H}^T(\mathbf{Y}- \boldsymbol{\mu}))^T\mathbf{P}(\mathbf{Y}- \boldsymbol{\mu})] = 0\) (see last slide)
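A Monte Carlo sketch of this inequality (with arbitrary choices of \(n\), \(p\), \(\sigma\), \(\boldsymbol{\beta}\), and \(\mathbf{H}\)): simulate many data sets and compare the average squared error of \(\mathbf{A}\mathbf{Y}= (\mathbf{P}+ \mathbf{H}^T)\mathbf{Y}\) with that of \(\mathbf{P}\mathbf{Y}\).

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, sigma, reps = 12, 2, 1.0, 100_000
X = rng.normal(size=(n, p))
beta = np.array([0.5, 1.0])
mu = X @ beta

P = X @ np.linalg.solve(X.T @ X, X.T)                # projection onto M = C(X)
N = np.linalg.svd(X, full_matrices=True)[0][:, p:]   # orthonormal basis for M-perp
H = N @ rng.normal(size=(n - p, n))                  # columns of H lie in M-perp
A = P + H.T                                          # an alternative LUE of mu

Y = mu + sigma * rng.normal(size=(reps, n))          # reps simulated data sets (rows)
mse_P = np.mean(np.sum((Y @ P.T - mu) ** 2, axis=1))
mse_A = np.mean(np.sum((Y @ A.T - mu) ** 2, axis=1))
print(mse_P, mse_A, mse_P <= mse_A)                  # P Y has the smaller average error
```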
If \(\mathbf{P}\mathbf{Y}= \hat{\boldsymbol{\mu}}\) is the BLUE of \(\boldsymbol{\mu}\), is \(\mathbf{B}\mathbf{P}\mathbf{Y}= \mathbf{B}\hat{\boldsymbol{\mu}}\) the BLUE of \(\mathbf{B}\boldsymbol{\mu}\)?
Yes! A similar proof to the one above shows that, out of the class of LUEs \(\mathbf{A}\mathbf{Y}\) of \(\mathbf{B}\boldsymbol{\mu}\) with \(\mathbf{A}\in \mathbb{R}^{d \times n}\), \[\textsf{E}[\|\mathbf{A}\mathbf{Y}- \mathbf{B}\boldsymbol{\mu}\|^2] \ge \textsf{E}[\|\mathbf{B}\mathbf{P}\mathbf{Y}- \mathbf{B}\boldsymbol{\mu}\|^2]\] with equality iff \(\mathbf{A}= \mathbf{B}\mathbf{P}\).
What about linear functionals of \(\boldsymbol{\beta}\), \(\boldsymbol{\Lambda}^T \boldsymbol{\beta}\), for \(\mathbf{X}\) rank \(r \le p\)?
If \(\boldsymbol{\Lambda}^T= \mathbf{B}\mathbf{X}\) for some matrix \(\mathbf{B}\) then
\(\textsf{E}[\mathbf{B}\mathbf{P}\mathbf{Y}] = \textsf{E}[\boldsymbol{\Lambda}^T \hat{\boldsymbol{\beta}}] = \boldsymbol{\Lambda}^T\boldsymbol{\beta}\)
The unique OLS estimate of \(\boldsymbol{\Lambda}^T\boldsymbol{\beta}\) is \(\boldsymbol{\Lambda}^T\hat{\boldsymbol{\beta}}\)
\(\mathbf{B}\mathbf{P}\mathbf{Y}= \boldsymbol{\Lambda}^T\hat{\boldsymbol{\beta}}\) is the BLUE of \(\boldsymbol{\Lambda}^T\boldsymbol{\beta}\) \[\begin{align*} & \textsf{E}[\|\mathbf{B}\mathbf{P}\mathbf{Y}- \mathbf{B}\boldsymbol{\mu}\|^2] \le \textsf{E}[\|\mathbf{A}\mathbf{Y}- \mathbf{B}\boldsymbol{\mu}\|^2] \\ \Leftrightarrow & \\ & \textsf{E}[\|\boldsymbol{\Lambda}^T\hat{\boldsymbol{\beta}}- \boldsymbol{\Lambda}^T\boldsymbol{\beta}\|^2] \le \textsf{E}[\|\mathbf{L}^T\tilde{\boldsymbol{\beta}}- \boldsymbol{\Lambda}^T\boldsymbol{\beta}\|^2] \end{align*}\] for any LUE \(\mathbf{A}\mathbf{Y}= \mathbf{L}^T\tilde{\boldsymbol{\beta}}\) of \(\boldsymbol{\Lambda}^T\boldsymbol{\beta}\)
The proof proceeds as in the classic case.
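To illustrate estimability with a rank-deficient design (a sketch using an assumed one-way layout, not an example from the lecture): \(\hat{\boldsymbol{\beta}}\) is not unique, but \(\boldsymbol{\Lambda}^T\hat{\boldsymbol{\beta}}\) with \(\boldsymbol{\Lambda}^T = \mathbf{B}\mathbf{X}\) takes the same value for every least squares solution, since solutions differ only by a null vector of \(\mathbf{X}\).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 12
g = np.repeat([0, 1, 2], 4)                          # three groups of four observations
X = np.column_stack([np.ones(n), g == 0, g == 1, g == 2]).astype(float)  # rank 3 < p = 4
y = rng.normal(size=n)

B = rng.normal(size=(2, n))
Lam_T = B @ X                                        # Lambda^T = B X, estimable by construction

beta_hat = np.linalg.pinv(X) @ y                     # minimum-norm least squares solution
z = np.linalg.svd(X)[2][-1]                          # a null vector of X (X z = 0)
beta_alt = beta_hat + 3.0 * z                        # another least squares solution

print(np.allclose(X @ beta_hat, X @ beta_alt))       # True: same fitted mu-hat
print(np.allclose(Lam_T @ beta_hat, Lam_T @ beta_alt))   # True: Lambda^T beta_hat is unique
```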
Let \(\mathbf{D}= \mathbf{H}\mathbf{P}\) and write \[\begin{align*} \textsf{E}[(\mathbf{H}^T(\mathbf{Y}- \boldsymbol{\mu}))^T\mathbf{P}(\mathbf{Y}- \boldsymbol{\mu})] & = \textsf{E}[(\mathbf{Y}- \boldsymbol{\mu})^T\mathbf{H}\mathbf{P}(\mathbf{Y}- \boldsymbol{\mu})] \\ & = \textsf{E}[(\mathbf{Y}- \boldsymbol{\mu})^T\mathbf{D}(\mathbf{Y}- \boldsymbol{\mu})] \end{align*}\]
\[\begin{align*} \textsf{E}[(\mathbf{Y}- \boldsymbol{\mu})^T\mathbf{D}(\mathbf{Y}- \boldsymbol{\mu})] & = \textsf{E}[\textsf{tr}((\mathbf{Y}- \boldsymbol{\mu})^T\mathbf{D}(\mathbf{Y}- \boldsymbol{\mu}))] \\ & = \textsf{E}[\textsf{tr}(\mathbf{D}(\mathbf{Y}- \boldsymbol{\mu})(\mathbf{Y}- \boldsymbol{\mu})^T)] \\ & = \textsf{tr}(\textsf{E}[\mathbf{D}(\mathbf{Y}- \boldsymbol{\mu})(\mathbf{Y}- \boldsymbol{\mu})^T]) \\ & = \textsf{tr}(\mathbf{D}\,\textsf{E}[(\mathbf{Y}- \boldsymbol{\mu})(\mathbf{Y}- \boldsymbol{\mu})^T]) \\ & = \sigma^2 \textsf{tr}(\mathbf{D}\mathbf{I}_n) = \sigma^2 \textsf{tr}(\mathbf{D}) \end{align*}\]
Since \(\textsf{tr}(\mathbf{D}) = \textsf{tr}(\mathbf{H}\mathbf{P}) = \textsf{tr}(\mathbf{P}\mathbf{H})\) and \(\mathbf{P}\mathbf{H}= \mathbf{0}\) (the columns of \(\mathbf{H}\) are in \(\boldsymbol{{\cal M}}^\perp\)), we can conclude that the cross-product term is zero.
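A final numerical check of this step (illustrative only, arbitrary dimensions): with the columns of \(\mathbf{H}\) in \(\boldsymbol{{\cal M}}^\perp\), \(\mathbf{P}\mathbf{H}= \mathbf{0}\) and hence \(\textsf{tr}(\mathbf{H}\mathbf{P}) = 0\).

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 10, 3
X = rng.normal(size=(n, p))
P = X @ np.linalg.solve(X.T @ X, X.T)                # projection onto M
N = np.linalg.svd(X, full_matrices=True)[0][:, p:]   # orthonormal basis for M-perp
H = N @ rng.normal(size=(n - p, n))                  # columns of H in M-perp

print(np.allclose(P @ H, 0))                         # True: P H = 0
print(np.isclose(np.trace(H @ P), 0))                # True: tr(H P) = tr(P H) = 0
```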