Generalized Least Squares, BLUEs & BUEs

STA 721: Lecture 6

Merlise Clyde (clyde@duke.edu)

Duke University

Outline

  • General Least Squares and MLEs
  • Gauss-Markov Theorem & BLUEs
  • MVUE

Readings:

  • Christensen Chapter 2 and 10 (Appendix B as needed)

  • Seber & Lee Chapter 3

Other Error Distributions

Model:
\[\begin{align} \mathbf{Y}& = \mathbf{X}\boldsymbol{\beta}+ \boldsymbol{\epsilon}\quad \textsf{E}[\boldsymbol{\epsilon}] = \mathbf{0}_n \\ \textsf{Cov}[\boldsymbol{\epsilon}] & = \sigma^2 \mathbf{V} \end{align}\] where \(\sigma^2\) is a scalar and \(\mathbf{V}\) is an \(n \times n\) symmetric positive semidefinite matrix

Examples:

  • Heteroscedasticity: \(\mathbf{V}\) is a diagonal matrix with \([\mathbf{V}]_{ii} = v_i\)
    • \(v_{i} = 1/n_i\) if \(y_i\) is the mean of \(n_i\) observations
    • survey weights or propagation of measurement errors in physics models
  • Correlated data:
    • time series: a first-order autoregressive model with equally spaced data has \(\textsf{Cov}[\boldsymbol{\epsilon}] = \sigma^2 \mathbf{V}\), where \(v_{ij} = \rho^{|i-j|}\) (see the sketch after this list)
  • Hierarchical models with random effects
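
To make the AR(1) example concrete, here is a minimal NumPy sketch (the function name and the values used are my own, not from the slides) that builds \(\mathbf{V}\) with entries \(v_{ij} = \rho^{|i-j|}\):

```python
import numpy as np

def ar1_cov(n, rho):
    """AR(1) correlation matrix with entries v_ij = rho**|i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

V = ar1_cov(5, rho=0.6)
print(np.round(V, 3))  # Toeplitz: 1 on the diagonal, 0.6, 0.36, ... moving away from it
```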

OLS under a General Covariance

  • Is it still unbiased? What’s its variance? Is it still the BLUE?

  • Unbiasedness of \(\hat{\boldsymbol{\beta}}\) \[\begin{align} \textsf{E}[\hat{\boldsymbol{\beta}}] & = \textsf{E}[(\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}] \\ & = (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T \textsf{E}[\mathbf{Y}] = (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T \textsf{E}[\mathbf{X}\boldsymbol{\beta}+ \boldsymbol{\epsilon}] \\ & = \boldsymbol{\beta}+ \mathbf{0}_p = \boldsymbol{\beta} \end{align}\]

  • Covariance of \(\hat{\boldsymbol{\beta}}\) \[\begin{align} \textsf{Cov}[\hat{\boldsymbol{\beta}}] & = \textsf{Cov}[(\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}] \\ & = (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T \textsf{Cov}[\mathbf{Y}] \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} \\ & = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T \mathbf{V}\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} \end{align}\]

  • Not necessarily \(\sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}\) unless \(\mathbf{V}\) has a special form
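
A small numerical check of the sandwich form above; the toy design, the diagonal heteroscedastic \(\mathbf{V}\), and the variable names below are assumptions of this sketch, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma2 = 50, 3, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
V = np.diag(rng.uniform(0.5, 4.0, size=n))            # heteroscedastic example

XtX_inv = np.linalg.inv(X.T @ X)
naive = sigma2 * XtX_inv                               # valid only when V = I_n
sandwich = sigma2 * XtX_inv @ X.T @ V @ X @ XtX_inv    # Cov of OLS when Cov[eps] = sigma2 * V

print(np.allclose(naive, sandwich))                    # False in general: the usual formula misstates Cov
```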

GLS via Whitening

Transform the data and reduce the problem to one we have already solved!

  • For \(\mathbf{V}> 0\) use the Spectral Decomposition \[\mathbf{V}= \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T = \mathbf{U}\boldsymbol{\Lambda}^{1/2} \boldsymbol{\Lambda}^{1/2} \mathbf{U}^T\]

  • define the symmetric square root of \(\mathbf{V}\) as \[\mathbf{V}^{1/2} \equiv \mathbf{U}\boldsymbol{\Lambda}^{1/2} \mathbf{U}^T\]

  • transform model: \[\begin{align*} \mathbf{V}^{-1/2} \mathbf{Y}& = \mathbf{V}^{-1/2} \mathbf{X}\boldsymbol{\beta}+ \mathbf{V}^{-1/2}\boldsymbol{\epsilon}\\ \tilde{\mathbf{Y}} & = \tilde{\mathbf{X}} \boldsymbol{\beta}+ \tilde{\boldsymbol{\epsilon}} \end{align*}\]

  • Since \(\textsf{Cov}[\tilde{\boldsymbol{\epsilon}}] = \sigma^2\mathbf{V}^{-1/2} \mathbf{V}\mathbf{V}^{-1/2} = \sigma^2 \mathbf{I}_n\), we know that \(\hat{\boldsymbol{\beta}}_\mathbf{V}\equiv (\tilde{\mathbf{X}}^T\tilde{\mathbf{X}})^{-1} \tilde{\mathbf{X}}^T\tilde{\mathbf{Y}}\) is the BLUE for \(\boldsymbol{\beta}\) based on \(\tilde{\mathbf{Y}}\) (\(\mathbf{X}\) full rank)

GLS

  • If \(\mathbf{V}\) is known, then \(\tilde{\mathbf{Y}}\) and \(\mathbf{Y}\) are known linear transformations of each other

  • by the previous results, any estimator of \(\boldsymbol{\beta}\) that is linear in \(\mathbf{Y}\) is also linear in \(\tilde{\mathbf{Y}}\), and vice versa

  • \(\hat{\boldsymbol{\beta}}_\mathbf{V}\) is the BLUE of \(\boldsymbol{\beta}\) based on either \(\tilde{\mathbf{Y}}\) or \(\mathbf{Y}\)!

  • Substituting back, we have \[\begin{align} \hat{\boldsymbol{\beta}}_\mathbf{V}& = (\tilde{\mathbf{X}}^T\tilde{\mathbf{X}})^{-1} \tilde{\mathbf{X}}^T\tilde{\mathbf{Y}}\\ & = (\mathbf{X}^T \mathbf{V}^{-1/2}\mathbf{V}^{-1/2} \mathbf{X})^{-1} \mathbf{X}^T\mathbf{V}^{-1/2}\mathbf{V}^{-1/2}\mathbf{Y}\\ & = (\mathbf{X}^T \mathbf{V}^{-1} \mathbf{X})^{-1} \mathbf{X}^T\mathbf{V}^{-1}\mathbf{Y} \end{align}\] which is the Generalized Least Squares Estimator of \(\boldsymbol{\beta}\)
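
A minimal sketch, under assumed toy data, checking that whitening by the symmetric square root of \(\mathbf{V}\) and then running OLS reproduces the direct GLS formula:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 3
X = rng.normal(size=(n, p))
A = rng.normal(size=(n, n))
V = A @ A.T + n * np.eye(n)                   # an arbitrary positive definite V
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.multivariate_normal(np.zeros(n), V)

# inverse symmetric square root of V via the spectral decomposition V = U Lambda U^T
lam, U = np.linalg.eigh(V)
V_inv_half = U @ np.diag(lam ** -0.5) @ U.T

# (1) whiten, then run ordinary least squares
Xt, Yt = V_inv_half @ X, V_inv_half @ Y
beta_whiten = np.linalg.solve(Xt.T @ Xt, Xt.T @ Yt)

# (2) direct GLS formula
V_inv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ Y)

print(np.allclose(beta_whiten, beta_gls))     # True
```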

Exercise: Weighted Regression
Consider the model \(\mathbf{Y}= \beta \mathbf{x}+ \boldsymbol{\epsilon}\) where \(\textsf{Cov}[\boldsymbol{\epsilon}]\) is a known diagonal matrix \(\mathbf{V}\). Write out the GLS estimator in terms of sums and interpret.

GLS of \(\boldsymbol{\mu}\) (Full Rank Case)\(^{\dagger}\)

  • the OLS/MLE of \(\boldsymbol{\mu}\in C(\mathbf{X})\) with transformed variables is \[\begin{align*} \mathbf{P}_{\tilde{\mathbf{X}}} \tilde{\mathbf{Y}}& = \tilde{\mathbf{X}}\hat{\boldsymbol{\beta}}_\mathbf{V}\\ \tilde{\mathbf{X}}\left(\tilde{\mathbf{X}}^T\tilde{\mathbf{X}}\right)^{-1}\tilde{\mathbf{X}}^T \tilde{\mathbf{Y}}& = \tilde{\mathbf{X}}\hat{\boldsymbol{\beta}}_\mathbf{V}\\ \mathbf{V}^{-1/2} \mathbf{X}\left(\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^T \mathbf{V}^{-1} \mathbf{Y}& = \mathbf{V}^{-1/2} \mathbf{X}\hat{\boldsymbol{\beta}}_\mathbf{V}\end{align*}\]

  • since \(\mathbf{V}\) is positive definite, multiply through by \(\mathbf{V}^{1/2}\) to show that \(\hat{\boldsymbol{\beta}}_\mathbf{V}\) is a GLS/MLE estimator of \(\boldsymbol{\beta}\) iff \[\mathbf{X}\left(\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^T \mathbf{V}^{-1} \mathbf{Y}= \mathbf{X}\hat{\boldsymbol{\beta}}_\mathbf{V}\]

  • Is \(\mathbf{P}_\mathbf{V}\equiv \mathbf{X}\left(\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^T \mathbf{V}^{-1}\) a projection onto \(C(\mathbf{X})\)? Is it an orthogonal projection onto \(C(\mathbf{X})\)?

Projections

We want to show that \(\mathbf{P}_\mathbf{V}\equiv \mathbf{X}\left(\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^T \mathbf{V}^{-1}\) is a projection onto \(C(\mathbf{X})\)

  • from the definition of \(\mathbf{P}_\mathbf{V}\), if \(\mathbf{m}\in C(\mathbf{P}_\mathbf{V})\) then \(\mathbf{m}= \mathbf{P}_\mathbf{V}\mathbf{b}= \mathbf{X}\left[\left(\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{V}^{-1}\mathbf{b}\right]\) for some \(\mathbf{b}\in \mathbb{R}^n\), so \(C(\mathbf{P}_\mathbf{V}) \subset C(\mathbf{X})\)

  • since \(\mathbf{P}_\tilde{\mathbf{X}}\) is a projection onto \(C(\tilde{\mathbf{X}})\) we have \[\begin{align*} \mathbf{P}_{\tilde{\mathbf{X}}} \tilde{\mathbf{X}}& = \tilde{\mathbf{X}}\\ \tilde{\mathbf{X}}\left(\tilde{\mathbf{X}}^T\tilde{\mathbf{X}}\right)^{-1}\tilde{\mathbf{X}}^T \tilde{\mathbf{X}}& = \tilde{\mathbf{X}}\\ \mathbf{V}^{-1/2} \mathbf{X}\left(\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^T \mathbf{V}^{-1} \mathbf{X}& = \mathbf{V}^{-1/2} \mathbf{X}\\ \mathbf{V}^{-1/2} \mathbf{P}_\mathbf{V}\mathbf{X}& = \mathbf{V}^{-1/2} \mathbf{X} \end{align*}\]

  • We can multiply both sides by \(\mathbf{V}^{1/2} > 0\), so that \(\mathbf{P}_\mathbf{V}\mathbf{X}= \mathbf{X}\)

  • for \(\mathbf{m}\in C(\mathbf{X})\), \(\mathbf{P}_\mathbf{V}\mathbf{m}= \mathbf{m}\) and \(C(\mathbf{X}) \subset C(\mathbf{P}_\mathbf{V})\)

  • \(\quad \quad \therefore C(\mathbf{P}_\mathbf{V}) = C(\mathbf{X})\) so that \(\mathbf{P}_\mathbf{V}\) is a projection onto \(C(\mathbf{X})\)
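
A quick numerical check of this conclusion, using assumed toy \(\mathbf{X}\) and \(\mathbf{V}\): \(\mathbf{P}_\mathbf{V}\) is idempotent and fixes \(C(\mathbf{X})\), but it is not symmetric, anticipating the next slide.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 3
X = rng.normal(size=(n, p))
A = rng.normal(size=(n, n))
V = A @ A.T + n * np.eye(n)          # positive definite V
V_inv = np.linalg.inv(V)

P_V = X @ np.linalg.inv(X.T @ V_inv @ X) @ X.T @ V_inv

print(np.allclose(P_V @ P_V, P_V))   # True: idempotent, so P_V is a projection
print(np.allclose(P_V @ X, X))       # True: P_V fixes C(X)
print(np.allclose(P_V, P_V.T))       # False: not symmetric, so not orthogonal for the usual inner product
```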

Oblique Projections

Proposition: Projection
The \(n \times n\) matrix \(\mathbf{P}_\mathbf{V}\equiv \mathbf{X}\left(\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^T \mathbf{V}^{-1}\) is a projection onto \(C(\mathbf{X})\)

  • Show that \(\mathbf{P}_\mathbf{V}^2 = \mathbf{P}_\mathbf{V}\) (idempotent)

  • every vector \(\mathbf{y}\in \mathbb{R}^n\) may be written as \(\mathbf{y}= \mathbf{m}+ \mathbf{n}\), where \(\mathbf{m}= \mathbf{P}_\mathbf{V}\mathbf{y}\in C(\mathbf{P}_\mathbf{V})\) and \(\mathbf{n}= (\mathbf{I}_n - \mathbf{P}_\mathbf{V})\mathbf{y}\in N(\mathbf{P}_\mathbf{V})\)

  • Is \(\mathbf{P}_\mathbf{V}\) an orthogonal projection onto \(C(\mathbf{X})\) for the inner product space \((\mathbb{R}^n, \langle \mathbf{v}, \mathbf{u}\rangle = \mathbf{v}^T\mathbf{u})\)?

Definition: Oblique Projection
For the inner product space \((\mathbb{R}^n, \langle \mathbf{v}, \mathbf{u}\rangle = \mathbf{v}^T\mathbf{u})\), a projection \(\mathbf{P}\) that is not an orthogonal projection is called an oblique projection

Loss Function

The GLS estimator minimizes the following generalized squared error loss: \[\begin{align} \| \tilde{\mathbf{Y}}- \tilde{\mathbf{X}}\boldsymbol{\beta}\|^2 & = (\tilde{\mathbf{Y}}- \tilde{\mathbf{X}}\boldsymbol{\beta})^T(\tilde{\mathbf{Y}}- \tilde{\mathbf{X}}\boldsymbol{\beta}) \\ & = (\mathbf{Y}- \mathbf{X}\boldsymbol{\beta})^T \mathbf{V}^{-1/2}\mathbf{V}^{-1/2}(\mathbf{Y}- \mathbf{X}\boldsymbol{\beta}) \\ & = (\mathbf{Y}- \mathbf{X}\boldsymbol{\beta})^T \mathbf{V}^{-1}(\mathbf{Y}- \mathbf{X}\boldsymbol{\beta}) \\ & = \| \mathbf{Y}- \mathbf{X}\boldsymbol{\beta}\|^2_{\mathbf{V}^{-1}} \end{align}\] where we can change the inner product to be \[\langle \mathbf{u}, \mathbf{v}\rangle_{\mathbf{V}^{-1}} \equiv \mathbf{u}^T\mathbf{V}^{-1} \mathbf{v}\]
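
For completeness, differentiating this loss with respect to \(\boldsymbol{\beta}\) gives the generalized normal equations, whose solution is the GLS estimator derived earlier: \[\begin{align*} \frac{\partial}{\partial \boldsymbol{\beta}} \| \mathbf{Y}- \mathbf{X}\boldsymbol{\beta}\|^2_{\mathbf{V}^{-1}} & = -2\, \mathbf{X}^T\mathbf{V}^{-1}(\mathbf{Y}- \mathbf{X}\boldsymbol{\beta}) = \mathbf{0}_p \\ \Rightarrow \quad \mathbf{X}^T\mathbf{V}^{-1}\mathbf{X}\,\boldsymbol{\beta}& = \mathbf{X}^T\mathbf{V}^{-1}\mathbf{Y}\quad \Rightarrow \quad \hat{\boldsymbol{\beta}}_\mathbf{V}= (\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{V}^{-1}\mathbf{Y}\end{align*}\]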

Orthogonality in an Inner Product Space

Definition: Orthogonal Projection
For an inner product space \((\mathbb{R}^n, \langle \cdot , \cdot \rangle)\), the projection \(\mathbf{P}\) is an orthogonal projection if for every pair of vectors \(\mathbf{x}\) and \(\mathbf{y}\) in \(\mathbb{R}^n\), \[ \langle \mathbf{P}\mathbf{x}, (\mathbf{I}_n -\mathbf{P})\mathbf{y}\rangle = \langle (\mathbf{I}_n - \mathbf{P}) \mathbf{x},\mathbf{P}\mathbf{y}\rangle = 0 \] Equivalently: \[ \langle \mathbf{x},\mathbf{P}\mathbf{y}\rangle = \langle \mathbf{P}\mathbf{x}, \mathbf{P}\mathbf{y}\rangle =\langle \mathbf{P}\mathbf{x},\mathbf{y}\rangle \]

Exercise
Show that \(\mathbf{P}_\mathbf{V}\) is an orthogonal projection under the inner product \(\langle \mathbf{x}, \mathbf{y}\rangle_{\mathbf{V}^{-1}} \equiv \mathbf{x}^T\mathbf{V}^{-1} \mathbf{y}\)

Variance of GLS

  • Variance of the GLS estimator \(\hat{\boldsymbol{\beta}}_\mathbf{V}= (\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{V}^{-1}\mathbf{Y}\) is much simpler \[\begin{align} \textsf{Cov}[\hat{\boldsymbol{\beta}}_\mathbf{V}] & = (\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{V}^{-1}\textsf{Cov}[\mathbf{Y}]\mathbf{V}^{-1}\mathbf{X}(\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1} \\ & = \sigma^2(\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{V}^{-1}\mathbf{V}\mathbf{V}^{-1}\mathbf{X}(\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1} \\ & = \sigma^2(\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1}(\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})(\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1} \\ & = \sigma^2(\mathbf{X}^T\mathbf{V}^{-1}\mathbf{X})^{-1} \end{align}\]

Theorem: Gauss-Markov-Aitken
Let \(\tilde{\boldsymbol{\beta}}\) be any linear unbiased estimator of \(\boldsymbol{\beta}\) and \(\hat{\boldsymbol{\beta}}_\mathbf{V}\) be the GLS estimator of \(\boldsymbol{\beta}\) in the linear model \(\mathbf{Y}= \mathbf{X}\boldsymbol{\beta}+ \boldsymbol{\epsilon}\) with \(\textsf{E}[\boldsymbol{\epsilon}] = \mathbf{0}_n\) and \(\textsf{Cov}[\boldsymbol{\epsilon}] = \sigma^2 \mathbf{V}\), where \(\mathbf{X}\) and \(\mathbf{V}> 0\) are known. Then \(\hat{\boldsymbol{\beta}}_\mathbf{V}\) is the BLUE, in the sense that \[\textsf{Cov}[\tilde{\boldsymbol{\beta}}] \ge \sigma^2 (\mathbf{X}^T\mathbf{V}^{-1} \mathbf{X})^{-1} = \textsf{Cov}[\hat{\boldsymbol{\beta}}_\mathbf{V}] \] where \(\ge\) denotes the ordering of symmetric matrices, i.e. \(\textsf{Cov}[\tilde{\boldsymbol{\beta}}] - \textsf{Cov}[\hat{\boldsymbol{\beta}}_\mathbf{V}]\) is positive semidefinite
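
A numerical illustration of this ordering with the comparison estimator taken to be OLS (the toy design and the positive definite \(\mathbf{V}\) below are assumptions of this sketch): the difference of the two covariance matrices should be positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma2 = 60, 4, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
A = rng.normal(size=(n, n))
V = A @ A.T + n * np.eye(n)
V_inv = np.linalg.inv(V)

XtX_inv = np.linalg.inv(X.T @ X)
cov_ols = sigma2 * XtX_inv @ X.T @ V @ X @ XtX_inv   # sandwich covariance of OLS
cov_gls = sigma2 * np.linalg.inv(X.T @ V_inv @ X)    # covariance of the GLS estimator

diff = cov_ols - cov_gls
print(np.linalg.eigvalsh(diff).min() >= -1e-8)       # True: Cov[OLS] - Cov[GLS] is positive semidefinite
```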

When will OLS and GLS be Equal?

  • For what covariance matrices \(\mathbf{V}\) will the OLS and GLS estimators be the same?

  • Figuring this out can help us understand why the GLS estimator has a lower variance in general.

Theorem
The estimators \(\hat{\boldsymbol{\beta}}\) (OLS) and \(\hat{\boldsymbol{\beta}}_\mathbf{V}\) (GLS) are the same for all \(\mathbf{Y}\in \mathbb{R}^n\) iff \[\mathbf{V}= \mathbf{X}\boldsymbol{\Psi}\mathbf{X}^T + \mathbf{H}\boldsymbol{\Phi}\mathbf{H}^T\] for some positive definite matrices \(\boldsymbol{\Psi}\) and \(\boldsymbol{\Phi}\) and a matrix \(\mathbf{H}\) such that \(\mathbf{H}^T \mathbf{X}= \mathbf{0}\).

Outline of Proof

We need to show when \(\hat{\boldsymbol{\beta}}\) and \(\hat{\boldsymbol{\beta}}_\mathbf{V}\) are the same for all \(\mathbf{Y}\). Since both \(\mathbf{P}\) and \(\mathbf{P}_\mathbf{V}\) are projections onto \(C(\mathbf{X})\), the two estimators agree for all \(\mathbf{Y}\) iff \(\mathbf{P}_\mathbf{V}\) is the orthogonal projection onto \(C(\mathbf{X})\), i.e. iff \(\mathbf{P}_\mathbf{V}\mathbf{n}= \mathbf{0}\) for all \(\mathbf{n}\in C(\mathbf{X})^\perp\) (the two projections then share the same null space)

  1. Show that \(C(\mathbf{X}) = C(\mathbf{V}\mathbf{X})\) iff \(\mathbf{V}\) can be written as \[\mathbf{V}= \mathbf{X}\boldsymbol{\Psi}\mathbf{X}^T + \mathbf{H}\boldsymbol{\Phi}\mathbf{H}^T\] (Show \(C(\mathbf{V}\mathbf{X}) \subset C( \mathbf{X})\) iff \(\mathbf{V}\) has the above form; since the two subspaces have the same rank, \(C(\mathbf{X}) = C(\mathbf{V}\mathbf{X})\).)

  2. Show that \(C(\mathbf{X}) = C(\mathbf{V}^{-1} \mathbf{X})\) iff \(C(\mathbf{X}) = C(\mathbf{V}\mathbf{X})\)

  3. Show that \(C(\mathbf{X})^\perp = C(\mathbf{V}^{-1} \mathbf{X})^\perp\) iff \(C(\mathbf{X}) = C(\mathbf{V}^{-1} \mathbf{X})\)

  4. Show that \(\mathbf{n}\in C(\mathbf{X})^\perp\) iff \(\mathbf{n}\in C(\mathbf{V}^{-1}\mathbf{X})^\perp\) so \(\mathbf{P}_\mathbf{V}\mathbf{n}= 0\)

See Proposition 2.7.5 and Proof in Christensen
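
A sketch of the theorem in action, under assumptions of my own (\(\mathbf{H}\) is built as an orthonormal basis for \(C(\mathbf{X})^\perp\), and \(\boldsymbol{\Psi}\), \(\boldsymbol{\Phi}\) are random positive definite matrices): for \(\mathbf{V}\) of the stated form, the OLS and GLS estimates coincide for any \(\mathbf{Y}\).

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 3
X = rng.normal(size=(n, p))

# H: columns spanning C(X)^perp, so that H^T X = 0
Q, _ = np.linalg.qr(X, mode="complete")
H = Q[:, p:]                                          # n x (n - p)

B1 = rng.normal(size=(p, p))
Psi = B1 @ B1.T + np.eye(p)                           # positive definite (p x p)
B2 = rng.normal(size=(n - p, n - p))
Phi = B2 @ B2.T + np.eye(n - p)                       # positive definite ((n - p) x (n - p))

V = X @ Psi @ X.T + H @ Phi @ H.T                     # covariance of the theorem's form
V_inv = np.linalg.inv(V)

Y = rng.normal(size=n)
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
beta_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ Y)
print(np.allclose(beta_ols, beta_gls))                # True: OLS equals GLS for this V
```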

Some Intuition

For the linear model \(\mathbf{Y}= \mathbf{X}\boldsymbol{\beta}+ \boldsymbol{\epsilon}\) with \(\textsf{E}[\boldsymbol{\epsilon}] = \mathbf{0}_n\) and \(\textsf{Cov}[\boldsymbol{\epsilon}] = \sigma^2 \mathbf{V}\), we can always write

\[\begin{align} \boldsymbol{\epsilon}& = \mathbf{P}\boldsymbol{\epsilon}+ (\mathbf{I}- \mathbf{P})\boldsymbol{\epsilon}\\ & = \boldsymbol{\epsilon}_\mathbf{X}+ \boldsymbol{\epsilon}_N \end{align}\]

  • we can recover \(\boldsymbol{\epsilon}_N\) from the data \(\mathbf{Y}\) but not \(\boldsymbol{\epsilon}_\mathbf{X}\): \[\begin{align} \mathbf{P}\mathbf{Y}& = \mathbf{P}( \mathbf{X}\boldsymbol{\beta}+ \boldsymbol{\epsilon}_\mathbf{X}+ \boldsymbol{\epsilon}_N )\\ & = \mathbf{X}\boldsymbol{\beta}+ \boldsymbol{\epsilon}_\mathbf{X}= \mathbf{X}\hat{\boldsymbol{\beta}}\\ (\mathbf{I}_n - \mathbf{P}) \mathbf{Y}& = \boldsymbol{\epsilon}_N = \hat{\boldsymbol{\epsilon}} = \mathbf{e} \end{align}\]

  • Can \(\boldsymbol{\epsilon}_\textsf{N}\) help us estimate \(\mathbf{X}\boldsymbol{\beta}\)? What if \(\boldsymbol{\epsilon}_N\) could tell us something about \(\boldsymbol{\epsilon}_X\)?

  • Yes if they were highly correlated! But if they were independent or uncorrelated then knowing \(\boldsymbol{\epsilon}_\textsf{N}\) doesn’t help us!

Intuition Continued

  • For what matrices are \(\boldsymbol{\epsilon}_\mathbf{X}\) and \(\boldsymbol{\epsilon}_N\) uncorrelated?

  • Under \(\mathbf{V}= \mathbf{I}_n\): \[\begin{align} \textsf{E}[\boldsymbol{\epsilon}_\mathbf{X}\boldsymbol{\epsilon}_N^T] & = \mathbf{P}\textsf{E}[\boldsymbol{\epsilon}\boldsymbol{\epsilon}^T](\mathbf{I}-\mathbf{P}) \\ & = \sigma^2 \mathbf{P}(\mathbf{I}- \mathbf{P}) = \mathbf{0} \end{align}\] so they are uncorrelated

  • For the \(\mathbf{V}\) in the theorem, introduce

    • \(\mathbf{Z}_\mathbf{X}\) where \(\textsf{E}[\mathbf{Z}_\mathbf{X}]= \mathbf{0}_p\) and \(\textsf{Cov}[\mathbf{Z}_\mathbf{X}] = \boldsymbol{\Psi}\)
    • \(\mathbf{Z}_\textsf{N}\) where \(\textsf{E}[\mathbf{Z}_\textsf{N}]= \mathbf{0}\) and \(\textsf{Cov}[\mathbf{Z}_\textsf{N}] = \boldsymbol{\Phi}\)
    • \(\mathbf{Z}_\mathbf{X}\) and \(\mathbf{Z}_\textsf{N}\) are uncorrelated, \(\textsf{E}[\mathbf{Z}_\mathbf{X}\mathbf{Z}_\textsf{N}^T] = \mathbf{0}\)
    • \(\boldsymbol{\epsilon}= \mathbf{X}\mathbf{Z}_\mathbf{X}+ \mathbf{H}\mathbf{Z}_\textsf{N}\) so that \(\boldsymbol{\epsilon}\) has the desired mean and covariance \(\mathbf{V}\) in the theorem

Intuition Continued

As a consequence we have

  • \(\boldsymbol{\epsilon}_\mathbf{X}= \mathbf{P}\boldsymbol{\epsilon}= \mathbf{X}\mathbf{Z}_\mathbf{X}\)

  • \(\boldsymbol{\epsilon}_\textsf{N}= (\mathbf{I}_n - \mathbf{P})\boldsymbol{\epsilon}= \mathbf{H}\mathbf{Z}_\textsf{N}\)

  • \(\boldsymbol{\epsilon}_\mathbf{X}\) and \(\boldsymbol{\epsilon}_\textsf{N}\) are uncorrelated \[\begin{align} \textsf{E}[\boldsymbol{\epsilon}_\mathbf{X}\boldsymbol{\epsilon}_\textsf{N}^T] & = \textsf{E}[\mathbf{X}\mathbf{Z}_\mathbf{X}\mathbf{Z}_\textsf{N}^T \mathbf{H}^T] \\ & = \mathbf{X}\mathbf{0}\mathbf{H}^T \\ & = \mathbf{0} \end{align}\]

  • so that \(\boldsymbol{\epsilon}_\mathbf{X}\) and \(\boldsymbol{\epsilon}_\textsf{N}\) are uncorrelated whenever \(\mathbf{V}= \mathbf{X}\boldsymbol{\Psi}\mathbf{X}^T + \mathbf{H}\boldsymbol{\Phi}\mathbf{H}^T\)

  • Alternative Statement of Theorem: \(\hat{\boldsymbol{\beta}}= \hat{\boldsymbol{\beta}}_\mathbf{V}\) for all \(\mathbf{Y}\) under \(\textsf{Cov}[\mathbf{Y}] = \sigma^2 \mathbf{V}\) iff \(\mathbf{P}\mathbf{Y}\) and \((\mathbf{I}- \mathbf{P})\mathbf{Y}\) are uncorrelated

Equivalence of GLS estimators

The following corollary to the theorem establishes when two GLS estimators for different \(\textsf{Cov}[\boldsymbol{\epsilon}]\) are equivalent:

Corollary
Suppose \(\mathbf{V}= \mathbf{X}\boldsymbol{\Psi}\mathbf{X}^T + \boldsymbol{\Phi}\mathbf{H}\boldsymbol{\Omega}\mathbf{H}^T \boldsymbol{\Phi}\). Then \(\hat{\boldsymbol{\beta}}_\mathbf{V}= \hat{\boldsymbol{\beta}}_\boldsymbol{\Phi}\)

  • Can you construct an equivalent representation based on zero correlation of \(\mathbf{P}_\boldsymbol{\Phi}\mathbf{Y}\) and \((\mathbf{I}_n - \mathbf{P}_\boldsymbol{\Phi})\mathbf{Y}\) when \(\textsf{Cov}[\boldsymbol{\epsilon}] = \sigma^2 \mathbf{V}?\)