STA 721: Lecture 3
Duke University
Readings:

- Christensen Chapter 2 and Appendix B
- Seber & Lee Chapter 3
Model: \(\mathbf{Y}= \boldsymbol{\mu}+ \boldsymbol{\epsilon}\)
Assumption: \(\boldsymbol{\mu}\in C(\mathbf{X})\) for \(\mathbf{X}\in \mathbb{R}^{n \times p}\)
What if the rank of \(\mathbf{X}\), \(r(\mathbf{X}) \equiv r\), is less than \(p\)?
We still have the result that the OLS/MLE solution satisfies \[\mathbf{P}_\mathbf{X}\mathbf{Y}= \mathbf{X}\hat{\boldsymbol{\beta}}\]
How can we characterize \(\mathbf{P}_\mathbf{X}\) and \(\hat{\boldsymbol{\beta}}\) in this case? There are 2 cases.
Focus on the first case for OLS/MLE for now…
\(\boldsymbol{{\cal M}}= C(\mathbf{X})\) is an \(r\)-dimensional subspace of \(\mathbb{R}^n\)
\(\boldsymbol{{\cal M}}\) has an \((n - r)\)-dimensional orthogonal complement \(\boldsymbol{{\cal N}}\)
each \(\mathbf{y}\in \mathbb{R}^n\) has a unique representation as \[ \mathbf{y}= \hat{\mathbf{y}}+ \mathbf{e}\] for \(\hat{\mathbf{y}}\in \boldsymbol{{\cal M}}\) and \(\mathbf{e}\in \boldsymbol{{\cal N}}\)
\(\hat{\mathbf{y}}\) is the orthogonal projection of \(\mathbf{y}\) onto \(\boldsymbol{{\cal M}}\) and is the OLS/MLE estimate of \(\boldsymbol{\mu}\) that satisfies \[\mathbf{P}_\mathbf{X}\mathbf{y}= \mathbf{X}\hat{\boldsymbol{\beta}}= \hat{\mathbf{y}}\]
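A minimal numpy sketch (not from the notes; the dimensions and seed are arbitrary) that forms \(\mathbf{P}_\mathbf{X}\) for a full-rank \(\mathbf{X}\) and checks the unique decomposition \(\mathbf{y}= \hat{\mathbf{y}}+ \mathbf{e}\) with \(\mathbf{e}\perp C(\mathbf{X})\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 3
X = rng.standard_normal((n, p))        # full rank with probability 1
y = rng.standard_normal(n)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # orthogonal projection onto C(X)
y_hat = P @ y                          # component in M = C(X)
e = y - y_hat                          # component in N = C(X)^perp

print(np.allclose(X.T @ e, 0))         # e is orthogonal to every column of X
print(np.allclose(y, y_hat + e))       # y = y_hat + e exactly
```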
\(\mathbf{X}^T\mathbf{X}\) is not invertible, so we need another way to represent \(\mathbf{P}_\mathbf{X}\) and \(\hat{\boldsymbol{\beta}}\)
Every symmetric \(n \times n\) matrix \({\mathbf{S}}\) has an eigendecomposition \({\mathbf{S}}= \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T\), where \(\mathbf{U}\) is orthogonal (\(\mathbf{U}^T\mathbf{U}= \mathbf{U}\mathbf{U}^T = \mathbf{I}_n\)) and \(\boldsymbol{\Lambda}= \textsf{diag}(\lambda_1, \ldots, \lambda_n)\) contains the (real) eigenvalues
Exercise
Show that a symmetric matrix \({\mathbf{S}}\) is positive definite if and only if its eigenvalues are all strictly greater than zero, and positive semi-definite if and only if all of its eigenvalues are non-negative.
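As a numerical companion to this exercise (a sketch only; the matrix and seed are arbitrary), `np.linalg.eigh` computes the eigendecomposition of a symmetric matrix, and the signs of the eigenvalues diagnose definiteness:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
S = A.T @ A                        # symmetric and PSD by construction

lam, U = np.linalg.eigh(S)         # eigenvalues (ascending) and orthogonal U
print(np.allclose(U @ np.diag(lam) @ U.T, S))   # S = U Lambda U^T
print(np.allclose(U.T @ U, np.eye(5)))          # U has orthonormal columns
print(np.all(lam >= -1e-12))                    # PSD <=> eigenvalues >= 0
```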
Let \(\mathbf{P}\) be an orthogonal projection matrix onto \(\boldsymbol{{\cal M}}\), then
the eigenvalues of \(\mathbf{P}\), \(\lambda_i\), are either zero or one
the trace of \(\mathbf{P}\) is the rank of \(\mathbf{P}\)
the dimension of the subspace that \(\mathbf{P}\) projects onto is the rank of \(\mathbf{P}\)
the columns of \(\mathbf{U}_r = [u_1, u_2, \ldots, u_r]\) (the eigenvectors of \(\mathbf{P}\) with \(\lambda_i = 1\)) form an ONB for \(C(\mathbf{P})\)
the projection \(\mathbf{P}\) has the representation \(\mathbf{P}= \mathbf{U}_r \mathbf{U}_r^T = \sum_{i = 1}^r u_i u_i^T\) (the sum of \(r\) rank \(1\) projections)
the projection \(\mathbf{I}_n - \mathbf{P}= \mathbf{I}_n - \mathbf{U}_r \mathbf{U}_r^T = \mathbf{U}_\perp \mathbf{U}_\perp^T\), where \(\mathbf{U}_\perp = [u_{r+1}, \ldots, u_n]\), is the orthogonal projection onto \(\boldsymbol{{\cal N}}\)
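These properties are easy to check numerically. In the sketch below (arbitrary dimensions; the SVD is used only as a convenient way to obtain an ONB \(\mathbf{U}_r\) for \(C(\mathbf{X})\), and the 1e-10 cutoff for "zero" singular values is an assumption), the eigenvalues of \(\mathbf{P}\) come out as zeros and ones and its trace equals its rank:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, r = 8, 4, 2
# build a rank-deficient X: p columns but only r independent directions
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, p))

U, d, Vt = np.linalg.svd(X, full_matrices=False)
Ur = U[:, d > 1e-10]               # ONB for C(X); r columns survive
P = Ur @ Ur.T                      # P = U_r U_r^T

lam = np.linalg.eigvalsh(P)
print(np.round(lam, 8))            # eigenvalues are all 0 or 1
print(np.isclose(np.trace(P), r))  # trace(P) = rank(P) = dim C(X) = r
print(np.allclose(P @ P, P) and np.allclose(P, P.T))  # idempotent, symmetric
```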
MLE/OLS:
A matrix \(\mathbf{X}\in \mathbb{R}^{n \times p}\), \(p \le n\), has a (thin) singular value decomposition \[\mathbf{X}= \mathbf{U}_p \mathbf{D}\mathbf{V}^T\] where \(\mathbf{U}_p \in \mathbb{R}^{n \times p}\) has orthonormal columns, \(\mathbf{D}= \textsf{diag}(d_1, \ldots, d_p)\) contains the singular values (exactly \(r\) of which are nonzero), and \(\mathbf{V}\in \mathbb{R}^{p \times p}\) is orthogonal
if \(\mathbf{X}^T\mathbf{X}\) is invertible, \(\mathbf{P}_X = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T\) and \(\hat{\boldsymbol{\beta}}\) is the unique estimator that satisfies \(\mathbf{P}_\mathbf{X}\mathbf{y}= \mathbf{X}\hat{\boldsymbol{\beta}}\) or \(\hat{\boldsymbol{\beta}}= (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{X}^T\mathbf{y}\)
if \(\mathbf{X}^T\mathbf{X}\) is not invertible, replace \(\mathbf{X}\) by a rank-\(r\) matrix \(\tilde{\mathbf{X}}\) with \(C(\tilde{\mathbf{X}}) = C(\mathbf{X})\) (e.g. drop redundant columns)
or represent \(\mathbf{P}_\mathbf{X}= \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-} \mathbf{X}^T\), where \((\mathbf{X}^T\mathbf{X})^{-}\) is a generalized inverse of \(\mathbf{X}^T\mathbf{X}\), and take \(\hat{\boldsymbol{\beta}}= (\mathbf{X}^T\mathbf{X})^{-}\mathbf{X}^T \mathbf{y}\) (no longer unique, since generalized inverses are not unique)
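A small sketch of the rank-deficient case (illustrative only; `np.linalg.pinv` supplies the Moore-Penrose inverse, just one of the many choices of \((\mathbf{X}^T\mathbf{X})^{-}\)):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 12, 4
X = rng.standard_normal((n, p))
X[:, 3] = X[:, 0] + X[:, 1]        # exact collinearity: rank r = 3 < p
y = rng.standard_normal(n)

G = np.linalg.pinv(X.T @ X)        # Moore-Penrose (X^T X)^-: one choice
P = X @ G @ X.T                    # P_X = X (X^T X)^- X^T
beta_hat = G @ X.T @ y             # one of many least-squares solutions

print(np.allclose(X @ beta_hat, P @ y))   # X beta_hat = P_X y
print(np.allclose(P @ X, X))              # P_X fixes every column of X
```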
Lemma B.43
If \(\mathbf{G}\) and \(\mathbf{H}\) are generalized inverses of \(\mathbf{X}^T\mathbf{X}\) then \[\begin{align*} \mathbf{X}\mathbf{G}\mathbf{X}^T \mathbf{X}& = \mathbf{X}\mathbf{H}\mathbf{X}^T \mathbf{X}= \mathbf{X}\\ \mathbf{X}\mathbf{G}\mathbf{X}^T & = \mathbf{X}\mathbf{H}\mathbf{X}^T \end{align*}\]
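The lemma can be checked numerically. Below (a sketch with an arbitrary rank-deficient \(\mathbf{X}\)), \(\mathbf{G}\) is the Moore-Penrose inverse from `pinv`, and a second generalized inverse \(\mathbf{H}= \mathbf{G}+ v w^T\) is built from a null vector \(v\) of \(\mathbf{A}= \mathbf{X}^T\mathbf{X}\), so that \(\mathbf{A}\mathbf{H}\mathbf{A}= \mathbf{A}\) still holds:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 10, 3
X = rng.standard_normal((n, p))
X[:, 2] = X[:, 0] - X[:, 1]        # make X^T X singular
A = X.T @ X

G = np.linalg.pinv(A)              # one generalized inverse of A
# a second one: add v w^T with v in N(A); then A (G + v w^T) A = A G A = A
v = np.linalg.svd(A)[2][-1]        # null vector (smallest singular value ~ 0)
H = G + np.outer(v, rng.standard_normal(p))

print(np.allclose(A @ H @ A, A))                 # H is a generalized inverse
print(np.allclose(X @ G @ X.T, X @ H @ X.T))     # X G X^T = X H X^T
print(np.allclose(X @ G @ X.T @ X, X))           # X G X^T X = X
```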
To verify that \(\mathbf{P}= \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-}\mathbf{X}^T\) is the orthogonal projection onto \(C(\mathbf{X})\), we need to show that (i) \(\mathbf{P}\mathbf{m}= \mathbf{m}\) for \(\mathbf{m}\in C(\mathbf{X})\) and (ii) \(\mathbf{P}\mathbf{n}= \mathbf{0}_n\) for \(\mathbf{n}\in C(\mathbf{X})^\perp\).
For \(\mathbf{m}\in C(\mathbf{X})\), write \(\mathbf{m}= \mathbf{X}\mathbf{b}\). Then \(\mathbf{P}\mathbf{m}= \mathbf{P}\mathbf{X}\mathbf{b}= \mathbf{X}(\mathbf{X}^T\mathbf{X})^-\mathbf{X}^T \mathbf{X}\mathbf{b}\), and by Lemma B.43 we have that \(\mathbf{X}(\mathbf{X}^T\mathbf{X})^-\mathbf{X}^T \mathbf{X}\mathbf{b}= \mathbf{X}\mathbf{b}= \mathbf{m}\).
For \(\mathbf{n}\perp C(\mathbf{X})\), \(\mathbf{P}\mathbf{n}= \mathbf{X}(\mathbf{X}^T\mathbf{X})^-\mathbf{X}^T \mathbf{n}= \mathbf{0}_n\) as \(C(\mathbf{X})^\perp = N(\mathbf{X}^T)\).
MLE/OLS satisfies \(\mathbf{X}\hat{\boldsymbol{\beta}}= \mathbf{P}_\mathbf{X}\mathbf{y}\) with \(\hat{\boldsymbol{\beta}}= (\mathbf{X}^T\mathbf{X})^{-}\mathbf{X}^T\mathbf{y}\) for any generalized inverse \((\mathbf{X}^T\mathbf{X})^{-}\)
Exercise
Show that the Moore-Penrose generalized inverse \(\mathbf{A}_{MP}^-\) is a generalized inverse of \(\mathbf{A}\).
Can you construct another generalized inverse of \(\mathbf{X}^T\mathbf{X}\) ?
Can you find the Moore-Penrose generalized inverse of \(\mathbf{X}\in \mathbb{R}^{n \times p}\)?
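Related to the last question, one standard construction (sketched here with numpy; the 1e-10 tolerance is an arbitrary cutoff for "zero") computes the Moore-Penrose generalized inverse from the SVD \(\mathbf{X}= \mathbf{U}_p \mathbf{D}\mathbf{V}^T\) by inverting only the nonzero singular values:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 9, 4
X = rng.standard_normal((n, p))
X[:, 3] = 2.0 * X[:, 1]            # rank-deficient on purpose

U, d, Vt = np.linalg.svd(X, full_matrices=False)
keep = d > 1e-10                   # treat tiny singular values as zero
# X_MP^- = V D^- U^T, inverting only the nonzero singular values
X_mp = Vt[keep].T @ np.diag(1.0 / d[keep]) @ U[:, keep].T

print(np.allclose(X_mp, np.linalg.pinv(X)))   # agrees with numpy's pinv
print(np.allclose(X @ X_mp @ X, X))           # generalized-inverse property
```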
How good is \(\hat{\boldsymbol{\beta}}\) as an estimator of \(\boldsymbol{\beta}\)?
Class of linear statistical models: \[\begin{align*} \mathbf{Y}& = \mathbf{X}\boldsymbol{\beta}+ \boldsymbol{\epsilon}\\ \boldsymbol{\epsilon}& \sim P \\ P & \in \cal{P} \end{align*}\]
An estimator \(\tilde{\boldsymbol{\beta}}\) is unbiased for \(\boldsymbol{\beta}\) if \(\textsf{E}_P[\tilde{\boldsymbol{\beta}}] = \boldsymbol{\beta}\quad \forall \boldsymbol{\beta}\in \mathbb{R}^p\) and \(P \in \cal{P}\)
Examples:
\(\cal{P}_1= \{P = \textsf{N}(\mathbf{0}_n ,\mathbf{I}_n)\}\)
\(\cal{P}_2 = \{P = \textsf{N}(\mathbf{0}_n ,\sigma^2 \mathbf{I}_n), \sigma^2 >0\}\)
\(\cal{P}_3 = \{P = \textsf{N}(\mathbf{0}_n ,\boldsymbol{\Sigma}), \boldsymbol{\Sigma}\in {\cal{S}}^+ \}\) (\({\cal{S}}^+\) is the set of all \(n \times n\) symmetric positive definite matrices.)
\(\cal{P}_4\) is the set of distributions with \(\textsf{E}_P[\boldsymbol{\epsilon}] = \mathbf{0}_n\) and \(\textsf{E}_P[\boldsymbol{\epsilon}\boldsymbol{\epsilon}^T] > 0\) (positive definite)
\(\cal{P}_5\) is the set of distributions with \(\textsf{E}_P[\boldsymbol{\epsilon}] = \mathbf{0}_n\) and \(\textsf{E}_P[\boldsymbol{\epsilon}\boldsymbol{\epsilon}^T] \ge 0\)
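To make the definition of unbiasedness concrete, here is a small Monte Carlo sketch under \(\cal{P}_2\) (all of \(n\), \(p\), \(\sigma\), and \(\boldsymbol{\beta}\) are made up for illustration): averaging the OLS estimates over many draws of \(\boldsymbol{\epsilon}\) should recover \(\boldsymbol{\beta}\).

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, sigma = 50, 3, 2.0           # a P_2 model: epsilon ~ N(0, sigma^2 I)
X = rng.standard_normal((n, p))
beta = np.array([1.0, -2.0, 0.5])
A = np.linalg.inv(X.T @ X) @ X.T   # OLS is the linear map beta_hat = A y

reps = 20000
est = np.empty((reps, p))
for i in range(reps):
    y = X @ beta + sigma * rng.standard_normal(n)
    est[i] = A @ y

print(est.mean(axis=0))            # ~ [1.0, -2.0, 0.5]: unbiased for beta
```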
Exercise
Explain why an estimator that is unbiased for the model with parameter space \(\boldsymbol{\beta}\in \mathbb{R}^p\) and \(P \in \cal{P}_{k+1}\) is unbiased for the model with parameter space \(\boldsymbol{\beta}\in \mathbb{R}^p\) and \(P \in \cal{P}_{k}\).
Find an estimator that is unbiased for \(\boldsymbol{\beta}\in \mathbb{R}^p\) and \(P \in \cal{P}_{1}\) but is biased for \(\boldsymbol{\beta}\in \mathbb{R}^p\) and \(P \in \cal{P}_{2}\).
Restrict attention to linear unbiased estimators
An estimator \(\tilde{\boldsymbol{\beta}}\) is a Linear Unbiased Estimator (LUE) of \(\boldsymbol{\beta}\) if