STA 721: Lecture 14
Duke University
Hypothesis Testing:
Readings:
We assume the Gaussian Linear Model \[\quad \quad \quad \quad \quad \quad \quad \quad \text{M1} \quad \mathbf{Y}∼ \textsf{N}(\mathbf{W}\boldsymbol{\alpha}+ \mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I}) \equiv \textsf{N}(\mathbf{Z}\boldsymbol{\theta}, \sigma^2\mathbf{I})\] where \(\mathbf{W}\) is \(n \times q\), \(\mathbf{X}\) is \(n \times p\), \(\mathbf{Z}= [\mathbf{W}\;\mathbf{X}]\) is \(n \times (q + p)\), and \(\boldsymbol{\theta}= (\boldsymbol{\alpha}^T, \boldsymbol{\beta}^T)^T\)
We wish to evaluate the hypothesis \(\boldsymbol{\beta}= \mathbf{0}\)
equivalent to comparing M1 to M0: \[\text{M0} \quad \mathbf{Y}∼ \textsf{N}(\mathbf{W}\boldsymbol{\alpha}, \sigma^2\mathbf{I})\]
\(\textsf{SSE}_{M0}/(n-q)\) and \(\textsf{SSE}_{M1}/(n- q - p)\) are unbiased estimates of \(\sigma^2\) under null model M0
but the ratio \(\frac{\textsf{SSE}_{M0}/(n-q)}{\textsf{SSE}_{M1}/(n- q - p)}\) does not have an F distribution: the numerator and denominator are not independent (note \(\textsf{SSE}_{M0} \ge \textsf{SSE}_{M1}\))
Rewrite \(\textsf{SSE}_{M0}\): \[\begin{align*} \textsf{SSE}_{M0} & = \mathbf{Y}^T(\mathbf{I}- \mathbf{P}_{\mathbf{W}})\mathbf{Y}\\ & = \mathbf{Y}^T(\mathbf{I}- \mathbf{P}_{\mathbf{Z}} + \mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})\mathbf{Y}\\ & = \mathbf{Y}^T(\mathbf{I}- \mathbf{P}_{\mathbf{Z}})\mathbf{Y}+ \mathbf{Y}^T(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})\mathbf{Y}\\ & = \textsf{SSE}_{M1} + \mathbf{Y}^T(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})\mathbf{Y} \end{align*}\]
Extra Sum of Squares: \[\textsf{SSE}_{M0} - \textsf{SSE}_{M1} = \mathbf{Y}^T(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})\mathbf{Y}\]
\(\textsf{E}[\textsf{SSE}_{M0} - \textsf{SSE}_{M1}] = \textsf{E}[\mathbf{Y}^T(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})\mathbf{Y}]\)
under M0: \(\boldsymbol{\mu}= \mathbf{W}\boldsymbol{\alpha}\) \[\begin{align*} \textsf{E}[(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})\mathbf{Y}] & = \mathbf{P}_{\mathbf{Z}}\mathbf{W}\boldsymbol{\alpha}- \mathbf{P}_{\mathbf{W}}\mathbf{W}\boldsymbol{\alpha}\\ & = \mathbf{W}\boldsymbol{\alpha}- \mathbf{W}\boldsymbol{\alpha}\\ & = \mathbf{0}\\ \textsf{E}[\mathbf{Y}^T(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})\mathbf{Y}] & = \sigma^2(\textsf{tr}\,\mathbf{P}_\mathbf{Z}- \textsf{tr}\,\mathbf{P}_\mathbf{W}) \\ & = \sigma^2 (q + p - q) = p \sigma^2 \end{align*}\]
under M1: \(\boldsymbol{\mu}= \mathbf{X}\boldsymbol{\beta}+ \mathbf{W}\boldsymbol{\alpha}\) \[\begin{align*} \textsf{E}[(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})\mathbf{Y}] & = \mathbf{X}\boldsymbol{\beta}+ \mathbf{W}\boldsymbol{\alpha}- \mathbf{P}_{\mathbf{W}}\mathbf{X}\boldsymbol{\beta}- \mathbf{W}\boldsymbol{\alpha}\\ & = (\mathbf{I}- \mathbf{P}_{\mathbf{W}})\mathbf{X}\boldsymbol{\beta}\\ \textsf{E}[\mathbf{Y}^T(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})\mathbf{Y}] & = p \sigma^2 + \boldsymbol{\beta}^T\mathbf{X}^T(\mathbf{I}- \mathbf{P}_{\mathbf{W}})\mathbf{X}\boldsymbol{\beta} \end{align*}\]
Propose ratio: \[F = \frac{(\textsf{SSE}_{M0} - \textsf{SSE}_{M1})/p} {\textsf{SSE}_{M1}/(n - q - p)} = \frac{\mathbf{Y}^T(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})\mathbf{Y}/p}{\textsf{SSE}_{M1}/(n - q - p)}\] as a test statistic.
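As a quick numerical illustration, here is a minimal sketch of the extra-sum-of-squares \(F\) test on simulated data; the dimensions, seed, coefficient values, and variable names are all illustrative assumptions, not from the lecture:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, q, p = 100, 2, 3
W = np.column_stack([np.ones(n), rng.normal(size=n)])  # n x q, includes an intercept
X = rng.normal(size=(n, p))                            # n x p
Z = np.hstack([W, X])                                  # n x (q + p)
Y = W @ np.array([1.0, 2.0]) + rng.normal(size=n)      # data generated under M0 (beta = 0)

def sse(A, y):
    """Residual sum of squares from projecting y onto C(A)."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sum((y - A @ coef) ** 2)

sse0, sse1 = sse(W, Y), sse(Z, Y)
F = ((sse0 - sse1) / p) / (sse1 / (n - q - p))
pval = stats.f.sf(F, p, n - q - p)                     # Pr(F_{p, n-q-p} > F)
print(F, pval)
```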
Does \(F\) have an F distribution under M0?
To show that \(\mathbf{Y}^T(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})\mathbf{Y}\) has a \(\chi^2\) distribution under M0 or M1, we need to show that \(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}}\) is an orthogonal projection matrix (symmetric and idempotent).
\[\begin{align*} (\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})^2 & = \mathbf{P}_{\mathbf{Z}}^2 - \mathbf{P}_{\mathbf{Z}}\mathbf{P}_{\mathbf{W}} - \mathbf{P}_{\mathbf{W}}\mathbf{P}_{\mathbf{Z}} + \mathbf{P}_{\mathbf{W}}^2 \\ & = \mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{Z}}\mathbf{P}_{\mathbf{W}} - (\mathbf{P}_{\mathbf{Z}}\mathbf{P}_{\mathbf{W}})^T + \mathbf{P}_{\mathbf{W}} \\ & = \mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}} - \mathbf{P}_{\mathbf{W}}^T + \mathbf{P}_{\mathbf{W}} \quad \text{since } C(\mathbf{W}) \subset C(\mathbf{Z}) \Rightarrow \mathbf{P}_{\mathbf{Z}}\mathbf{P}_{\mathbf{W}} = \mathbf{P}_{\mathbf{W}} \\ & = \mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}} \end{align*}\]
So \(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}}\) is idempotent, and it is symmetric (a difference of symmetric matrices), hence an orthogonal projection matrix
Onto what space is it projecting?
Intuitively, it is projecting onto the part of \(\mathbf{X}\) that is not in \(\mathbf{W}\), \(\tilde{\mathbf{X}}= (\mathbf{I}- \mathbf{P}_{\mathbf{W}})\mathbf{X}\) (the part of \(\mathbf{X}\) that is orthogonal to \(\mathbf{W}\))
\(C(\tilde{\mathbf{X}})\) and \(C(\mathbf{W})\) are complementary orthogonal subspaces of \(C(\mathbf{Z})\)
\(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}}\) is a projection matrix onto \(C(\tilde{\mathbf{X}})\) along \(C(\mathbf{W})\)
we are decomposing \(C(\mathbf{Z})\) into two orthogonal subspaces \(C(\mathbf{W})\) and \(C(\tilde{\mathbf{X}})\)
We can write \(\mathbf{P}_{\mathbf{Z}} = \mathbf{P}_{\tilde{\mathbf{X}}} + \mathbf{P}_{\mathbf{W}}\) where \(\mathbf{P}_{\tilde{\mathbf{X}}} \mathbf{P}_{\mathbf{W}} = \mathbf{P}_{\mathbf{W}} \mathbf{P}_{\tilde{\mathbf{X}}} = \mathbf{0}\)
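These projection identities are easy to verify numerically. A sketch continuing the simulated \(\mathbf{W}\), \(\mathbf{X}\), \(\mathbf{Z}\) from the earlier code (again purely illustrative):

```python
def proj(A):
    """Orthogonal projection matrix onto C(A), via the Moore-Penrose pseudoinverse."""
    return A @ np.linalg.pinv(A)

P_W, P_Z = proj(W), proj(Z)
D = P_Z - P_W
X_tilde = (np.eye(n) - P_W) @ X       # X-tilde: the part of X orthogonal to C(W)

print(np.allclose(D @ D, D))          # idempotent
print(np.allclose(D, D.T))            # symmetric
print(np.allclose(D, proj(X_tilde)))  # P_Z - P_W = P_{X-tilde}
print(np.allclose(proj(X_tilde) @ P_W, np.zeros((n, n))))  # orthogonal decomposition
print(np.isclose(np.trace(D), p))     # trace (= rank) is p
```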
Note: we can always write \[\begin{align*} \boldsymbol{\mu}& = \mathbf{W}\boldsymbol{\alpha}+ \mathbf{X}\boldsymbol{\beta}\\ & = \mathbf{W}\boldsymbol{\alpha}+ (\mathbf{I}- \mathbf{P}_{\mathbf{W}})\mathbf{X}\boldsymbol{\beta}+ \mathbf{P}_{\mathbf{W}}\mathbf{X}\boldsymbol{\beta}\\ & = \mathbf{W}\tilde{\boldsymbol{\alpha}} + \tilde{\mathbf{X}}\boldsymbol{\beta} \end{align*}\] since \(\mathbf{P}_{\mathbf{W}}\mathbf{X}\boldsymbol{\beta}\in C(\mathbf{W})\) can be absorbed into \(\mathbf{W}\tilde{\boldsymbol{\alpha}}\)
\[\begin{align*} \mathbf{Y}^T(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})\mathbf{Y}& = \|(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})\mathbf{Y}\|^2 \\ & = \|(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})(\mathbf{X}\boldsymbol{\beta}+ \mathbf{W}\boldsymbol{\alpha}+ \boldsymbol{\epsilon})\|^2 \\ & = \|(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})(\mathbf{X}\boldsymbol{\beta}+ \boldsymbol{\epsilon})\|^2 \\ & = \|(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})\boldsymbol{\epsilon}\|^2 \quad \text{ if } \boldsymbol{\beta}= \mathbf{0}\\ & = \boldsymbol{\epsilon}^T(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})\boldsymbol{\epsilon}\\ & \sim \sigma^2 \chi^2_p \quad \text{ if } \boldsymbol{\beta}= \mathbf{0} \end{align*}\]
Under M0: \(\boldsymbol{\beta}= \mathbf{0}\)
\[\begin{align*} F(\mathbf{Y}) & = \frac{(\textsf{SSE}_{M0} - \textsf{SSE}_{M1})/p}{\textsf{SSE}_{M1}/(n - q - p)} \\ & = \frac{\left[(\textsf{SSE}_{M0} - \textsf{SSE}_{M1})/\sigma^2\right]/p}{\left[\textsf{SSE}_{M1}/\sigma^2\right]/(n - q - p)} \\ & \mathrel{\mathop{=}\limits^{\rm D}}\frac{\chi^2_p/p}{\chi^2_{n-q-p}/(n-q-p)} \\ & \mathrel{\mathop{=}\limits^{\rm D}}F_{p, n-q-p} \end{align*}\] where the numerator and denominator quadratic forms are independent since \((\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})(\mathbf{I}- \mathbf{P}_{\mathbf{Z}}) = \mathbf{0}\)
Under M1, \(\mathbf{Y}^T(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})\mathbf{Y}/\sigma^2\) has a non-central \(\chi^2_{p, \eta}\) distribution, where the non-centrality parameter is \(\eta = \boldsymbol{\beta}^T\mathbf{X}^T(\mathbf{I}- \mathbf{P}_{\mathbf{W}})\mathbf{X}\boldsymbol{\beta}/(2\sigma^2)\).
\(F\) has a non-central F distribution with \(p\) and \(n-q-p\) degrees of freedom and the same non-centrality parameter \(\eta\) (see Christensen Theorem 3.2.1 and Appendix C)
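The non-central F tail gives the power of the test directly. A hedged sketch using scipy, continuing the simulated objects above (the alternative \(\boldsymbol{\beta}\) here is made up, and note a convention difference: scipy parameterizes the non-centrality as \(\lambda = \boldsymbol{\beta}^T\mathbf{X}^T(\mathbf{I}- \mathbf{P}_{\mathbf{W}})\mathbf{X}\boldsymbol{\beta}/\sigma^2\), i.e. \(2\eta\) in the convention above):

```python
beta_alt = np.full(p, 0.3)    # hypothetical alternative, not from the lecture
sigma2 = 1.0
# scipy's nc is beta' X'(I - P_W) X beta / sigma^2, i.e. twice Christensen's eta
nc = beta_alt @ X_tilde.T @ X_tilde @ beta_alt / sigma2
crit = stats.f.isf(0.05, p, n - q - p)           # level-0.05 critical value
power = stats.ncf.sf(crit, p, n - q - p, nc)     # Pr(reject M0) under this alternative
print(crit, power)
```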
Consider the model with \(p = 1\): \(\mathbf{Y}= \mathbf{W}\boldsymbol{\alpha}+ \mathbf{x}\beta + \boldsymbol{\epsilon}\), where we wish to test \(\beta = 0\) (M0)
It turns out that we can obtain this \(F\) statistic by fitting the full model and the test reduces to a familiar \(t\)-test
Note: \[\begin{align*} \textsf{SSE}_{M0} - \textsf{SSE}_{M1} & = \mathbf{Y}^T(\mathbf{P}_{\mathbf{Z}} - \mathbf{P}_{\mathbf{W}})\mathbf{Y}\\ & = \|(\mathbf{P}_{\tilde{\mathbf{X}}} + \mathbf{P}_{\mathbf{W}} - \mathbf{P}_{\mathbf{W}})\mathbf{Y}\|^2 \\ & = \|\mathbf{P}_{\tilde{\mathbf{X}}}\mathbf{Y}\|^2 \\ & = \|(\mathbf{I}- \mathbf{P}_{\mathbf{W}})\mathbf{X}\hat{\boldsymbol{\beta}}\|^2 \\ & = \hat{\boldsymbol{\beta}}^T\mathbf{X}^T(\mathbf{I}- \mathbf{P}_{\mathbf{W}})\mathbf{X}\hat{\boldsymbol{\beta}} \end{align*}\]
For \(p = 1\), with \(s^2 = \textsf{SSE}_{M1}/(n - q - 1)\), the \(F\) statistic is
\[\begin{align*} F(\mathbf{Y}) & = \frac{(\textsf{SSE}_{M0} - \textsf{SSE}_{M1})/1}{\textsf{SSE}_{M1}/(n - q - 1)} \\ & = \frac{\hat{\beta}^2\, \mathbf{x}^T(\mathbf{I}- \mathbf{P}_{\mathbf{W}})\mathbf{x}}{s^2} \\ & = \frac{\hat{\beta}^2}{s^2/\mathbf{x}^T(\mathbf{I}- \mathbf{P}_{\mathbf{W}})\mathbf{x}} \\ F(\mathbf{Y}) & \sim F_{1, n - q - 1} \quad \text{ under } \beta = 0 \end{align*}\]
\[\begin{align*} F(\mathbf{Y}) & = \frac{\hat{\beta}^2}{s^2/\mathbf{x}^T(\mathbf{I}- \mathbf{P}_{\mathbf{W}})\mathbf{x}} = \left(\frac{\hat{\beta}}{s \sqrt{v}}\right)^2 = t(\mathbf{Y})^2 \end{align*}\] where \(v = [\mathbf{x}^T(\mathbf{I}- \mathbf{P}_{\mathbf{W}})\mathbf{x}]^{-1}\), so \(s\sqrt{v}\) is the usual standard error of \(\hat{\beta}\)
Since \(F(\mathbf{Y}) \sim F(1, n - q - 1)\) under M0: \(\beta = 0\), \(t(\mathbf{Y})^2 \sim F(1,n - q - 1)\) under M0: \(\beta = 0\)
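A quick numerical check that the two statistics coincide, continuing the simulated session above with a single made-up test column \(\mathbf{x}\):

```python
x = rng.normal(size=n)                     # a single (illustrative) column to test
Z1 = np.column_stack([W, x])
coef, *_ = np.linalg.lstsq(Z1, Y, rcond=None)
beta_hat = coef[-1]                        # coefficient on x in the full model
s2 = sse(Z1, Y) / (n - q - 1)              # s^2 = SSE_M1 / (n - q - 1)
v = 1.0 / (x @ (np.eye(n) - P_W) @ x)      # v = [x'(I - P_W)x]^{-1}
t = beta_hat / np.sqrt(s2 * v)             # t(Y) = beta-hat / (s sqrt(v))
F1 = (sse(W, Y) - sse(Z1, Y)) / s2         # F statistic with p = 1
print(np.isclose(t ** 2, F1))              # True: t(Y)^2 = F(Y)
```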
what is the distribution of \(t(\mathbf{Y})\) under the alternative \(\beta \ne 0\)?
Recall that under M0: \(\beta = 0\), \[t(\mathbf{Y}) \mathrel{\mathop{=}\limits^{\rm D}}\frac{Z}{\sqrt{X/\nu}} \sim t_{\nu}, \quad \nu = n - q - 1\] where \[\begin{align*} Z & \sim \textsf{N}(0,1) \\ X & \sim \chi^2_\nu \\ Z &\text{ and } X \text{ are independent } \end{align*}\]
so an \(F_{1, \nu}\) random variable is equal in distribution to the square of a Student \(t_{\nu}\) random variable under the null model (the same relationship holds under the full model, but with a non-centrality parameter)
The decision rule is to reject M0 if \(F(\mathbf{Y}) > F_{1, n - q - 1, \alpha}\), the upper \(\alpha\) quantile of the \(F_{1, n - q - 1}\) distribution
\(p\)-value is \(\Pr(F_{1, n - q - 1} > F(\mathbf{Y}))\); the probability of observing a value of \(F\) at least as extreme as the observed value under the null model
using a t-distribution, the equivalent decision rule is to reject M0 if \(|t(\mathbf{Y})| > t_{n - q - 1, \alpha/2}\)
\(p\)-value is \(\Pr(|T_{n - q - 1}| > |t(\mathbf{Y})|)\)
this is the equal-tailed \(t\)-test
we derived the \(F\)-test heuristically, but formally this test may be derived as a likelihood ratio test.
consider a statistical model \(\mathbf{Y}\sim P, P \in \{P_\boldsymbol{\theta}: \boldsymbol{\theta}\in \boldsymbol{\Theta}\}\)
\(P\) is the true unknown distribution for \(\mathbf{Y}\)
\(\{P_\boldsymbol{\theta}: \boldsymbol{\theta}\in \boldsymbol{\Theta}\}\) is the model, the set of possible distributions for \(\mathbf{Y}\) with \(\boldsymbol{\Theta}\) the parameter space
we might hypothesize that \(\boldsymbol{\theta}\in \boldsymbol{\Theta}_0 \subset \boldsymbol{\Theta}\)
for our linear model this translates to \(\boldsymbol{\theta}= (\boldsymbol{\alpha}, \boldsymbol{\beta}, \sigma^2) \in \boldsymbol{\Theta}_0 = \mathbb{R}^q \times \{\mathbf{0}\} \times \mathbb{R}^+\), with \(\boldsymbol{\Theta}_0 \subset \boldsymbol{\Theta}= \mathbb{R}^q \times \mathbb{R}^p \times \mathbb{R}^+\)
compute the likelihood ratio statistic \[R(\mathbf{Y}) = \frac{\sup_{\boldsymbol{\theta}\in \boldsymbol{\Theta}_0} p_\boldsymbol{\theta}(\mathbf{Y})}{\sup_{\boldsymbol{\theta}\in \boldsymbol{\Theta}} p_\boldsymbol{\theta}(\mathbf{Y})}\]
Equivalently, we can look at \(-2\) times the log likelihood ratio statistic \[\lambda(\mathbf{Y}) = -2\log(R(\mathbf{Y})) = -2 \left[\sup_{\boldsymbol{\theta}\in \boldsymbol{\Theta}_0} \ell(\boldsymbol{\theta}) - \sup_{\boldsymbol{\theta}\in \boldsymbol{\Theta}} \ell(\boldsymbol{\theta})\right]\] where \(\ell(\boldsymbol{\theta}) \propto \log p_\boldsymbol{\theta}(\mathbf{Y})\) (the log likelihood)
Steps:
under each model, the MLE of the mean is the projection of \(\mathbf{Y}\) onto that model's column space, and the MLE of \(\sigma^2\) is \(\textsf{SSE}/n\)
substituting the MLEs back into the log likelihood gives \(\sup \ell(\boldsymbol{\theta}) = -\frac{n}{2}\log(\textsf{SSE}/n) + \text{constant}\)
hence \(\lambda(\mathbf{Y}) = n \log(\textsf{SSE}_{M0}/\textsf{SSE}_{M1})\)
with some rearranging and 1-to-1 transformations, one can show that this is equivalent to the \(F\)-test! (HW)
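As a numerical companion (continuing the simulated setup from the first sketch; the algebra connecting \(\lambda(\mathbf{Y})\) to \(F(\mathbf{Y})\) is the homework exercise), the likelihood ratio statistic can be computed directly from the two residual sums of squares:

```python
lam = n * np.log(sse0 / sse1)                      # -2 log likelihood ratio
# lambda is a monotone function of F, so the two tests reject for the same samples
print(lam, n * np.log(1 + p * F / (n - q - p)))    # identical values
```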