Duke University
Please let me know if there are broken links for slides, etc!
Ohm’s Law: \(Y\) is the voltage across a resistor of \(r\) ohms and \(X\) is the current through the resistor in amperes; in theory \[Y = rX\]
Simple linear regression for observational data \[Y_i = \beta_0 + \beta_1 x_i + \epsilon_i \text{ for } i = 1, \ldots, n\]
Rewrite in vectors: \[\begin{eqnarray*} \left[ \begin{array}{c} y_1 \\ \vdots \\ y_n \end{array} \right] = & \left[ \begin{array}{c} 1 \\ \vdots \\ 1 \end{array} \right] \beta_0 + \left[ \begin{array}{c} x_1 \\ \vdots \\ x_n \end{array} \right] \beta_1 + \left[ \begin{array}{c} \epsilon_1 \\ \vdots \\ \epsilon_n \end{array} \right] = & \left[ \begin{array}{cc} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n\end{array} \right] \left[ \begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] + \left[ \begin{array}{c} \epsilon_1 \\ \vdots \\ \epsilon_n \end{array} \right] \\ \\ \mathbf{Y}= & \mathbf{X}\boldsymbol{\beta}+ \boldsymbol{\epsilon} \end{eqnarray*}\]
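In code, the vector form is just a matter of stacking a column of ones next to \(\mathbf{x}\). A minimal sketch in Python with simulated (made-up) data:

```python
import numpy as np

rng = np.random.default_rng(42)

# hypothetical data: n observations of a single predictor
n = 50
x = rng.uniform(0, 10, size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=n)   # beta0 = 2, beta1 = 3 (made up)

# stack a column of ones with x to form the n x 2 design matrix X = [1_n  x]
X = np.column_stack([np.ones(n), x])
print(X.shape)   # (50, 2)
```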
Gravitational Law: \(F = \alpha/d^\beta\) where \(d\) is the distance between two objects and \(F\) is the force of gravity between them
log transformations \[\log(F) = \log(\alpha) - \beta \log(d)\]
compare to noisy experimental data \(Y_i =\log(F_i)\) observed at \(x_i = \log(d_i)\)
write \(\mathbf{X}= [\mathbf{1}_n \, \mathbf{x}]\)
\(\boldsymbol{\beta}= (\log(\alpha), -\beta)^T\)
model with additive error on log scale \(\mathbf{Y}= \mathbf{X}\boldsymbol{\beta}+ \mathbf{e}\)
test if \(\beta = 2\)
error assumptions?
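A small simulation can make the test concrete: on the log scale the slope estimates \(-\beta\), so testing \(\beta = 2\) amounts to a t-test that the slope equals \(-2\). The sketch below uses fabricated data and assumes independent, constant-variance errors on the log scale (exactly the error assumptions in question):

```python
import numpy as np

rng = np.random.default_rng(0)

# simulate noisy "measurements" of an inverse-square law (alpha = 10, beta = 2)
n = 100
d = rng.uniform(1, 5, size=n)
F = 10.0 / d**2 * np.exp(rng.normal(scale=0.05, size=n))  # multiplicative error

# work on the log scale: log F = log(alpha) - beta * log(d) + e
x = np.log(d)
Y = np.log(F)
X = np.column_stack([np.ones(n), x])

# least squares estimate of (log alpha, -beta)
beta_hat, rss, rank, _ = np.linalg.lstsq(X, Y, rcond=None)
sigma2_hat = rss[0] / (n - 2)                      # residual variance estimate
cov = sigma2_hat * np.linalg.inv(X.T @ X)          # estimated covariance of beta_hat
se_slope = np.sqrt(cov[1, 1])

# t statistic for H0: slope = -2  (i.e. beta = 2)
t_stat = (beta_hat[1] - (-2.0)) / se_slope
print(beta_hat, t_stat)
```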
Regression function may be an intrinsically nonlinear function of \(t_i\) (time) and parameters \(\boldsymbol{\theta}\) \[Y_i = f(t_i, \boldsymbol{\theta}) + \epsilon_i\]
Taylor’s Theorem: \[f(t_i, \boldsymbol{\theta}) = f(t_0, \boldsymbol{\theta}) + (t_i - t_0) f'(t_0, \boldsymbol{\theta}) + (t_i - t_0)^2 \frac{f''(t_0, \boldsymbol{\theta})}{2} + R(t_i, \boldsymbol{\theta})\] Truncating after the quadratic term and absorbing the remainder \(R\) into the error suggests a polynomial model in \(x_i \equiv t_i - t_0\):
\[Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \epsilon_i \text{ for } i = 1, \ldots, n\]
Rewrite in vectors: \[\begin{eqnarray*} \left[ \begin{array}{c} y_1 \\ \vdots \\ y_n \end{array} \right] = & \left[ \begin{array}{ccc} 1 & x_1 & x_1^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2\end{array} \right] \left[ \begin{array}{c} \beta_0 \\ \beta_1 \\ \beta_2 \end{array} \right] + \left[ \begin{array}{c} \epsilon_1 \\ \vdots \\ \epsilon_n \end{array} \right] \\ & \\ \mathbf{Y}= & \mathbf{X}\boldsymbol{\beta}+ \boldsymbol{\epsilon} \end{eqnarray*}\]
Quadratic in \(x\), but linear in \(\beta\)’s - how do we know this model is adequate?
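One informal adequacy check (among several, e.g. residual plots or an F-test for the added term) is to compare the residual sums of squares of the linear and quadratic fits. A sketch with simulated data, where the chosen coefficients are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 60
x = np.linspace(-2, 2, n)
y = 1.0 + 0.5 * x - 1.5 * x**2 + rng.normal(scale=0.3, size=n)  # truth is quadratic

# design matrices: linear in x, and quadratic in x (both linear in beta)
X_lin  = np.column_stack([np.ones(n), x])
X_quad = np.column_stack([np.ones(n), x, x**2])

def rss(X, y):
    """Residual sum of squares from the least squares fit of y on X."""
    beta_hat, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    return resid @ resid

print(rss(X_lin, y), rss(X_quad, y))  # quadratic fit should reduce RSS substantially
```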
\[y_i = \beta_0 + \sum_{j = 1}^J \beta_j e^{-\lambda (x_i - k_j)^d} + \epsilon_i \text{ for } i = 1, \ldots, n\] where \(k_j\) are kernel locations and \(\lambda\) is a smoothing parameter \[\begin{eqnarray*} \left[ \begin{array}{c} y_1 \\ \vdots \\ y_n \end{array} \right] = & \left[ \begin{array}{cccc} 1 & e^{-\lambda (x_1 - k_1)^d} & \ldots & e^{-\lambda (x_1 - k_J)^d} \\ \vdots & \vdots & & \vdots \\ 1 & e^{-\lambda (x_n - k_1)^d} & \ldots & e^{-\lambda (x_n - k_J)^d} \end{array} \right] \left[ \begin{array}{c} \beta_0 \\ \beta_1 \\\vdots \\ \beta_J \end{array} \right] + \left[ \begin{array}{c} \epsilon_1 \\ \vdots \\ \epsilon_n \end{array} \right] \\ & \\ \mathbf{Y}= & \mathbf{X}\boldsymbol{\beta}+ \boldsymbol{\epsilon} \end{eqnarray*}\]
Linear in \(\beta\) given \(\lambda\) and \(k_1, \ldots k_J\)
Learn \(\lambda\), \(k_1, \ldots k_J\) and \(J\)
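For fixed, hand-picked values of \(\lambda\), the kernel locations, \(J\), and \(d = 2\) (a Gaussian-type kernel, chosen here only for illustration), the fit reduces to ordinary least squares on the kernel design matrix:

```python
import numpy as np

rng = np.random.default_rng(2)

n = 100
x = np.sort(rng.uniform(0, 10, size=n))
y = np.sin(x) + rng.normal(scale=0.2, size=n)   # hypothetical nonlinear signal

# fixed tuning choices (in practice lambda, the k_j's, and J must be learned)
lam = 1.0
knots = np.linspace(0, 10, 8)                   # k_1, ..., k_J with J = 8
d = 2                                           # Gaussian-type kernel

# n x (J + 1) design matrix: intercept plus one kernel basis column per knot
basis = np.exp(-lam * (x[:, None] - knots[None, :])**d)
X = np.column_stack([np.ones(n), basis])

beta_hat, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat
print(X.shape, beta_hat.shape)
```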
Response \(Y_i\) and \(p\) predictors \(x_{i1}, x_{i2}, \ldots, x_{ip}\) \[Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_{p} x_{ip} + \epsilon_i\]
Design matrix \[\mathbf{X}= \left[\begin{array}{cccc} 1 & x_{11} & \ldots & x_{1p} \\ 1 & x_{21} & \ldots & x_{2p} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_{n1} & \ldots & x_{np} \\ \end{array} \right] = \left[ \begin{array}{cc} 1 & \mathbf{x}_1^T \\ \vdots & \vdots \\ 1 & \mathbf{x}_n^T \end{array} \right] = \left[\begin{array}{ccccc} \mathbf{1}_n & \mathbf{X}_1 & \mathbf{X}_2 & \cdots & \mathbf{X}_p \end{array} \right] \]
matrix version \[\mathbf{Y}= \mathbf{X}\boldsymbol{\beta}+ \boldsymbol{\epsilon}\] what should go into \(\mathbf{X}\) and do we need all columns of \(\mathbf{X}\) for inference about \(\mathbf{Y}\)?
Goals:
All models are wrong, but some may be useful (George Box)
Goal: Find the best fitting “line” or “hyper-plane” that minimizes \[\sum_i (Y_i - \mathbf{x}_i^T \boldsymbol{\beta})^2 = (\mathbf{Y}- \mathbf{X}\boldsymbol{\beta})^T(\mathbf{Y}- \mathbf{X}\boldsymbol{\beta}) = \| \mathbf{Y}- \mathbf{X}\boldsymbol{\beta}\|^2 \]
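When \(\mathbf{X}\) has full column rank, the minimizer solves the normal equations \(\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^T\mathbf{Y}\). A sketch on simulated data, comparing a numerically stable solver with the explicit normal-equations solution:

```python
import numpy as np

rng = np.random.default_rng(3)

n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # intercept + p predictors
beta_true = np.array([1.0, 2.0, -1.0, 0.5])                  # made-up coefficients
Y = X @ beta_true + rng.normal(scale=0.5, size=n)

# numerically stable least squares (SVD based)
beta_lstsq, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)

# explicit normal-equations solution (assumes X has full column rank)
beta_normal = np.linalg.solve(X.T @ X, X.T @ Y)

print(np.allclose(beta_lstsq, beta_normal))   # True
```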
Need Distribution Assumptions of \(\mathbf{Y}\) (or \(\boldsymbol{\epsilon}\)) for testing and uncertainty measures \(\Rightarrow\) Likelihood and Bayesian inference
Let \(Y_1, \ldots, Y_n\) be random variables in \(\mathbb{R}\). Then \[\mathbf{Y}\equiv \left[ \begin{array}{c} Y_1 \\ \vdots \\ Y_n \end{array} \right]\] is a random vector in \(\mathbb{R}^n\)
Expectations of random vectors are defined element-wise: \[\textsf{E}[\mathbf{Y}] \equiv \textsf{E}\left[ \begin{array}{c} Y_1 \\ \vdots \\ Y_n \end{array} \right] \equiv \left[ \begin{array}{c} \textsf{E}[Y_1] \\ \vdots \\ \textsf{E}[Y_n] \end{array} \right] = \left[ \begin{array}{c} \mu_1 \\ \vdots \\ \mu_n \end{array} \right] \equiv \boldsymbol{\mu}\in \mathbb{R}^n \] where mean or expected value \(\textsf{E}[Y_i] = \mu_i\)
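A quick Monte Carlo check of the element-wise definition (with an arbitrary mean vector chosen for illustration): the sample mean of many draws of \(\mathbf{Y}\) approximates \(\textsf{E}[\mathbf{Y}]\) coordinate by coordinate.

```python
import numpy as np

rng = np.random.default_rng(5)

# a random vector Y in R^3 with mean mu = (0, 1, 2)
mu = np.array([0.0, 1.0, 2.0])
samples = mu + rng.normal(size=(100_000, 3))   # 100,000 independent draws of Y

# the sample mean approximates E[Y] element-wise
print(samples.mean(axis=0))   # approximately [0, 1, 2]
```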
We will work with inner product spaces: a vector space, say \(\mathbb{R}^n\), equipped with an inner product \(\langle \mathbf{x},\mathbf{y}\rangle \equiv \mathbf{x}^T\mathbf{y}, \quad \mathbf{x}, \mathbf{y}\in \mathbb{R}^n\)
A subset \(\boldsymbol{{\cal M}} \subseteq \mathbb{R}^n\) is a subspace if it is closed under linear combinations: that is, if \(\mathbf{x}_1 \in \boldsymbol{{\cal M}}\) and \(\mathbf{x}_2 \in \boldsymbol{{\cal M}}\), then \(b_1\mathbf{x}_1 + b_2 \mathbf{x}_2 \in \boldsymbol{{\cal M}}\) for all \(b_1, b_2 \in \mathbb{R}\)
If \(\mathbf{X}\) is full column rank, then the columns of \(\mathbf{X}\) form a basis for \(C(\mathbf{X})\) and \(C(\mathbf{X})\) is a \(p\)-dimensional subspace of \(\mathbb{R}^n\)
If we have just a single model matrix \(\mathbf{X}\), then the subspace \(\boldsymbol{{\cal M}}\) is the model space.
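A quick numerical check of the rank statement, using an arbitrary simulated \(\mathbf{X}\): with full column rank the rank equals the number of columns, and appending a column that is a linear combination of existing ones leaves \(\dim C(\mathbf{X})\), and hence the model space, unchanged.

```python
import numpy as np

rng = np.random.default_rng(4)

n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])   # n x p, full column rank
print(np.linalg.matrix_rank(X))        # p = 3: columns form a basis for C(X)

# append a redundant column (a linear combination of the first two columns)
X_redundant = np.column_stack([X, X[:, 0] + 2 * X[:, 1]])
print(np.linalg.matrix_rank(X_redundant))   # still 3: C(X) is unchanged
```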
Important to understand the advantages and problems of each perspective!