Transformations and Normality Assumptions

STA 721: Lecture 22

Merlise Clyde

Duke University

Outline

  • Normality & Transformations
  • Box-Cox
  • Variance Stabilizing Transformations
  • Nonlinear Regression

Readings: Christensen Chapter 13, Seber & Lee Chapter 10, and Wakefield Chapter 6

Model Assumptions

Linear Model: \[ \mathbf{Y}= \boldsymbol{\mu}+ \boldsymbol{\epsilon}\]
Assumptions:
\[\begin{eqnarray*} \boldsymbol{\mu}\in C(\mathbf{X}) & \Leftrightarrow & \boldsymbol{\mu}= \mathbf{X}\boldsymbol{\beta}\\ \boldsymbol{\epsilon}& \sim & \textsf{N}(\mathbf{0}_n, \sigma^2 \mathbf{I}_n) \end{eqnarray*}\]

  • Normal Distribution for \(\mathbf{Y}\) with constant variance or fixed covariance
  • linearity of \(\boldsymbol{\mu}\) in \(\mathbf{X}\)
  • Computational Advantages of Normal Models
  • Robustify with heavy-tailed error distributions

Checking via QQ-Plots

  • QQ-Plots are a graphical tool to assess normality

  • Order the residuals \(e_i\): \(e_{(1)} \le e_{(2)} \le \ldots \le e_{(n)}\), the sample order statistics or sample quantiles (standardized: divide by \(\sqrt{1 - h_{ii}}\))

  • Let \(z_{(1)} \le z_{(2)} \le \ldots \le z_{(n)}\) denote the expected order statistics of a sample of size \(n\) from a standard normal distribution, the "theoretical quantiles"

  • If the \(e_i\) are normal, then \(\textsf{E}[e_{(i)}/\sqrt{1 - h_{ii}}] \approx \sigma z_{(i)}\)

  • Expect that points in a scatter plot of \(e_{(i)}/\sqrt{1 - h_{ii}}\) against \(z_{(i)}\) should fall approximately on a straight line.

  • Judgment call - use simulations to gain experience!
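
One way to calibrate that judgment is to compare the observed QQ-plot to QQ-plots of repeated samples simulated from a normal distribution with the same sample size; a minimal sketch (the sample size and number of replicates here are arbitrary choices):

n <- 28   # e.g. the number of observations in the Animals data below
par(mfrow=c(2,4))
for (i in 1:8) {
  z <- rnorm(n)                               # a truly normal sample
  qqnorm(z, main=paste("Simulated sample", i))
  qqline(z)
}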

Animal Example

data(Animals, package="MASS")
plot(brain ~ body, data=Animals, xlab="Body Weight (kg)", ylab="Brain Weight (g)", main="Original Units")

Residual Plots

brains.lm <- lm(brain ~ body, data=Animals)
par(mfrow=c(2,2))
plot(brains.lm)

Normality Assumption

Recall:

\[\begin{eqnarray*} \mathbf{e}& = &(\mathbf{I}- \mathbf{P}_\mathbf{X}) \mathbf{Y}\\ & = & (\mathbf{I}- \mathbf{P}_\mathbf{X})(\mathbf{X}\boldsymbol{\beta}+ \boldsymbol{\epsilon}) \\ & = & (\mathbf{I}- \mathbf{P}_\mathbf{X})\boldsymbol{\epsilon} \end{eqnarray*}\]

\[e_i = \epsilon_i - \sum_{j=1}^n h_{ij} \epsilon_j\]

  • Lyapunov CLT (for independent but not identically distributed terms) implies that the residuals will be approximately normal (even for modest \(n\)), even if the errors are not normal

  • Supernormality of residuals: residuals can look more nearly normal than the underlying errors

  • clearly not the case for the Animals residuals here!
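
A small simulation (with an arbitrary design and skewed errors) illustrates the supernormality point above: each residual is a linear combination of many errors, so with a moderate number of predictors the residuals can look noticeably more normal than the errors themselves:

set.seed(721)
n <- 30; p <- 20
X <- matrix(rnorm(n*p), n, p)
eps <- rexp(n) - 1                          # skewed, mean-zero (non-normal) errors
y <- drop(X %*% rep(1, p)) + eps
fit <- lm(y ~ X)
par(mfrow=c(1,2))
qqnorm(eps, main="Errors"); qqline(eps)
qqnorm(rstandard(fit), main="Residuals"); qqline(rstandard(fit))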

Box-Cox Transformation

Box and Cox (1964) suggested a family of power transformations for \(Y > 0\)
\[ U(\mathbf{Y}, \lambda) = Y^{(\lambda)} = \left\{ \begin{array}{ll} \frac{(Y^\lambda -1)}{\lambda} & \lambda \neq 0 \\ \log(Y) & \lambda = 0 \end{array} \right. \]

  • Estimate \(\lambda\) by maximum likelihood
    \[{\cal{L}}(\lambda, \boldsymbol{\beta}, \sigma^2) \propto \prod_i f(y_i \mid \lambda, \boldsymbol{\beta}, \sigma^2)\]

  • Model: \(U(\mathbf{Y}, \lambda) = \mathbf{Y}^{(\lambda)} \sim \textsf{N}(\mathbf{X}\boldsymbol{\beta}, \sigma^2 \mathbf{I}_n)\) for the transformed response

  • Jacobian term is \(\prod_i y_i^{\lambda - 1}\) for all \(\lambda\)

  • Profile likelihood, based on substituting the MLEs of \(\boldsymbol{\beta}\) and \(\sigma^2\) for each value of \(\lambda\), has log-likelihood \[\log {\cal{L}}(\lambda) \propto (\lambda -1) \sum_i \log(Y_i) - \frac{n}{2} \log(\textsf{SSE}(\lambda))\]

Profile Likelihood

  • The profile likelihood is a function of \(\lambda\) obtained by substituting the MLEs of \(\boldsymbol{\beta}\) and \(\sigma^2\) for each value of \(\lambda\)
MASS::boxcox(brains.lm, lambda=seq(-2, 2, by=0.1))
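
The curve that boxcox draws can also be computed directly from the profile log-likelihood formula on the previous slide; a minimal sketch (the grid of \(\lambda\) values is an arbitrary choice):

lambda <- seq(-2, 2, by=0.1)
logLik.prof <- sapply(lambda, function(l) {
  # Box-Cox transform of the response (log for lambda = 0)
  yl <- if (abs(l) < 1e-8) log(Animals$brain) else (Animals$brain^l - 1)/l
  sse <- sum(resid(lm(yl ~ body, data=Animals))^2)
  (l - 1)*sum(log(Animals$brain)) - nrow(Animals)/2*log(sse)
})
plot(lambda, logLik.prof, type="l", xlab=expression(lambda),
     ylab="profile log-likelihood")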

Residuals after Transformation of Response

plot(lm(log(brain) ~ body, data=Animals))

Residuals after Transformation of Response and Predictor

logbrain.lm <- lm(log(brain) ~ log(body), data=Animals)
plot(logbrain.lm)

Transformed Data

plot(brain ~ body, data=Animals, xlab="Body Weight (kg)", ylab="Brain Weight (g)", log="xy", main="Logarithmic Scale")

Test that Dinosaurs are from a Different Population (outliers)

logbrains.nodino.lm = lm(log(brain) ~ log(body) + 
                           I(row.names(Animals) == "Triceratops") +  
                           I(row.names(Animals) == "Brachiosaurus") + 
                           I(row.names(Animals) == "Dipliodocus"), 
                         data=Animals)

anova(logbrain.lm, logbrains.nodino.lm)
Analysis of Variance Table

Model 1: log(brain) ~ log(body)
Model 2: log(brain) ~ log(body) + I(row.names(Animals) == "Triceratops") + 
    I(row.names(Animals) == "Brachiosaurus") + I(row.names(Animals) == 
    "Dipliodocus")
  Res.Df    RSS Df Sum of Sq     F    Pr(>F)
1     26 60.988                             
2     23 12.117  3    48.871 30.92 3.031e-08
library(BAS)
# add an indicator column for each of the 28 animals
Animals = cbind(Animals, diag(28))
colnames(Animals)[3:30] = rownames(Animals)
brains.bas = bas.lm(log(brain) ~ log(body) + . - body, data=Animals,
                    prior="hyper-g-n", a=3, modelprior=beta.binomial(1, 28),
                    method="MCMC", n.models=10000, MCMC.it=2^17)
image(brains.bas, rotate=FALSE)

Variance Stabilizing Transformations

  • If \(Y - \mu\) is (approximately) \(\textsf{N}(0, h(\mu))\)

  • Delta Method implies that \[g(Y) \stackrel{\cdot}{\sim}\textsf{N}( g(\mu), g'(\mu)^2 h(\mu))\]

  • Find a function \(g\) such that \(g'(\mu)^2 h(\mu)\) is constant, so that (approximately) \[g(Y) \stackrel{\cdot}{\sim} \textsf{N}(g(\mu), c)\]

  • Poisson counts (need \(Y > 3\) or so): \(g\) is the square root transformation (see the sketch after this list)

  • Binomial: \(\arcsin(\sqrt{Y})\)
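
For the Poisson case \(h(\mu) = \mu\), and solving \(g'(\mu)^2\, \mu = c\) gives \(g(\mu) \propto \sqrt{\mu}\); a quick simulation check (the grid of means is arbitrary) shows the variance of \(\sqrt{Y}\) staying near \(1/4\):

mu <- c(5, 10, 20, 50, 100)
sapply(mu, function(m) var(sqrt(rpois(1e5, m))))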

Note: the transformation for normality may not be the same as the variance stabilizing transformation; Box-Cox assumes the mean function is correctly specified

Generalized Linear Models are generally preferable to transforming the data, but transformations may still be useful for obtaining starting values for MCMC

Nonlinear Regression

  • Drug concentration of caldralazine at time \(x_i\) in a cardiac failure patient after a single 30 mg dose \((D = 30)\), with mean function \[ \mu_i(\boldsymbol{\beta}) = \frac{D}{V} \exp(-\kappa_e x_i) \] where \(\boldsymbol{\beta}= (V, \kappa_e)\), \(V\) is the volume and \(\kappa_e\) is the elimination rate

  • If \(Y_i = \left[\frac{D}{V} \exp(-\kappa_e x_i) \right] \times \epsilon_i\) with \(\log(\epsilon_i) \mathrel{\mathop{\sim}\limits^{\rm iid}} \textsf{N}(0, \sigma^2)\), then the model is intrinsically linear (can transform to a linear model)

\[\begin{eqnarray*} \log(\mu_i(\boldsymbol{\beta})) & = & \log\left[\frac{D}{V} \exp(-\kappa_e x_i) \right] = \log(D) - \log(V) -\kappa_e x_i \\ \log(Y_i) - \log(30) & = &\beta_0 + \beta_1 x_i + \log(\epsilon_i) \end{eqnarray*}\] with \(\beta_0 = -\log(V)\), \(\beta_1 = -\kappa_e\), and normal errors \(\log(\epsilon_i)\) (equivalently, \(\epsilon_i\) has a log-normal distribution)

Nonlinear Least Squares Example

x = c(2,4,6,8,10,24,28, 32)
y = c(1.63, 1.01, .73, .55, .41, .01, .06, .02)

# with log(30) moved to the left-hand side, intercept = -log(V) and slope = -kappa_e
conc.lm = lm(I(log(y) - log(30)) ~ x)

vhat = exp(-coef(conc.lm)[1])   # V = exp(-intercept)
khat = -coef(conc.lm)[2]        # kappa_e = -slope

vhat
(Intercept) 
   16.66331 
khat
        x 
0.1521064 
plot(x, y)
lines(x, (30/vhat)*exp(-khat*x))

Additive Error Model

  • If \(Y_i = \left[\frac{D}{V} \exp(-\kappa_e x_i) \right] + \epsilon_i\), the model is intrinsically nonlinear and cannot be transformed to a linear model.

  • need to use nonlinear least squares to estimate \(\boldsymbol{\beta}\) and \(\sigma^2\)

  • or MCMC to estimate the posterior distribution of \(\boldsymbol{\beta}\) and \(\sigma^2\)

Intrinsically Linear Model via nls

df = data.frame(y=y, x=x)
logconc.nlm = nls( log(y) ~ log((30/V)*exp(-k*x)), data=df, start=list(V=vhat, k=khat))
summary(logconc.nlm)

Formula: log(y) ~ log((30/V) * exp(-k * x))

Parameters:
              Estimate Std. Error t value Pr(>|t|)
V.(Intercept) 16.66331    7.11923   2.341 0.057796
k.x            0.15211    0.02368   6.423 0.000673

Residual standard error: 0.7411 on 6 degrees of freedom

Number of iterations to convergence: 0 
Achieved convergence tolerance: 4.056e-09

Intrinsically Nonlinear Model
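
The nls call used to produce this output is not shown on the slide; a minimal sketch consistent with the printed formula, assuming the data frame df and starting values vhat and khat from the earlier slides:

conc.nlm = nls(y ~ (30/V)*exp(-k*x), data=df, start=list(V=vhat, k=khat))
summary(conc.nlm)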


Formula: y ~ (30/V) * exp(-k * x)

Parameters:
  Estimate Std. Error t value Pr(>|t|)
V 13.06506    0.60899   21.45 6.69e-07
k  0.18572    0.01124   16.52 3.14e-06

Residual standard error: 0.05126 on 6 degrees of freedom

Number of iterations to convergence: 4 
Achieved convergence tolerance: 7.698e-06

Functions of Interest

Interest is in

  • clearance: \(V \kappa_e\)

  • elimination half-life \(x_{1/2} = \log 2/\kappa_e\)

  • Use properties of MLEs: asymptotically \(\hat{\boldsymbol{\beta}} \sim N\left(\boldsymbol{\beta}, I(\hat{\boldsymbol{\beta}})^{-1}\right)\)

  • Asymptotic distributions of these functions follow from the Delta Method (see the sketch after this list)

  • Bayes: obtain the posterior directly for parameters and functions of parameters!

    • Priors?
    • Constraints on Distributions?
    • Bayes Factor for testing normal vs log-normal models?
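
A minimal sketch of the Delta Method calculation for clearance and half-life, assuming the additive-error fit is stored as conc.nlm (as in the sketch above); gradients below are with respect to \((V, \kappa_e)\):

theta <- coef(conc.nlm)                  # estimates, named "V" and "k"
Sigma <- vcov(conc.nlm)                  # asymptotic covariance matrix

# clearance = V * kappa_e; gradient is (k, V)
g.cl  <- c(theta["k"], theta["V"])
cl    <- unname(theta["V"] * theta["k"])
se.cl <- sqrt(drop(t(g.cl) %*% Sigma %*% g.cl))

# half-life = log(2)/kappa_e; gradient is (0, -log(2)/k^2)
g.th  <- c(0, -log(2)/theta["k"]^2)
th    <- unname(log(2)/theta["k"])
se.th <- sqrt(drop(t(g.th) %*% Sigma %*% g.th))

c(clearance = cl, se.clearance = se.cl, half.life = th, se.half.life = se.th)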

Summary

  • Optimal transformation for normality (MLE) depends on choice of mean function
  • May not be the same as the variance stabilizing transformation
  • Nonlinear Models as suggested by Theory or Generalized Linear Models are alternatives
  • "normal" estimates may be useful approximations for large \(p\) or for starting values for more complex models (where convergence may be sensitive to starting values)