Generalized Additive Models

David L Miller

Overview

  • What is a GAM?
  • What is smoothing?
  • How do GAMs work? (Roughly)
  • Fitting and plotting simple models

What is a GAM?

Generalized Additive Models

  • Generalized: many response distributions
  • Additive: terms add together
  • Models: well, it's a model…

To GAMs from GLMs and LMs

(Generalized) Linear Models

Models that look like:

\[ y_i = \beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \ldots + \epsilon_i \]

(describe the response, \( y_i \), as a linear combination of the covariates, \( x_{ji} \), plus an intercept and an error term)

We can make \( y_i\sim \) any exponential family distribution (Normal, Poisson, etc).

Error term \( \epsilon_i \) is normally distributed (usually).

Why bother with anything more complicated?!

Is this linear?

[plot of chunk islinear]

Is this linear? Maybe?

lm(y ~ x1, data=dat)

[plot of chunk maybe]

What can we do?

Adding a quadratic term?

lm(y ~ poly(x1, 2), data=dat)

[plot of chunk quadratic]

Is this sustainable?

  • Adding in quadratic (and higher-order) terms can make sense
  • This feels a bit ad hoc
  • Better if we had a framework to deal with these issues?

[plot of chunk ruhroh]

[drumroll]

What does a model look like?

\[ y_i = \beta_0 + \sum_j s_j(x_{ji}) + \epsilon_i \]

where \( \epsilon_i \sim N(0, \sigma^2) \), \( y_i \sim \text{Normal} \) (for now)

Remember that we're modelling the mean of this distribution!

Call \( \beta_0 + \sum_j s_j(x_{ji}) \) the linear predictor

Okay, but what about these "s" things?

[plot of chunk smoothdat]

  • Think \( s \)=smooth
  • Want to model the covariates flexibly
  • Covariates and response not necessarily linearly related!
  • Want some “wiggles”

Okay, but what about these "s" things?

[plot of chunk wsmooths]

  • Think \( s \)=smooth
  • Want to model the covariates flexibly
  • Covariates and response not necessarily linearly related!
  • Want some “wiggles”

What is smoothing?

Straight lines vs. interpolation

[plot of chunk wiggles]

  • Want a line that is “close” to all the data
  • Don't want interpolation – we know there is “error”
  • Need a balance between fitting the data closely and smoothing through the noise

Splines

  • Functions made of other, simpler functions
  • Basis functions \( b_k \), estimate \( \beta_k \)
  • \( s(x) = \sum_{k=1}^K \beta_k b_k(x) \)
  • Makes the math(s) much easier
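A minimal sketch of this idea, using mgcv's smoothCon() to evaluate a spline basis (the coefficients here are made up purely for illustration):

library(mgcv)

# evaluate a cubic regression spline basis at 100 x values
x <- seq(0, 1, length.out = 100)
sm <- smoothCon(s(x, bs = "cr", k = 6), data = data.frame(x = x))[[1]]

# sm$X holds the basis functions b_k(x), one column per k
beta <- rnorm(6)              # made-up coefficients beta_k
s_of_x <- sm$X %*% beta       # s(x) = sum_k beta_k b_k(x)
matplot(x, sm$X, type = "l")  # plot the basis functions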

Design matrices

  • We often write models as \( X\boldsymbol{\beta} \)
    • \( X \) is our data
    • \( \boldsymbol{\beta} \) are parameters we need to estimate
  • For a GAM it's the same
    • \( X \) has columns for each basis, evaluated at each observation
    • again, this is the linear predictor
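In mgcv this matrix can be pulled out of a fitted model directly (fitting is covered below; my_model is a placeholder name):

X <- model.matrix(my_model)  # one row per observation, one column per coefficient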

Measuring wiggliness

  • Visually:
    • Lots of wiggles == NOT SMOOTH
    • Straight line == VERY SMOOTH
  • How do we do this mathematically?
    • Derivatives!
    • (Calculus was a useful class after all!)

Wiggliness by derivatives

[animation of derivatives]

What was that grey bit?

\[ \int_\mathbb{R} \left( \frac{\partial^2 f(x)}{\partial x^2}\right)^2 \text{d}x \]

(Take the second derivative of the smooth, square it and integrate over \( x \))

(Turns out we can always write this as \( \boldsymbol{\beta}^\text{T}S\boldsymbol{\beta} \), so the \( \boldsymbol{\beta} \) is separate from the derivatives)

(Call \( S \) the penalty matrix)
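Continuing the smoothCon() sketch from earlier, the penalty matrix for a smooth is available directly:

# sm$S is a list of penalty matrices (one here)
S <- sm$S[[1]]
wiggliness <- t(beta) %*% S %*% beta  # beta^T S beta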

Making wiggliness matter

  • \( \boldsymbol{\beta}^\text{T}S\boldsymbol{\beta} \) measures wiggliness
  • “Likelihood” measures closeness to the data
  • Penalise wiggliness to avoid interpolating the data…
  • Use a smoothing parameter, \( \lambda \), to decide on that trade-off…
    • \( \lambda \boldsymbol{\beta}^\text{T}S\boldsymbol{\beta} \)
  • Estimate the \( \beta_k \) terms but penalise the objective
    • “closeness to data” + penalty (see the worked objective below)
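For a Normal response this penalised objective can be written out in full: we minimise

\[ \left\| \boldsymbol{y} - X\boldsymbol{\beta} \right\|^2 + \lambda \boldsymbol{\beta}^\text{T} S \boldsymbol{\beta} \]

which has the closed-form solution \( \hat{\boldsymbol{\beta}} = (X^\text{T}X + \lambda S)^{-1} X^\text{T} \boldsymbol{y} \).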

Smoothing parameter

[plot of chunk wiggles-plot]

Smoothing parameter selection

  • Many methods: AIC, Mallows' \( C_p \), GCV, ML, REML
  • Recommendation, based on simulation and practice:
    • Use REML or ML
    • Reiss & Ogden (2009), Wood (2011)

Maximum wiggliness

  • We can set basis complexity or “size” (\( k \))
    • Maximum wiggliness
  • Smooths have effective degrees of freedom (EDF)
  • EDF < \( k \)
  • Set \( k \) “large enough”
    • Penalty does the rest
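One way to check that \( k \) was large enough, once a model is fitted (my_model is a placeholder), is mgcv's gam.check():

# prints a basis dimension check (among other diagnostics);
# a k-index well below 1 with a low p-value suggests k was too small
gam.check(my_model)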

More on this in a bit…

[image: Spock sobbing mathematically]

GAM summary

  • Straight lines suck — we want wiggles
  • Use little functions (basis functions) to make big functions (smooths)
  • Need to make sure your smooths are wiggly enough
  • Use a penalty to trade off wiggliness/generality

Fitting GAMs in practice

Translating maths into R

A simple example:

\[ y_i = \beta_0 + s(x_i) + s(w_i) + \epsilon_i \]

where \( \epsilon_i \sim N(0, \sigma^2) \)

Let's pretend that \( y_i \sim \text{Normal} \)

  • linear predictor: formula = y ~ s(x) + s(w)
  • response distribution: family=gaussian()
  • data: data=some_data_frame

Putting that together

library(mgcv)
my_model <- gam(y ~ s(x) + s(w),
                family = gaussian(),
                data = some_data_frame,
                method = "REML")
  • method="REML" uses REML for smoothness selection (default is "GCV.Cp")

What about a practical example?

Pantropical spotted dolphins

  • Example taken from Miller et al (2013)
  • Paper appendix has a better analysis
  • Simple example here, ignoring all kinds of important stuff!

[image: a pantropical spotted dolphin doing its thing]

Inferential aims

  • How many dolphins are there?
  • Where are the dolphins?
  • What are they interested in?

A simple dolphin model

library(mgcv)
dolphins_depth <- gam(count ~ s(depth) + offset(off.set),
                      data = mexdolphins,
                      family = quasipoisson(),
                      method = "REML")
  • count is a function of depth
  • off.set is the effort expended
  • we have count data, try quasi-Poisson distribution

What did that do?

summary(dolphins_depth)

Family: quasipoisson 
Link function: log 

Formula:
count ~ s(depth) + offset(off.set)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -18.2344     0.8949  -20.38   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
           edf Ref.df     F p-value  
s(depth) 6.592  7.534 2.329  0.0224 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.0545   Deviance explained = 26.4%
-REML = 948.28  Scale est. = 145.34    n = 387

Plotting

[plot of chunk plotsmooth]

  • plot(dolphins_depth)
  • Dashed lines indicate +/- 2 standard errors
  • Rug plot
  • On the link scale
  • EDF on \( y \) axis
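A useful variation: plot.gam can draw shaded confidence bands instead of dashed lines:

# shaded band at +/- 2 standard errors
plot(dolphins_depth, shade = TRUE)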

Thin plate regression splines

  • Default basis
  • One basis function per data point
  • Reduce # basis functions (eigendecomposition)
  • Fitting on reduced problem
  • Multidimensional
  • Wood (2003)
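The basis type and size can also be set explicitly in the formula; a sketch (dolphins_tp is an illustrative name, and bs = "tp" just makes the default explicit):

dolphins_tp <- gam(count ~ s(depth, bs = "tp", k = 20) + offset(off.set),
                   data = mexdolphins,
                   family = quasipoisson(),
                   method = "REML")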

Bivariate terms

  • Assumed an additive structure
  • No interaction
  • We can specify s(x,y) (and s(x,y,z,...))
  • (Assuming isotropy here…)

[plot of chunk xydsmplot]

Adding a term

  • Add a surface for location (\( x \) and \( y \))
  • Just use + for an extra term
dolphins_depth_xy <- gam(count ~ s(depth) + s(x, y) + offset(off.set),
                         data = mexdolphins,
                         family = quasipoisson(),
                         method = "REML")

Summary

summary(dolphins_depth_xy)

Family: quasipoisson 
Link function: log 

Formula:
count ~ s(depth) + s(x, y) + offset(off.set)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -19.1933     0.9468  -20.27   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
            edf Ref.df     F p-value
s(depth)  6.804  7.669 1.461   0.191
s(x,y)   23.639 26.544 1.358   0.114

R-sq.(adj) =   0.22   Deviance explained = 49.9%
-REML =  923.9  Scale est. = 79.474    n = 387

Plotting

plot(dolphins_depth_xy, scale=0, pages=1)

[plot of chunk plotsmooth-xy]

  • scale=0: each plot on different scale
  • pages=1: plot together

Plotting 2d terms... erm...

plot(dolphins_depth_xy, select=2, cex=2, asp=1, lwd=2)

[plot of chunk plotsmooth-xy-biv]

  • select= picks which smooth to plot

Let's try something different

plot(dolphins_depth_xy, select=2, cex=2, asp=1, lwd=2, scheme=2)

[plot of chunk plot-scheme2]

  • scheme=2 much better for bivariate terms
  • vis.gam() is much more general

More complex plots

par(mfrow=c(1,2))
vis.gam(dolphins_depth_xy, view=c("depth","x"), too.far=0.1, phi=30, theta=45)
vis.gam(dolphins_depth_xy, view=c("depth","x"), plot.type="contour", too.far=0.1,asp=1/1000)

[plot of chunk visgam]

Fitting/plotting GAMs summary

  • gam does all the work
  • very similar to glm
  • s indicates a smooth term
  • plot can give simple plots
  • vis.gam for more advanced stuff

Prediction

What is a prediction?

  • Evaluate the model, at a particular covariate combination
  • Answering (e.g.) the question “at a given depth, how many dolphins?”
  • Steps:
    1. evaluate the \( s(\ldots) \) terms
    2. move to the response scale (exponentiate? Do nothing?)
    3. (multiply by any offset etc.)

Example of prediction

  • in maths:
    • Model: \( \text{count}_i = A_i \exp \left( \beta_0 + s(x_i, y_i) + s(\text{Depth}_i)\right) \)
    • Drop in the values of \( x, y, \text{Depth} \) (and \( A \))
  • in R:
    • build a data.frame with \( x, y, \text{Depth}, A \)
    • use predict()
preds <- predict(my_model, newdata=my_data, type="response")

(se.fit=TRUE gives a standard error for each prediction)
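For example, on the response scale with standard errors:

preds <- predict(my_model, newdata=my_data, type="response", se.fit=TRUE)
preds$fit     # the predictions
preds$se.fit  # a standard error for each prediction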

Back to the dolphins...

Where are the dolphins?

dolphin_preds <- predict(dolphins_depth_xy, newdata=preddata,
                         type="response")

[plot of chunk unnamed-chunk-8]

(ggplot2 code included in the slide source)

Prediction summary

  • Evaluate the fitted model at a given point
  • Can evaluate many at once (data.frame)
  • Don't forget the type=... argument!
  • Obtain per-prediction standard error with se.fit

What about uncertainty?

Without uncertainty, we're not doing statistics

Where does uncertainty come from?

  • \( \boldsymbol{\beta} \): uncertainty in the spline parameters
  • \( \boldsymbol{\lambda} \): uncertainty in the smoothing parameter

  • (Traditionally we've only addressed the former)

  • (New tools let us address the latter…)

Parameter uncertainty

From theory:

\[ \boldsymbol{\beta} \sim N(\hat{\boldsymbol{\beta}}, \mathbf{V}_\boldsymbol{\beta}) \]

(caveat: the normality is only approximate for non-normal response)

What does this mean? A variance for each parameter (and covariances between them).

In mgcv: vcov(model) returns \( \mathbf{V}_\boldsymbol{\beta} \).

What can we do with this?

  • confidence intervals in plot
  • standard errors using se.fit
  • derived quantities? (see bibliography)

The lpmatrix, magic, etc

For regular predictions:

\[ \hat{\boldsymbol{\eta}}_p = L_p \hat{\boldsymbol{\beta}} \]

form \( L_p \) using the prediction data, evaluating basis functions as we go.

(Need to apply the inverse link function to \( \hat{\boldsymbol{\eta}}_p \) to get to the response scale)

But the \( L_p \) fun doesn't stop there…

[[mathematics intensifies]]

Variance and lpmatrix

To get variance on the scale of the linear predictor:

\[ V_{\hat{\boldsymbol{\eta}}} = L_p V_{\hat{\boldsymbol{\beta}}} L_p^\text{T} \]

pre-multiplying by \( L_p \) and post-multiplying by \( L_p^\text{T} \) shifts the variance matrix from parameter space to linear predictor space.

(Can then pre-/post-multiply by derivatives of the inverse link to put the variance on the response scale)
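A minimal sketch of these steps for the dolphin model (note that the off.set offset is not part of \( L_p \) and must be added separately):

# evaluate the basis functions at the prediction points
Lp <- predict(dolphins_depth_xy, newdata = preddata, type = "lpmatrix")

# linear predictor (add the offset before applying the inverse link)
eta_hat <- Lp %*% coef(dolphins_depth_xy)

# variance on the linear predictor scale
V_eta <- Lp %*% vcov(dolphins_depth_xy) %*% t(Lp)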

Simulating parameters

  • \( \boldsymbol{\beta} \) has a distribution, we can simulate
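A sketch using mgcv's rmvn(), with Lp from the previous sketch:

# 200 draws of beta from its approximate posterior
betas <- rmvn(200, coef(dolphins_depth_xy), vcov(dolphins_depth_xy))

# each column is one plausible realisation of the linear predictor
eta_sims <- Lp %*% t(betas)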

[animation of uncertainty]

Uncertainty in smoothing parameter

  • Recent work by Simon Wood
  • “smoothing parameter uncertainty corrected” version of \( V_\hat{\boldsymbol{\beta}} \)
  • In a fitted model, we have:
    • $Vp what we got with vcov
    • $Vc the corrected version
  • Still experimental
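Both are accessible via vcov(); the corrected version is available for models fitted by REML or ML:

vcov(dolphins_depth_xy)                        # $Vp, conditional on lambda
vcov(dolphins_depth_xy, unconditional = TRUE)  # $Vc, corrected for lambda uncertainty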

Variance summary

  • Everything comes from variance of parameters
  • Need to re-project/scale them to get the quantities we need
  • mgcv does most of the hard work for us
  • Fancy stuff possible with a little maths
  • Can include uncertainty in the smoothing parameter too

Okay, that was a lot of information

Summary

  • GAMs are GLMs plus some extra wiggles
  • Need to make sure things are just wiggly enough
    • Basis + penalty is the way to do this
  • Fitting looks like glm with extra s() terms
  • Most stuff comes down to matrix algebra, which mgcv shields you from
    • To do fancy stuff, get inside the matrices

COFFEE