David L Miller
Models that look like:
\[ y_i = \beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \ldots + \epsilon_i \]
(describe the response, \( y_i \), as a linear combination of the covariates, \( x_{ji} \), with an offset)
We can make \( y_i\sim \) any exponential family distribution (Normal, Poisson, etc).
Error term \( \epsilon_i \) is normally distributed (usually).
lm(y ~ x1, data=dat)
lm(y ~ poly(x1, 2), data=dat)
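Since the response can come from any exponential family, a count model is a one-line change: swap lm() for glm() and pick a family. A minimal sketch (dat, x1 and x2 are placeholder names, not from the slides):
# count response, log link, placeholder data frame dat
count_model <- glm(y ~ x1 + x2, family = poisson(link = "log"), data = dat)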
\[ y_i = \beta_0 + \sum_j s_j(x_{ji}) + \epsilon_i \]
where \( \epsilon_i \sim N(0, \sigma^2) \), \( y_i \sim \text{Normal} \) (for now)
Remember that we're modelling the mean of this distribution!
Call the above equation the linear predictor
\[ \int_\mathbb{R} \left( \frac{\partial^2 f(x)}{\partial x^2}\right)^2 \text{d}x \]
(Take some derivatives of the smooth and integrate them over \( x \))
(Turns out we can always write this as \( \boldsymbol{\beta}^\text{T}S\boldsymbol{\beta} \), so the \( \boldsymbol{\beta} \) is separate from the derivatives)
(Call \( S \) the penalty matrix)
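As an illustrative sketch (not from the slides; dat, sm, beta and wiggliness are made-up names), mgcv::smoothCon() builds a smooth together with its penalty matrix \( S \), and the penalty is then the quadratic form \( \boldsymbol{\beta}^\text{T}S\boldsymbol{\beta} \):
library(mgcv)
set.seed(1)
dat <- data.frame(x = runif(100))          # made-up covariate
sm <- smoothCon(s(x), data = dat)[[1]]     # construct the smooth
S <- sm$S[[1]]                             # its penalty matrix
beta <- rnorm(ncol(sm$X))                  # any coefficient vector
wiggliness <- t(beta) %*% S %*% beta       # beta' S beta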
More on this in a bit…
A simple example:
\[ y_i = \beta_0 + s(x_i) + s(w_i) + \epsilon_i \]
where \( \epsilon_i \sim N(0, \sigma^2) \)
Let's pretend that \( y_i \sim \text{Normal} \)
formula = y ~ s(x) + s(w)
family=gaussian()
data=some_data_frame
my_model <- gam(y ~ s(x) + s(w),
family = gaussian(),
data = some_data_frame,
method = "REML")
method="REML"
uses REML for smoothness selection (default is "GCV.Cp"
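To see this in action before the real data, a self-contained sketch (sim_data and toy_model are made-up names) using mgcv's gamSim() to simulate covariates and fit a model of exactly this form:
library(mgcv)
set.seed(1)
sim_data <- gamSim(eg = 1, n = 400, dist = "normal", scale = 2)  # columns y, x0, ..., x3
toy_model <- gam(y ~ s(x0) + s(x1),
                 family = gaussian(),
                 data = sim_data,
                 method = "REML")
summary(toy_model)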
library(mgcv)
dolphins_depth <- gam(count ~ s(depth) + offset(off.set),
data = mexdolphins,
family = quasipoisson(),
method = "REML")
off.set is the effort expended
summary(dolphins_depth)
Family: quasipoisson
Link function: log
Formula:
count ~ s(depth) + offset(off.set)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -18.2344 0.8949 -20.38 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(depth) 6.592 7.534 2.329 0.0224 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.0545 Deviance explained = 26.4%
-REML = 948.28 Scale est. = 145.34 n = 387
plot(dolphins_depth)
s(x,y) (and s(x,y,z,...))
+ for an extra term
dolphins_depth_xy <- gam(count ~ s(depth) + s(x, y) + offset(off.set),
data = mexdolphins,
family=quasipoisson(), method="REML")
summary(dolphins_depth_xy)
Family: quasipoisson
Link function: log
Formula:
count ~ s(depth) + s(x, y) + offset(off.set)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -19.1933 0.9468 -20.27 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(depth) 6.804 7.669 1.461 0.191
s(x,y) 23.639 26.544 1.358 0.114
R-sq.(adj) = 0.22 Deviance explained = 49.9%
-REML = 923.9 Scale est. = 79.474 n = 387
plot(dolphins_depth_xy, scale=0, pages=1)
scale=0: each plot on different scale
pages=1: plot together
plot(dolphins_depth_xy, select=2, cex=2, asp=1, lwd=2)
select= picks which smooth to plot
plot(dolphins_depth_xy, select=2, cex=2, asp=1, lwd=2, scheme=2)
scheme=2 much better for bivariate terms
vis.gam() is much more general
par(mfrow=c(1,2))
vis.gam(dolphins_depth_xy, view=c("depth","x"), too.far=0.1, phi=30, theta=45)
vis.gam(dolphins_depth_xy, view=c("depth","x"), plot.type="contour", too.far=0.1,asp=1/1000)
gam does all the work
(just like glm)
s indicates a smooth term
plot can give simple plots
vis.gam for more advanced stuff
for more advanced stuffdata.frame
with \( x, y, \text{Depth}, A \)predict()
preds <- predict(my_model, newdata=my_data, type="response")
(se.fit=TRUE gives a standard error for each prediction)
dolphin_preds <- predict(dolphins_depth_xy, newdata=preddata,
type="response")
(ggplot2 code included in the slide source)
predictions need a data.frame (of covariate values)
don't forget the type=... argument!
se.fit gives standard errors
\( \boldsymbol{\beta} \): uncertainty in the model (spline) parameters
\( \boldsymbol{\lambda} \): uncertainty in the smoothing parameter
(Traditionally we've only addressed the former)
(New tools let us address the latter…)
From theory:
\[ \boldsymbol{\beta} \sim N(\hat{\boldsymbol{\beta}}, \mathbf{V}_\boldsymbol{\beta}) \]
(caveat: the normality is only approximate for non-normal response)
What does this mean? Variance for each parameter.
In mgcv: vcov(model) returns \( \mathbf{V}_\boldsymbol{\beta} \).
(This is what plot's confidence bands and predict()'s se.fit are based on.)
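One hedged use of this (Vb and beta_sims are made-up names): simulate coefficient vectors from the approximate posterior \( N(\hat{\boldsymbol{\beta}}, \mathbf{V}_\boldsymbol{\beta}) \) with mgcv::rmvn():
Vb <- vcov(dolphins_depth_xy)                          # V_beta
beta_sims <- rmvn(1000, coef(dolphins_depth_xy), Vb)   # one draw per row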
For regular predictions:
\[ \hat{\boldsymbol{\eta}}_p = L_p \hat{\boldsymbol{\beta}} \]
form \( L_p \) using the prediction data, evaluating basis functions as we go.
(Need to apply the link function to \( \hat{\boldsymbol{\eta}}_p \))
But the \( L_p \) fun doesn't stop there…
To get variance on the scale of the linear predictor:
\[ V_{\hat{\boldsymbol{\eta}}} = L_p V_{\hat{\boldsymbol{\beta}}} L_p^\text{T} \]
pre-/post-multiplication shifts the variance matrix from parameter space to linear predictor-space.
(Can then pre-/post-multiply by derivatives of the link to put variance on response scale)
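A hedged sketch of these \( L_p \) calculations for the dolphin model (Lp, eta_hat, V_eta and se_eta are made-up names); predict() returns \( L_p \) via type="lpmatrix":
Lp <- predict(dolphins_depth_xy, newdata = preddata, type = "lpmatrix")
eta_hat <- Lp %*% coef(dolphins_depth_xy)   # linear predictor (any offset must be added separately)
Vb <- vcov(dolphins_depth_xy)               # V_beta
V_eta <- Lp %*% Vb %*% t(Lp)                # variance on the linear predictor scale
se_eta <- sqrt(diag(V_eta))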
$Vp: what we got with vcov
$Vc: the corrected version
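A minimal sketch of where these live on the fitted object (assuming the model was fitted with method="REML", so $Vc is available):
Vp <- dolphins_depth_xy$Vp                               # what vcov() returns by default
Vc <- dolphins_depth_xy$Vc                               # corrected for smoothing parameter uncertainty
Vc_alt <- vcov(dolphins_depth_xy, unconditional = TRUE)  # the same thing via vcov()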
mgcv does most of the hard work for us
glm with extra s() terms
mgcv shields you from