Generalised Additive Models (GAM): notes

I wrote the following notes as a memorandum for myself, but I am sharing them in the hope that they may be useful for others too. These notes are in no way meant to replace the study of specialised literature, a selection of which is listed at the end of this article.

Response variables can show a non-linear correlation with one or more continuous explanatory variables. Generalised additive models (hereafter: GAM) provide a tool for modelling such correlations. Generalised Additive Models are “generalized linear models with a linear predictor involving a sum of smooth functions of covariates” (S.N. Wood, 2017). In other words, they estimate functional relationships between predictors and the response variable by approximating continuous predictors using smooth curves, or “smoothers”. Smoothers are functions which are, themselves, sums of functions. The functions that, summed together, produce a smoother are called its basis functions.

How can it be that a linear model can describe non-linear correlations? This calls for a refresher of what a linear model is. Linear models are models that describe a continuous response variable as a function of a linear combination of parameters and explanatory variables. A linear combination is “an expression constructed from a set of terms by multiplying each term by a constant and adding the results” (source). For example, y = a + b*x is a linear model. On the contrary, y = a * exp(b*x) is not, because y is equalled to an expression that does not fit the definition of a linear combination. The polynomial model y = a + b*x + c*x2 describes a U-shaped correlation between y and x, but it is a linear model because the right side of the model is a linear combination of parameters and predictors. This shows that, given a response variable y and a predictor x, the correlation between y and x can be non-linear in shape and still be described using a linear model. The term “linear” referred to a model refers to how the model is “assembled”, not to the shape of the relationship between its variables.

The polynomial model discussed above is an Additive Model itself. Polynomials follow the general expression of an additive model: y = f1(x) + f2(x) + … fn(x). In the case of a second-degree polynomial:

y = a + b*x + c*x2

= f1(x) + f2(x)

Where f1(x) = a + bx and f2(x) = c*x2

The problem with polynomial models is that they get more and more “wiggly” as their degree increases, and this out-of-control “wiggliness” often affects their ability to approximate data well. To overcome this issue, GAM are fitted using splines, namely piecewise polynomial models. To avoid overfitting, models with a large basis size (i.e. many basis functions) are assigned a smoothing penalty which prevents eccessive “wiggliness”.

The penalisation operates by calculating a penalty matrix based on the integral of the squared derivative of each basis function.

Effective Degrees of Freedom (EDF), estimated as the maximum number of coefficients in the model minus the model’s constraints, are a proxy for the degree of non-linearity in the  in stressor-response relationships: the larger the EDF, the wigglier the model is.

Package mgcv is a powerful tool to fit GAMs in R. It can:

  • handle both normally distributed and non-normally distributed data;
  • fit complex models including both continuous and categorical data, interactive effects, and more;
  • fit mixed-effect GAMs.

Interaction between smooth terms can be modelled in mgcv in two main ways:

  • using two-dimensional thin plate regression smoothers, with the syntax s(x1,x2). This can work if the predictors have the same rate of change (e.g. some geographic coordinate systems).
  • using tensor products, which create a new set of basis functions that allows for each smoother to have their own penalty matrix. Tensor products can be implemented with te(), ti() (for when main effects are also present), t2(). See: Pedersen et al. (2019), p. 7.

Hierarchical Generalised Additive Models (HGAM)

Hierarchical Generalised Additive Models (HGAM) allow to account for structure in the data and can follow one of the following five structures:

1) Models with a single common smoother for all observations (G)

gam(y ~ s(x) 
      + s(fac, bs=“re”)) # “re” is short for “random effect”

2) Models with a global smoother for all observations, plus group-level smoothers assumed to have all the same wiggliness (GS)

gam(y ~ s(x, m=2) 
      + s(x, fac, bs=“fs”, m=2)) 
      # “fs” is short for “factor-smoother [interaction]”

gam(y ~ te(x1, x2, bs=“tp”, m=2) 
      + t2(x1, x2, fac, bs=c(“tp”, “tp”, “re”), 
      m=2, full=T)
      )

The argument m specifies the order of the derivatives on which the penalty matrix is calculated. I take Pedersen et al.’s word for the choice of the values of m.

3) GI: similar to GS but allowing group-level smoothers to differ in wiggliness.

gam(y ~ s(x, m=2)
      + s(x, by= fac, bs=“tp”, m=1)
      # “tp” is short for “thin plate [regression spline]"
      + s(fac, bs=“re”))

4) S: similar to GS but without a global smoother term.

gam(y ~ s(x, fac, bs=“fs”))

gam(y ~ t2(x1, x2, fac, bs=c(“tp”, “tp”, “re”)))

5) I: similar to GI but without a global smoother term.

gam(y ~ fac 
      + s(x, by=fac))

gam(y ~ fac 
      + te(x1, x2, by=fac))

Model comparison

These models can be compared by their AIC (Pedersen et al. (2019), p. 20), but model selection should also keep the goals of the study into account. For example, models S and I cannot be used to predict the predictor-response relationship for groups that are not among those used for estimating the model parameters. Similarly, GS models can be used to simulate functional variation for unobserved group levels, while GI models cannot.
Cross-validation is recommended for assessing the predictive power of a model.

Function gam.check() and k.check() allow to check if the family distribution chosen for the residuals is appropriate, and whether the model is overfitting the data.

Other aspects to keep in mind during model selection are bias-variance trade-offs and the trade-off between complexity and computational costs [Pedersen et al. (2019): p. 26 ss., p. 31 ss.].

Alternative formulations

  • bam(), for large datasets and large number of groups;
  • gamm(), which uses nlme();
  • gamm4(), which uses package lme4.

Useful literature