The following example is an imaginary scenario based on real-world data. We are looking at how resource availability affects unicorns’ social patterns: specifically, we are looking at how unicorn herd size changes with food quantity. We surveyed unicorn herd sizes at seven localities over the course of twelve months. Food quantity was assessed monthly, so for each locality at each month we have one value of food availability (continuous predictor, let’s say grass cover). For each locality on each month, unicorn herds were re-surveyed over four consecutive days to account for both unicorn movement and possible sampling error. For each sampling day there are as many data points as there were unicorn herds.
Data available here:
unicorns <- read.csv(text= RCurl::getURL("https://raw.githubusercontent.com/marcoplebani85/datasets/master/unicorns.csv"))
It’s good practice to start with a visual exploration of the data, which suggests that herd size increases with food availability (Fig.1):
Looking at the data separately by locality it appears that the relationship between food availability and herd size differ among localities (Fig. 2):
The relationship between food availability and herd size differs also among survey months, but by eyeballing the graphs it looks like survey month is a smaller source of variability compared to locality (Fig. 3):
The aim is to fit a model that:
- captures the possible relationship between herd size and food availability;
- accounts for the data being non-normally distributed;
- accounts for the possible sources of background variability, namely: inter-locality variability and inter-month variability;
- accounts for the non-independence of measurements performed over consecutive days in the same month.
The powerful function
glmmTMB, from the homonymous package, allows to do all of the above.
# factorize survey_day so that glmmTMB can use it to estimate temporal autocorrelation: unicorns$survey_day_f <- glmmTMB::numFactor(unicorns$survey_day_n)
unicorns_glmmTMB0 <- glmmTMB::glmmTMB(Herd_size_n ~ food.quantity + (1 + food.quantity | Locality) + (1 + food.quantity | Year_Month) + ou(survey_day_f + 0 | Locality), # autocorrelation term family="poisson", # adequate for count data data=unicorns) # Error in eigen(h) : infinite or missing values in 'x'
The model above is the best we could fit, given the information available: it allows for
Herd_size_n ~ food.quantity to differ in intercept and slope at the level of
Year_Month, and it accounts for temporal autocorrelation. Unfortunately, this model fails to converge, meaning that it’s probably too complicated for all its parameters to be estimated reliably given the amount of data available.
Dealing with convergence issues
- rescale the predictor(s);
- change optimiser;
- simplify the model.
Refitting the model after scaling
food.quantity works but doesn’t converge:
dd.CS <- transform(unicorns, food.quantity =scale(food.quantity)) unicorns_glmmTMB.cs <- glmmTMB::glmmTMB(Herd_size_n ~ food.quantity + (1 + food.quantity | Locality) + (1 + food.quantity | Year_Month) + ou(survey_day_f + 0 | Locality), family="poisson", data= dd.CS) # Warning messages: # 1: In fitTMB(TMBStruc) : # Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting') # 2: In fitTMB(TMBStruc) : # Model convergence problem; singular convergence (7). See vignette('troubleshooting')
Changing optimiser does not help either:
unicorns_glmmTMB.csb <- update(unicorns_glmmTMB.cs, control=glmmTMBControl(optimizer=optim, optArgs=list(method="BFGS"))) # Warning message: # In fitTMB(TMBStruc) : # Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
A simpler model, which allows for random slopes for localities but not for survey months, still does not converge:
unicorns_glmmTMB <- glmmTMB(Herd_size_n ~ food.quantity + (1 + food.quantity | Locality) + (1 | Year_Month) + ou(survey_day_f + 0 | Locality), family="poisson", data= unicorns)
Rescaling, changing optimiser to
BFGS, or doing both did not help. The next step of model selection/simplification led me to a model allowing only for random intercept for both
Locality and survey month (
unicorns_glmmTMB_noslope <- glmmTMB::glmmTMB(Herd_size_n ~ food.quantity + (1 | Locality) + (1 | Year_Month) + ou(survey_day_f + 0 | Locality), family="poisson", data= unicorns) unicorns_glmmTMB_noslope$sdr$pdHess ## a test for convergence. There is convergence
Although Fig. 2 suggests that including
(1 + food.quantity | Locality) would better reflect the nature of my data compared to a model including only
(1 | Locality), model
unicorns_glmmTMB_noslope is the best that can be fitted to the data.
summary(unicorns_glmmTMB_noslope) # Conditional model: # Estimate Std. Error z value Pr(>|z|) # (Intercept) -0.125 0.787 -0.16 0.8735 # food.quantity 0.715 0.273 2.62 0.0089 **
Model selection between this model and simpler ones (e.g. not including
Year_Month, or either) can be done by comparing their corrected AIC (hereafter: AICc). The correction in AICc is an adjustment for small sample sizes, which is not really necessary here but does not hurt (for large sample sizes, AICc ~ AIC). See:
MuMIn::AICc(). The best model is the simplest one with the lowest AICc. The minimum AIC difference for a model to be considered significantly better at describing the data is debated; Harrison et al. (2018) suggest ∆AIC=6. For competing models with ∆AIC below the threshold, I consider the simplest to be the best.
Header image: Miniature of a unicorn, in Philes, De natura animalium (France, 16th century). London, British Library. Public domain. Source