# Zero-inflated models

Count data are generally modelled by assuming that the response, conditional on the predictors, follows a Poisson distribution. Data expressed as ratios (a number of successes out of a number of trials) are generally modelled by assuming a binomial distribution. In both cases, too many zeroes in the response variable can become a problem: the overabundance of zeroes, known as zero-inflation, can translate into overdispersion, making the distribution of choice (Poisson or binomial) a poor fit to the data.

One way to address zero-inflation is to use the zero-inflated versions of Poisson and binomial distributions. These zero-inflated distributions treat the data as a mixture of data generated by a Poisson or a binomial process, and data that belong to a zero-only distribution (in other words, a distribution with mean of 0 and variance of 0).
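The mixture described above can be illustrated with a small simulation in base R. This is only a sketch: the sample size `n`, the Poisson mean `lambda` and the structural-zero probability `pi_zero` are arbitrary values chosen for illustration, not estimates from any dataset.

```r
# Simulate zero-inflated Poisson counts as a mixture of two processes.
# All parameter values below are arbitrary, illustrative assumptions.
set.seed(42)
n       <- 10000  # number of observations
lambda  <- 3      # mean of the Poisson (count) process
pi_zero <- 0.4    # probability that an observation is a structural zero

# 1 = observation comes from the zero-only process, 0 = from the Poisson process
is_structural_zero <- rbinom(n, size = 1, prob = pi_zero)

# Structural zeroes are always 0; the remaining values are ordinary Poisson draws
y <- ifelse(is_structural_zero == 1, 0, rpois(n, lambda))

# Proportion of zeroes observed vs. expected under a plain Poisson with the same mean
prop_zero_observed <- mean(y == 0)
prop_zero_poisson  <- dpois(0, lambda = mean(y))

# Variance-to-mean ratio: about 1 for Poisson data, larger when overdispersed
dispersion_ratio <- var(y) / mean(y)

print(c(observed_zeroes = prop_zero_observed,
        poisson_zeroes  = prop_zero_poisson,
        var_over_mean   = dispersion_ratio))
```

For a zero-inflated Poisson the theoretical variance-to-mean ratio is 1 + `pi_zero` × `lambda`, so with the values assumed here the simulated ratio should land close to 2.2 rather than the value of 1 expected for a plain Poisson.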

According to Paul Allison, when count data contain many zeroes, modelling them with a negative binomial distribution can be an alternative to a zero-inflated Poisson distribution, often providing a better fit and requiring less computational power. He concludes: “having a lot of zeros doesn’t necessarily mean that you need a zero-inflated model.” His considerations and his exchanges with William Greene and Dominique Lord (see this article and its comment section) are well worth reading.

As a reminder, here are some definitions:

• the Poisson distribution is “a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event” (reference);
• the Binomial distribution gives the probability of observing X successes in n independent trials, each with the same probability of success (reference);
• the Negative Binomial distribution gives the probability that Y trials are required for an event to be observed for the rth time (reference).
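These three definitions map onto the base R density functions `dpois()`, `dbinom()` and `dnbinom()`. The numbers below are arbitrary and only serve as a worked example. Note that `dnbinom()` is parameterised by the number of failures before the rth success, so a total of Y trials corresponds to Y - r failures.

```r
# Poisson: probability of observing exactly 2 events when the mean rate is 3
p_pois <- dpois(2, lambda = 3)

# Binomial: probability of X = 3 successes in n = 10 independent trials,
# each with success probability 0.5
p_binom <- dbinom(3, size = 10, prob = 0.5)

# Negative binomial: probability that Y = 5 trials are needed to observe the
# event for the r = 3rd time. dnbinom() counts failures, so x = Y - r = 2.
r <- 3
Y <- 5
p_nbinom <- dnbinom(Y - r, size = r, prob = 0.5)

# Check each value against its closed-form expression
stopifnot(
  isTRUE(all.equal(p_pois,   3^2 * exp(-3) / factorial(2))),
  isTRUE(all.equal(p_binom,  choose(10, 3) * 0.5^10)),
  isTRUE(all.equal(p_nbinom, choose(Y - 1, r - 1) * 0.5^Y))
)

print(c(poisson = p_pois, binomial = p_binom, negative_binomial = p_nbinom))
```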

## Modelling zero-inflated datasets with glmmTMB

The glmmTMB package makes it possible to:

• “specify a zero-inflation model (via the ziformula argument) with fixed and/or random effects” (Ben Bolker)
• specify a negative binomial model using `family=nbinom1`, which assumes Var∝𝜇, or `family=nbinom2`, which assumes Var=𝜇*(1 + 𝜇/𝑘). For both nbinom1 and nbinom2, it is possible to specify whether the dispersion parameter depends on any of the explanatory variables using the `dispformula` argument, which can be abbreviated to `disp` (see `dispformula` under `?glmmTMB`, and `?sigma.glmmTMB`).
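As a hedged illustration of the two negative binomial families and of `dispformula`, the sketch below uses the `Salamanders` example dataset shipped with glmmTMB. The dataset and the model formula (`count ~ mined + (1 | site)`) are assumptions made for the example and are not part of the discussion above. Convergence warnings are possible and should be checked before interpreting any fit.

```r
library(glmmTMB)

# Salamanders: example count data shipped with glmmTMB (illustrative choice)
data("Salamanders", package = "glmmTMB")

# nbinom1: variance proportional to the mean
fit_nb1 <- glmmTMB(count ~ mined + (1 | site),
                   family = nbinom1,
                   data = Salamanders)

# nbinom2: variance = mu * (1 + mu / k)
fit_nb2 <- glmmTMB(count ~ mined + (1 | site),
                   family = nbinom2,
                   data = Salamanders)

# nbinom2 with the dispersion parameter allowed to differ between
# mined and unmined sites (the default is dispformula = ~1)
fit_nb2_disp <- glmmTMB(count ~ mined + (1 | site),
                        dispformula = ~ mined,
                        family = nbinom2,
                        data = Salamanders)

# Estimated dispersion parameter of the constant-dispersion nbinom2 model
sigma(fit_nb2)

# Compare the three fits
AIC(fit_nb1, fit_nb2, fit_nb2_disp)
```

Comparing information criteria is only one way to choose between the families; inspecting residuals is a sensible complement.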

A negative binomial distribution and the `ziformula` argument can be used together: “`ziformula` specifies zero-inflation [while] `family=nbinom2` [or `family=nbinom1`] take care of other sources of overdispersion” (Ben Bolker).

`ziformula` is “a one-sided (i.e., no response variable) formula for zero-inflation combining fixed and random effects: the default `~0` specifies no zero-inflation. Specifying `~.` sets the zero-inflation formula identical to the right-hand side of formula (i.e., the conditional effects formula); terms can also be added or subtracted. When using `~.` as the zero-inflation formula in models where the conditional effects formula contains an offset term, the offset term will automatically be dropped. The zero-inflation model uses a logit link” (from `?glmmTMB`). It “describes how the probability of an extra zero (i.e. structural zero) will vary with predictors” (see this excellent vignette by Mollie Brooks). By default, glmmTMB excludes zero-inflation; specifying `ziformula=~0` (or `zi=~0` for short) does so explicitly. Specifying `ziformula=~1` applies a single zero-inflation parameter to all observations.
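The `ziformula` options described above can be sketched as follows, combined with `family=nbinom2`. The `Salamanders` dataset shipped with glmmTMB and the predictor `mined` are assumptions chosen for illustration. Some of these fits may produce convergence warnings, for instance when the estimated zero-inflation probability is close to 0.

```r
library(glmmTMB)

# Salamanders: example count data shipped with glmmTMB (illustrative choice)
data("Salamanders", package = "glmmTMB")

# No zero-inflation (the default, written out explicitly)
fit_zi0 <- glmmTMB(count ~ mined + (1 | site),
                   ziformula = ~ 0,
                   family = nbinom2,
                   data = Salamanders)

# A single zero-inflation parameter shared by all observations
fit_zi1 <- glmmTMB(count ~ mined + (1 | site),
                   ziformula = ~ 1,
                   family = nbinom2,
                   data = Salamanders)

# Probability of a structural zero allowed to vary with mining status
fit_zi_mined <- glmmTMB(count ~ mined + (1 | site),
                        ziformula = ~ mined,
                        family = nbinom2,
                        data = Salamanders)

# Zero-inflation coefficients are reported on the logit scale
zi_coef <- fixef(fit_zi_mined)$zi
zi_coef

# Back-transform the intercept to the probability of a structural zero
# at the reference level of `mined`
plogis(zi_coef[["(Intercept)"]])

# Compare the three specifications
AIC(fit_zi0, fit_zi1, fit_zi_mined)
```

Because the zero-inflation model uses a logit link, `plogis()` is the appropriate back-transformation for its coefficients.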