Zero-inflated models – Marco Plebani

Count data are generally modelled by assuming that the response, given the explanatory variables, follows a Poisson distribution. Data expressed as ratios (successes out of a known number of trials) are generally modelled by assuming a binomial distribution. In both cases, too many zeroes in the response variable can become a problem: the overabundance of zeroes, known as zero-inflation, can translate into overdispersion, making the distribution of choice (Poisson or binomial) a poor fit to the data.
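To make the link between excess zeroes and overdispersion concrete, here is a minimal base-R sketch. The sample size, Poisson mean and share of forced zeroes are arbitrary values chosen for illustration, not taken from any real dataset.

```r
# Simulate zero-inflated counts: some observations are forced to zero,
# the rest come from an ordinary Poisson process.
set.seed(42)
n      <- 10000
lambda <- 3     # mean of the Poisson component (assumed value)
p_zero <- 0.4   # probability of a forced ("structural") zero (assumed value)

structural_zero <- rbinom(n, size = 1, prob = p_zero) == 1
y <- ifelse(structural_zero, 0L, rpois(n, lambda))

# A Poisson variable has variance equal to its mean, so a ratio
# well above 1 signals overdispersion.
dispersion_ratio <- var(y) / mean(y)

# Share of zeroes observed vs. expected from a Poisson with the same mean.
observed_zeros <- mean(y == 0)
expected_zeros <- dpois(0, lambda = mean(y))

round(c(dispersion     = dispersion_ratio,
        observed_zeros = observed_zeros,
        expected_zeros = expected_zeros), 3)
```

With these settings the variance should be roughly twice the mean, and the observed share of zeroes should be far larger than a plain Poisson with the same mean would predict.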

One way to address zero-inflation is to use the zero-inflated versions of Poisson and binomial distributions. These zero-inflated distributions treat the data as a mixture of data generated by a Poisson or a binomial process, and data that belong to a zero-only distribution (in other words, a distribution with mean of 0 and variance of 0).
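The mixture idea can be written down directly. The function below is a hand-rolled sketch of a zero-inflated Poisson probability mass function; the name dzip and its arguments are hypothetical and not part of base R or glmmTMB.

```r
# Probability mass function of a zero-inflated Poisson distribution.
# pi_zero: probability that an observation comes from the zero-only component
# lambda : mean of the Poisson component
dzip <- function(k, lambda, pi_zero) {
  pi_zero * (k == 0) + (1 - pi_zero) * dpois(k, lambda)
}

lambda  <- 3    # assumed value
pi_zero <- 0.4  # assumed value
k       <- 0:100

probs    <- dzip(k, lambda, pi_zero)
total    <- sum(probs)                     # total probability mass
zip_mean <- sum(k * probs)                 # equals (1 - pi_zero) * lambda
zip_var  <- sum((k - zip_mean)^2 * probs)  # larger than the mean

c(total = total, mean = zip_mean, variance = zip_var)
```

The zero-only component adds probability mass at zero only, which lowers the mean and pushes the variance above it. That is why a plain Poisson fit to such data looks overdispersed.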

According to Paul Allison, in the case of count data with many zeroes, modelling the data according to a negative binomial distribution can be an alternative to zero-inflated Poisson models, often providing a better fit and requiring less computation. He concludes: “having a lot of zeros doesn’t necessarily mean that you need a zero-inflated model.” His considerations and his exchanges with William Greene and Dominique Lord (see this article and its comment section) are a worthy read.

As a reminder, here are some definitions:

the Poisson distribution is “a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event” (reference);
the Binomial distribution gives the probability of observing X successes in n independent trials that share the same probability of success (reference);
the Negative Binomial distribution gives the probability that Y trials are required for an event to be observed for the r-th time (reference).
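The three definitions map onto base-R density functions as sketched below; the numbers are arbitrary. Note that dnbinom() is parameterised by the number of failures before the r-th success rather than by the total number of trials, hence the y - r in the call.

```r
# Poisson: probability of exactly 2 events when the mean rate is 3 per interval.
p_pois <- dpois(2, lambda = 3)

# Binomial: probability of exactly 4 successes in 10 independent trials,
# each with success probability 0.3.
p_binom <- dbinom(4, size = 10, prob = 0.3)

# Negative binomial: probability that exactly y = 7 trials are needed
# to observe the r = 3rd success. dnbinom() counts failures, i.e. y - r.
r <- 3
y <- 7
p <- 0.3
p_nbinom <- dnbinom(y - r, size = r, prob = p)

c(poisson = p_pois, binomial = p_binom, negative_binomial = p_nbinom)
```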

Modelling zero-inflated datasets with glmmTMB

The glmmTMB package allows you to:

“specify a zero-inflation model (via the ziformula argument) with fixed and/or random effects” (Ben Bolker)
specify a negative binomial model using family=nbinom1, which assumes Var ∝ μ, or family=nbinom2, which assumes Var = μ(1 + μ/k). For both nbinom1 and nbinom2 it is possible to specify whether the dispersion parameter is affected by any of the explanatory variables using the argument dispformula (see: dispformula under ?glmmTMB; ?sigma.glmmTMB).
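As a minimal sketch of the two negative binomial families and of dispformula, the code below fits three models to simulated data. The data frame dat, the predictor x and all simulation settings are assumptions made for illustration.

```r
library(glmmTMB)

# Simulated overdispersed counts (illustrative values only).
set.seed(1)
n   <- 500
x   <- rnorm(n)
dat <- data.frame(x = x,
                  count = rnbinom(n, mu = exp(0.5 + 0.8 * x), size = 1.5))

# nbinom1: variance proportional to the mean.
fit_nb1 <- glmmTMB(count ~ x, data = dat, family = nbinom1)

# nbinom2: variance grows quadratically with the mean.
fit_nb2 <- glmmTMB(count ~ x, data = dat, family = nbinom2)

# Let the dispersion parameter depend on x through dispformula.
fit_nb2_disp <- glmmTMB(count ~ x, data = dat, family = nbinom2,
                        dispformula = ~ x)

sigma(fit_nb2)            # estimated dispersion parameter of nbinom2
fixef(fit_nb2_disp)$disp  # coefficients of the dispersion model
AIC(fit_nb1, fit_nb2, fit_nb2_disp)
```

Comparing the fits by AIC is one simple way to choose between the two variance assumptions; it is not the only one.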

A negative binomial distribution and the ziformula argument can be used together: “ziformula specifies zero-inflation [while] family=nbinom2 [or family=nbinom1] take care of other sources of overdispersion” (Ben Bolker).
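The sketch below illustrates this combination on simulated counts that are both overdispersed and zero-inflated. All names and simulation settings are assumptions chosen for illustration.

```r
library(glmmTMB)

# Negative binomial counts with extra zeroes added on top (illustrative values).
set.seed(2)
n     <- 1000
x     <- rnorm(n)
count <- rnbinom(n, mu = exp(1 + 0.5 * x), size = 1)
count[rbinom(n, size = 1, prob = 0.3) == 1] <- 0
dat   <- data.frame(x = x, count = count)

# Zero-inflated Poisson: handles the extra zeroes only.
fit_zip <- glmmTMB(count ~ x, data = dat, family = poisson,
                   ziformula = ~ 1)

# Zero-inflated negative binomial: handles the extra zeroes and the
# remaining overdispersion in the non-zero part.
fit_zinb <- glmmTMB(count ~ x, data = dat, family = nbinom2,
                    ziformula = ~ 1)

AIC(fit_zip, fit_zinb)
```

On data simulated this way, the zero-inflated negative binomial model should have the lower AIC, because the Poisson component cannot absorb the overdispersion left after accounting for the extra zeroes.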

ziformula is “a one-sided (i.e., no response variable) formula for zero-inflation combining fixed and random effects: the default ~0 specifies no zero-inflation. Specifying ~. sets the zero-inflation formula identical to the right-hand side of formula (i.e., the conditional effects formula); terms can also be added or subtracted. When using ~. as the zero-inflation formula in models where the conditional effects formula contains an offset term, the offset term will automatically be dropped. The zero-inflation model uses a logit link” (from ?glmmTMB). It “describes how the probability of an extra zero (i.e. structural zero) will vary with predictors” (see this excellent vignette by Mollie Brooks). By default, glmmTMB excludes zero-inflation; specifying ziformula=~0 does so explicitly. Specifying ziformula=~1 applies a single zero-inflation parameter to all observations. The argument name can be shortened to zi, as done in the vignette, thanks to R's partial matching of argument names.
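The three most common choices for ziformula can be compared as in the sketch below, in which the probability of an extra zero is simulated to increase with a predictor x. The data and all parameter values are assumptions made for illustration.

```r
library(glmmTMB)

# Poisson counts whose probability of a structural zero depends on x.
set.seed(3)
n      <- 1000
x      <- rnorm(n)
p_zero <- plogis(-1 + 1.5 * x)
count  <- rpois(n, lambda = exp(1 + 0.3 * x))
count[rbinom(n, size = 1, prob = p_zero) == 1] <- 0
dat    <- data.frame(x = x, count = count)

# ~ 0: no zero-inflation (the default).
fit_none  <- glmmTMB(count ~ x, data = dat, family = poisson, ziformula = ~ 0)

# ~ 1: one zero-inflation probability shared by all observations.
fit_const <- glmmTMB(count ~ x, data = dat, family = poisson, ziformula = ~ 1)

# ~ .: same right-hand side as the conditional formula, here ~ x.
fit_same  <- glmmTMB(count ~ x, data = dat, family = poisson, ziformula = ~ .)

# Zero-inflation coefficients are on the logit scale;
# plogis() converts the intercept-only estimate to a probability.
zi_prob <- plogis(fixef(fit_const)$zi)
zi_prob
fixef(fit_same)$zi

AIC(fit_none, fit_const, fit_same)
```

Because the zero-inflation model uses a logit link, a positive coefficient for x in the zero-inflation part means that extra zeroes become more likely as x increases.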

Other useful links

https://stats.idre.ucla.edu/r/dae/zip/
https://stats.stackexchange.com/questions/116007/when-to-use-zero-inflated-poisson-regression-and-negative-binomial-distribution

