How to analyse ratios? For example, how to analyse changes in population density according to a certain predictor? Population density equals number of individuals over a certain area, so it is tempting to treat densities as count data, therefore analysing them as:
glm(density ~ predictor, family="poisson")
where density = abundance/land_area.
This would be incorrect for at least two reasons. Firstly, a Poisson distribution can only deal with integers. This can be addressed by using family="quasipoisson". A quasi-Poisson model can deal with non-integers, but it still doesn't address another important aspect: a population density of 0.5 can be the result of one individual per 2 km², or 100 individuals per 200 km², etc. The ratio is the same, but the latter case should have more weight on the outcome of the analysis, because what we observe over 200 km² is more representative than what we observe over 2 km². In other words: if we observe something over a large area, that's more likely to represent what happens across the entire landscape than if we observed it over a small area. Using an offset addresses both issues:
glm(abundance ~ predictor, family="poisson", offset = log(land_area))
The offset is basically the denominator by which we are dividing abundance. Because the model is now fitted to the raw counts, it also acts much like a weight: counts collected over larger areas tend to be larger and so carry more information. This way, R not only gets the densities right, but also gives more importance to densities measured over larger areas, which makes the estimates more precise and increases the power of the analysis.
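To see this weighting at work, here is a minimal simulated sketch. The data, the seed and the helper se_slope() are made up for illustration and do not come from a real dataset. The same true densities are surveyed once over 2 km² plots and once over 200 km² plots, and the standard error of the slope is compared.

```r
# Illustrative simulation: every value and name below is an assumption.
set.seed(1)
n         <- 100
predictor <- rnorm(n)
density   <- exp(-0.7 + 0.3 * predictor)  # true individuals per km^2

# Hypothetical helper: simulate counts over plots of a given area (km^2),
# fit the offset model and return the standard error of the slope.
se_slope <- function(area) {
  d <- data.frame(
    abundance = rpois(n, lambda = density * area),
    predictor = predictor,
    land_area = area
  )
  fit <- glm(abundance ~ predictor, family = "poisson",
             offset = log(land_area), data = d)
  summary(fit)$coefficients["predictor", "Std. Error"]
}

se_small <- se_slope(2)    # same densities, small plots
se_large <- se_slope(200)  # same densities, large plots
c(small = se_small, large = se_large)
```

The slope should come out with a clearly smaller standard error for the large plots, even though the underlying densities are identical in both runs.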
Note that the offsets must be provided on the scale of the link function, which is why I used log(land_area) rather than land_area in the code above.
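The reason is easy to spell out for the log link that the Poisson family uses by default: the model is log(E[abundance]) = log(land_area) + b0 + b1 * predictor, which rearranges to log(E[abundance] / land_area) = b0 + b1 * predictor, so the coefficients describe density. The sketch below checks this on simulated data. The true values (-1 and 0.5), the seed and the object names are assumptions chosen for illustration. It also shows that the offset can equally be written inside the formula with offset().

```r
# Illustrative simulation: all names and values are assumptions.
set.seed(42)
n         <- 200
land_area <- runif(n, min = 2, max = 200)  # surveyed area in km^2
predictor <- rnorm(n)

# Assumed truth: log(density) = -1 + 0.5 * predictor
abundance <- rpois(n, lambda = exp(-1 + 0.5 * predictor) * land_area)

# Offset passed as an argument, on the log (link) scale
fit_arg <- glm(abundance ~ predictor, family = "poisson",
               offset = log(land_area))

# Same model with the offset written inside the formula
fit_formula <- glm(abundance ~ predictor + offset(log(land_area)),
                   family = "poisson")

coef(fit_arg)                        # estimates on the log-density scale
exp(coef(fit_arg)[["(Intercept)"]])  # estimated density when predictor = 0
```

Both fits should return the same coefficients, close to the values used to simulate the data.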
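A similar argument to the one I have just described applies to observations belonging to other distributions, such as the binomial: a proportion of 0.5 can be the result of 1 success out of 2 trials, or 100 successes out of 200 trials.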
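In the binomial case the number of trials plays the role that land area plays above. One common way to pass it to glm() is a two-column response of successes and failures, rather than an offset. The sketch below uses simulated data, and all names and values in it are illustrative assumptions.

```r
# Illustrative simulation: all names and values are assumptions.
set.seed(7)
n         <- 150
trials    <- sample(5:100, n, replace = TRUE)  # trials per observation
predictor <- rnorm(n)

# Assumed truth on the logit scale: -0.5 + 0.8 * predictor
successes <- rbinom(n, size = trials, prob = plogis(-0.5 + 0.8 * predictor))

# Two-column response: successes and failures
fit_counts <- glm(cbind(successes, trials - successes) ~ predictor,
                  family = "binomial")

# Equivalent form: proportions as the response, trials as weights
fit_weights <- glm(successes / trials ~ predictor,
                   family = "binomial", weights = trials)

coef(fit_counts)
coef(fit_weights)
```

Either way, observations based on more trials carry more weight than observations based on few, which a model fitted to the bare proportions would not do.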