# A regression-like situation with count data as response variable

For this example I will use a dataset from the excellent Getting Started with R: An Introduction for Biologists (1st Edition) by Beckerman and Petchey.

``````rm(list=ls())
rm(list=ls())
# install.packages("RCurl")
library(RCurl) # allows accessing data from URL
``````head(dd)
plot(fitness ~ body.size, data=dd,
main="Weigth at birth and fitness in sheep",
xlab="Weight of the mother at birth (pounds)",
lm1 <- lm(fitness ~ body.size, data=dd)
summary(lm1)
# check the model assumptions (graphically):
par(mfrow=c(3,2))
plot(lm1)
hist(lm1\$resid)
# data are non-normal and non-linear.``````

By looking at the data and at the diagnostics plots we can tell that they are not normally distributed. The response variable is represented by count data, namely non-negative integers. To normalise these data we can use two approaches:

• We can go old-school and log-transform the response variable, test its linear correlation with body.size using lm(), then exponentiate the “slope” to back-transform it;
• We can use a generalised linear model, which allows to specify non-normal distributions of residuals:
``````glm1 <- glm(fitness ~ body.size, data=dd, family="poisson"(link = "log"))
summary(glm1)``````

The function glm() estimates both the model parameters and their Std. Errors on the link (log) scale. The statistical tests are run assuming that thew residuals follow not a Normal, but a Poisson distribution.

``````# compute SE using predict()
preddata <- data.frame(body.size = seq(from=min(dd\$body.size),
to=max(dd\$body.size),
by=0.01
))
preds <- predict(glm1, newdata=preddata, type="response", se.fit=T)
fit <- preds\$fit
# limits of 95% CI:
lwr <- fit - 1.96*preds\$se.fit
upr <- fit + 1.96*preds\$se.fit``````

To plot mean values with CI:

``````plot(fitness ~ body.size, data=dd,
main="Weigth at birth and fitness in sheep",
xlab="Weight of the mother at birth (pounds)",