Two-way ANOVA

Often we are dealing with experiments or studies with more than one explanatory variable. In such situation, more than one treatment is applied simultaneously and we want to know what’s their relative effect on the response variable. If both explanatory variables are categorical, the analysis traditionally used is a two-way ANOVA.

Example: Carbon Dioxide Uptake in Grass Plants


The CO2 data frame has 84 rows and 5 columns of data from an experiment on the cold tolerance of the grass species Echinochloa crus-galli.
The CO2 uptake of six plants from Quebec and six plants from Mississippi was measured at several levels of ambient CO2 concentration. Half the plants of each type were chilled overnight before the experiment was conducted.

library(lattice) # quick and dirty way of displaying "complicated" data.
xyplot(uptake ~ Treatment, group=Type,
type = c("p","r"),
auto.key=T, # adds legend
bwplot(uptake ~ Treatment | Type, data=CO2)

I use lattice for quick-and-dirty displays and plot() for publications. Another popular graphics package is ggplot2, which has plenty of extensions for unusual graphics (e.g. see here and here).

# Define the models to compare:
CO2.lm.full <- lm(uptake ~ Treatment * Type, data=CO2)
CO2.lm.additive <- lm(uptake ~ Treatment + Type, data=CO2)
CO2.lm.null <- lm(uptake ~ 1, data=CO2)
anova(CO2.lm.additive, CO2.lm.null)
anova(CO2.lm.full, CO2.lm.additive)
anova(CO2.lm.additive, CO2.lm.full)
# models can be also compared via AIC:
AIC(CO2.lm.full, CO2.lm.additive, CO2.lm.null)

(Here are some considerations on AIC by Brian McGill.)

Function summary() provides mean differences between treatments as well as p-values from pairwise t-tests, but it does so only in comparison to a reference level (the one that comes first alpha-numerically). It is possible to produce all possible contrasts by changing the reference level by hand using function relevel(), but I find the following function way more practical:

pairwise.t.test(x = CO2$uptake, g = CO2$Treatment, p.adj="none")
# Oooops, it only handles one response variable at a time!
# easy fix:
CO2$treatment_combined <- paste(CO2$Treatment, CO2$Type, sep="_")
# check:
pairwise.t.test(x = CO2$uptake, g = CO2$treatment_combined, p.adj="none")

Similarly, it’s possible to obtain mean and SD for each treatment combination from summary(), but the method shown below is much faster:

msd <- summaryBy(uptake ~ Treatment * Type,
FUN=c(mean, sd, length)

Now we can plot the data as a barplot:

mids <- barplot(msd$uptake.mean,
 ylab="CO2 uptake",
 names.arg = msd$Treatment,
 col=c(rep(c(grey(0.8), grey(0.3)),2))

# Add error bars using arrows():
# ?arrows
 with(msd, arrows(mids,, mids, uptake.mean +,
 code=3, angle=90, length=0.1)) # length=0.1mm

legend("topright", title="Origin",
 fill=c(grey(0.8), grey(0.3))


Useful links: