Comparing more than two groups: one-way ANOVA

First, enter the data. We could enter our own but for this example we’ll use one of the datasets that come pre-loaded in R:

# see how the data look like

We’ll be focusing on the differences in petal length among Iris species:

plot(Petal.Length ~ Species, data=iris)

A little secret: here R is looking at the data and seeing that we want to plot numerical values against a categorical variable (groups). Normally R will act dumb and wait for us to be super-specific, but in this case it’s making a little decision for us and it’s plotting the data as a box-and-whisker plot by default.
In other words, R is calling another function called boxplot() under the hood. for details, type ?boxplot in the terminal. So that you know:

  • The thick line is not the mean but the median
  • the upper and lower base of the box (called “hinges” in the help file) represent the first and third quartile, so the box contains ~50% of the values
  • the ends of the “whiskers” delimit the 95% confidence interval.

IMPORTANT NOTE: whether it’s a t-test, ANOVA, ANCOVA, linear regression, multiple linear regression… Statistically speaking we are always testing what’s called a “linear model” – very roughly, a model that measures linear changes in the values of the response variable in relationship to linear changes in the explanatory variables.
For all these analyses we can just specify the model using function lm(), and then test/examine it using functions anova() and summary().

test <- lm(Petal.Length ~ Species, data=iris)
# Coefficients:
                  # Estimate Std. Error t value Pr(>|t|)    
# (Intercept)        1.46200    0.06086   24.02   <2e-16 ***
# Speciesversicolor  2.79800    0.08607   32.51   <2e-16 ***
# Speciesvirginica   4.09000    0.08607   47.52   <2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The coefficient estimates are the mean Petal.Length per species. R display them by contrast with a reference level, chosen by R as the first level in alpha-numerical order. In this case, Intercept refers to Iris setosa. The second estimate, Speciesversicolor, is the mean difference in Petal.Length between species I. versicolor and the reference species. The third estimate refers to I. virginica. So the actual mean Petal.Length for I. versicolor is 1.46200+2.79800 and mean Petal.Length for I. virginica is 1.46200+4.09000. The p-values indicate that all these contrasts are statistically significant.

We can use anova() to compare the model of interest with the null model:

test0 <- lm(Petal.Length ~ 1, data=iris)

The writing Petal.Length ~ 1 signifies that we are trying to predict Petal.Length of all species simply based on its grand mean, not accounting for differences among species.

anova(test, test0)
# Analysis of Variance Table

# Model 1: Petal.Length ~ Species
# Model 2: Petal.Length ~ 1
  # Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
# 1    147  27.22                                  
# 2    149 464.33 -2    -437.1 1180.2 < 2.2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Function anova() performs an F-test on the residual variability of the compared models. The two models differ in a statistically significant way. The Residual Sum of Squares (RSS) of Model 1 (Petal.Length ~ Species), a proxy for the amount of variability left unaccounted for by the model, is smaller than that of Model 2 (Petal.Length ~ 1), so we retain Model 1 as the best of the two.

An alternative tool for model selection is the Akaike Information Criterion (AIC). Here are some considerations on AIC by Brian McGill. The AIC estimates the goodness of a model by measuring how well the model fit the data, while penalising models according to their number of parameters. In other words, it favours the simplest model that describes the data best. The lower the AIC, the better the model. For small sample sizes the corrected AIC (AICc) should be used. See: AICc() in package MuMIn. For large sample sizes, AICc ~ AIC. The minimum AIC difference for a model to be considered significantly better at describing the data is debated; Harrison et al. (2018) suggest ∆AIC=6. For competing models with ∆AIC<6, I consider the simplest to be the best.