A graphic interpretation of ANCOVA

Plotting the data should always be the first step of any analysis: a visual representation makes patterns easier to see and facilitates the interpretation of quantitative analyses. In fact, the possible outcomes of a study should be sketched before the first data point has even been collected: doing so helps derive predictions from the hypothesis being tested, and it facilitates the study design.

Here I provide a graphic representation of six possible outcomes of an ANCOVA design, that is, a study in which we test the effect of a continuous predictor, x, and a categorical predictor (a factor), “group”, on the response variable y. The lines represent the predicted mean value of y for each value of x under different scenarios. Lines of different colors refer to different levels of “group” (in this example, “group” has only two levels).

  • (a): neither predictor has an effect on y. The mean value of y does not change regardless of the value assumed by x, and the mean value of y does not differ in the two groups.
  • (b): y is not affected by x, but the mean value of y differs between the two groups (the mean value of y is higher in the red group). Formal model selection would confirm that x has no predictive value for y in the observed range of x values, and can be dropped from the model.
  • (c): y increases as x increases, but the lines for the two groups are indistinguishable. Formal model selection would confirm that “group” has no predictive value for y and can be dropped from the model, while x should be retained.
  • (d): y increases as x increases in both groups. The values of y in the red group are higher, on average, than those in the black group, but the rate of change of y per unit change in x is the same in both groups, so the two lines are parallel. This is best described by an “additive” model, in which each value of y is the sum of the effect of x on y and the effect of group identity on y.
  • (e): y is significantly correlated with x in both groups, but the correlation is positive in the black group and negative in the red group. This is best described by a model that includes an interaction between x and group identity. The corresponding linear model is something along the lines of: y = a + b*x + c*group + d*(x*group) + error
  • (f): this scenario is also the result of an interaction between the predictors. In this case, y values in the black group are unaffected by changes in x, and only the y values in the red group are significantly correlated with x.
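
The six scenarios can be reproduced from the linear model given in (e) by setting some of its coefficients to zero. The following Python sketch illustrates this under a few assumptions that are not part of the original text: “group” is coded 0 for the black group and 1 for the red group, the coefficient values are arbitrary illustrative numbers, the slope in the red group of scenario (f) is taken to be positive, and the names SCENARIOS, predicted_mean and group_line are hypothetical.

```python
# Model: y = a + b*x + c*group + d*(x*group)  (error term omitted: these are mean values)
# Assumption: group is coded 0 (black) or 1 (red).
# Coefficients (a, b, c, d) below are illustrative values, not estimates from data.
SCENARIOS = {
    "a": (2.0, 0.0, 0.0, 0.0),   # neither x nor group affects y
    "b": (2.0, 0.0, 1.0, 0.0),   # group effect only: flat lines, different heights
    "c": (2.0, 0.5, 0.0, 0.0),   # x effect only: one shared sloping line
    "d": (2.0, 0.5, 1.0, 0.0),   # additive model: parallel sloping lines
    "e": (2.0, 0.5, 3.0, -1.0),  # interaction: slopes of opposite sign
    "f": (2.0, 0.0, 0.0, 0.5),   # interaction: only the red group responds to x
}


def predicted_mean(x, group, coefs):
    """Predicted mean of y for a given x and group (0 or 1)."""
    a, b, c, d = coefs
    return a + b * x + c * group + d * (x * group)


def group_line(coefs, group):
    """Return (intercept, slope) of the predicted line for one group."""
    a, b, c, d = coefs
    return a + c * group, b + d * group


if __name__ == "__main__":
    for name, coefs in SCENARIOS.items():
        black = group_line(coefs, 0)
        red = group_line(coefs, 1)
        print(f"({name}) black: intercept={black[0]}, slope={black[1]}; "
              f"red: intercept={red[0]}, slope={red[1]}")
```

Read this way, c shifts the red line up or down relative to the black one, while d is the difference in slope between the two groups: the lines are parallel whenever d is zero, and any non-zero d corresponds to an interaction as in (e) and (f).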