# The advantage of a paired t-test

The following is a graphical explanation of the advantage of accounting for non-independence of observations when testing for differences between groups.

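```r
rm(list = ls())
set.seed(66)
# simulate body fat (%) of 100 subjects under diet A and under diet B
x1 <- rnorm(n = 100, mean = 30, sd = 10)
x2 <- rnorm(n = 100, mean = 60, sd = 10)

# arrange the data in a long-format data frame: one row per subject and diet
dd <- data.frame(ID = rep(paste("ID", seq(1, 100, by = 1), sep = "_"), 2),
                 response = c(x1, x2),
                 diet = c(rep("A", 100), rep("B", 100))
)
# response2: same values within each diet, but the i-th smallest A value is
# paired with the i-th smallest B value, so every subject responds similarly
dd$response2 <- c(sort(x1, decreasing = FALSE), sort(x2, decreasing = FALSE))
```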

Diets A and B refer to the same 100 subjects. For example, the values may represent the body fat percentage of 100 pigs, each measured once on Diet A and once on Diet B. The body fat distribution for Diet A is the same in `response` and `response2`; so is the body fat distribution for Diet B.
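The claim that each diet has the same distribution in `response` and `response2` can be checked directly. The sketch below rebuilds the simulated data with the same seed and assumes that `response2` sorts both diets in ascending order; within each diet, `response2` is then only a reordering of `response`, so the sorted values, means and standard deviations coincide.

```r
# Rebuild the simulated data (assumption: response2 sorts both diets ascending)
set.seed(66)
x1 <- rnorm(n = 100, mean = 30, sd = 10)
x2 <- rnorm(n = 100, mean = 60, sd = 10)
dd <- data.frame(ID = rep(paste("ID", 1:100, sep = "_"), 2),
                 response = c(x1, x2),
                 diet = rep(c("A", "B"), each = 100))
dd$response2 <- c(sort(x1), sort(x2))

for (d in c("A", "B")) {
  r1 <- dd$response[dd$diet == d]
  r2 <- dd$response2[dd$diet == d]
  # same set of values: only the order (i.e. which subject gets which value) changes
  stopifnot(identical(sort(r1), sort(r2)))
  cat("Diet", d, ": mean =", round(mean(r1), 3), "vs", round(mean(r2), 3),
      "| sd =", round(sd(r1), 3), "vs", round(sd(r2), 3), "\n")
}
```

Each ID appears exactly twice in `dd`, once per diet, which is what makes the observations non-independent.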

In `response`, data points from Diet A and Diet B are not associated in any particular pattern (Fig.1: a;c): body fat percentage is higher on average under Diet B, but each subject responds differently to the change in diet. In `response2`, body fat percentage is higher under Diet B than under Diet A for every subject (Fig.1: b;d). The scenario described by `response2` represents a more uniform response to the change in diet: in other words, the response to the change in diet varies much less from one subject to another.
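What distinguishes the two scenarios is the link between each subject's two measurements. A minimal sketch, again assuming that `response2` sorts both diets in ascending order, computes each subject's change B − A and compares its mean and spread, together with the correlation between the A and B values, in the two scenarios.

```r
set.seed(66)
x1 <- rnorm(n = 100, mean = 30, sd = 10)
x2 <- rnorm(n = 100, mean = 60, sd = 10)

# response: subject i gets the i-th draw under each diet (random pairing)
a1 <- x1; b1 <- x2
# response2 (assumed ascending sort): subject i gets the i-th smallest value
# under diet A and the i-th smallest value under diet B
a2 <- sort(x1); b2 <- sort(x2)

change1 <- b1 - a1   # per-subject change in body fat, response
change2 <- b2 - a2   # per-subject change in body fat, response2

summary_tab <- data.frame(
  scenario     = c("response", "response2"),
  mean_change  = c(mean(change1), mean(change2)),
  sd_change    = c(sd(change1), sd(change2)),
  cor_A_B      = c(cor(a1, b1), cor(a2, b2)),
  all_increase = c(all(change1 > 0), all(change2 > 0))
)
print(summary_tab, digits = 3)
```

The mean change is the same in both scenarios; only its spread across subjects, and the correlation between each subject's two values, differ.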

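If we test for a difference between Diet A and Diet B, separately in `response` and in `response2`, without accounting for subject identity, the two t-tests give exactly the same results. This is expected: an unpaired two-sample t-test uses only the mean and variance of each group, and these are the same in both variables.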

```r
# unpaired (two-sample) t-test on response
t.test(dd$response[which(dd$diet == "A")],
       dd$response[which(dd$diet == "B")],
       paired = FALSE, var.equal = TRUE
)

# Two Sample t-test

# data:  dd$response[which(dd$diet == "A")] and dd$response[which(dd$diet == "B")]
# t = -22.9, df = 198, p-value <2e-16
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -32.549 -27.388
# sample estimates:
# mean of x mean of y
# 30.403    60.371

# unpaired (two-sample) t-test on response2
t.test(dd$response2[which(dd$diet == "A")],
       dd$response2[which(dd$diet == "B")],
       paired = FALSE, var.equal = TRUE
)

# Two Sample t-test

# data:  dd$response2[which(dd$diet == "A")] and dd$response2[which(dd$diet == "B")]
# t = -22.9, df = 198, p-value <2e-16
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -32.549 -27.388
# sample estimates:
# mean of x mean of y
# 30.403    60.371
```
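To see why the two outputs match, the pooled-variance statistic can be computed by hand: it depends only on each group's size, mean and variance, none of which changes when values are reordered across subjects. In the sketch below, `pooled_t` is a hypothetical helper written for this illustration, and `response2` is again assumed to sort both diets in ascending order.

```r
set.seed(66)
x1 <- rnorm(n = 100, mean = 30, sd = 10)
x2 <- rnorm(n = 100, mean = 60, sd = 10)

# hypothetical helper: Student's two-sample t with pooled variance
pooled_t <- function(x, y) {
  nx <- length(x); ny <- length(y)
  sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
  (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
}

t_response  <- pooled_t(x1, x2)              # values in the order drawn
t_response2 <- pooled_t(sort(x1), sort(x2))  # same values, reordered
t_builtin   <- unname(t.test(x1, x2, var.equal = TRUE)$statistic)

print(c(response = t_response, response2 = t_response2, t.test = t_builtin))
```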

By running paired t-tests we account for the non-independence of the observations under Diet A and Diet B. This reduces the amount of unexplained variability and thus increases the signal-to-noise ratio:

```r
# paired t-test on response: pairing is by position, so the i-th A value
# is matched with the i-th B value (same ID); var.equal is ignored when paired
t.test(dd$response[which(dd$diet == "A")],
       dd$response[which(dd$diet == "B")],
       paired = TRUE
)
# Paired t-test

# data:  dd$response[which(dd$diet == "A")] and dd$response[which(dd$diet == "B")]
# t = -22.5, df = 99, p-value <2e-16
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -32.607 -27.331
# sample estimates:
# mean of the differences
# -29.969

# paired t-test on response2
t.test(dd$response2[which(dd$diet == "A")],
       dd$response2[which(dd$diet == "B")],
       paired = TRUE
)
# Paired t-test

# data:  dd$response2[which(dd$diet == "A")] and dd$response2[which(dd$diet == "B")]
# t = -177, df = 99, p-value <2e-16
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -30.305 -29.632
# sample estimates:
# mean of the differences
# -29.969
```
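A paired t-test is a one-sample t-test on the within-subject differences. The sketch below rebuilds the simulated data (assuming `response2` sorts both diets in ascending order) and checks, for both scenarios, that the statistic equals the mean difference divided by its standard error, whether computed by hand, with `paired = TRUE`, or with a one-sample `t.test()` on the differences.

```r
set.seed(66)
x1 <- rnorm(n = 100, mean = 30, sd = 10)
x2 <- rnorm(n = 100, mean = 60, sd = 10)

# (A values, B values) for each scenario, in subject order;
# response2 assumes both diets are sorted in ascending order
scenarios <- list(response  = list(a = x1,       b = x2),
                  response2 = list(a = sort(x1), b = sort(x2)))

for (nm in names(scenarios)) {
  a <- scenarios[[nm]]$a
  b <- scenarios[[nm]]$b
  d <- a - b                                    # A minus B, as in t.test(A, B)
  t_manual  <- mean(d) / (sd(d) / sqrt(length(d)))
  t_paired  <- unname(t.test(a, b, paired = TRUE)$statistic)
  t_onesamp <- unname(t.test(d, mu = 0)$statistic)
  stopifnot(isTRUE(all.equal(t_manual, t_paired)),
            isTRUE(all.equal(t_manual, t_onesamp)))
  cat(nm, ": mean diff =", round(mean(d), 3), " sd diff =", round(sd(d), 3),
      " t =", round(t_manual, 1), "\n")
}
```

The mean difference is identical in the two scenarios; the much larger |t| for `response2` comes entirely from the smaller standard deviation of the differences.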

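The t-value measures the size of the difference between two groups relative to its standard error, which reflects the variability in the data. The closer t is to zero, the weaker the evidence that the two group means differ. Note how differently pairing affects the two variables: for `response` the absolute value of t barely changes (22.9 unpaired, 22.5 paired), whereas for `response2` it rises from 22.9 to 177, because the within-subject differences are far less variable than the raw measurements.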
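Writing out the two statistics (standard textbook formulas, with $s$ denoting sample standard deviations and $d_i = x_{A,i} - x_{B,i}$ the within-subject differences) shows where the gain comes from:

$$
t_{\text{two-sample}} = \frac{\bar{x}_A - \bar{x}_B}{s_p\sqrt{\frac{1}{n_A} + \frac{1}{n_B}}},
\qquad
s_p^2 = \frac{(n_A - 1)s_A^2 + (n_B - 1)s_B^2}{n_A + n_B - 2}
$$

$$
t_{\text{paired}} = \frac{\bar{d}}{s_d/\sqrt{n}},
\qquad
s_d^2 = s_A^2 + s_B^2 - 2\,\widehat{\mathrm{cov}}(x_A, x_B)
$$

Both numerators equal the same mean difference (-29.969 here). With $n_A = n_B = n$ and zero covariance, both denominators reduce to $\sqrt{(s_A^2 + s_B^2)/n}$, which is close to the `response` case. In `response2` each subject's A and B values are strongly positively correlated, so the covariance term shrinks $s_d$: from the reported outputs, the standard error of the mean difference is about 29.969 / 22.5 ≈ 1.33 for `response` but about 29.969 / 177 ≈ 0.17 for `response2`.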
