The advantage of a paired t-test

The following is a graphical explanation of the advantage of accounting for non-independence of observations when testing for differences between groups.

rm(list=ls())
set.seed(66)
# simulate data
x1 <- rnorm(n=100, mean=30, sd= 10)
x2 <- rnorm(n=100, mean=60, sd= 10)

# arrange the data in a dataset
dd <- data.frame(ID=rep(paste("ID", seq(1,100, by=1), sep="_"),2),
			response=c(x1,x2),
			group=c(rep("A", 100), rep("B", 100))   # diet fed to the subject (A or B)
			)
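# same values per diet, but matched by rank: the i-th lowest Diet A value
# is paired with the i-th lowest Diet B value
dd$response2 <- c(sort(x1, decreasing = FALSE), sort(x2, decreasing = FALSE))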

Diets A and B refer to the same 100 subjects. For example, the values may represent the body fat percentage of 100 pigs, each measured once when fed Diet A and once when fed Diet B. For each diet, response and response2 contain exactly the same values; the two variables differ only in which Diet A value is paired with which Diet B value within a subject.

In response, the values under Diet A and Diet B are paired at random (Fig.1: a;c): body fat percentage is higher under Diet B on average, but the size of the change varies widely from subject to subject. In response2, body fat percentage is higher under Diet B than under Diet A for every subject, and by a similar amount (Fig.1: b;d). The scenario described by response2 therefore represents a more uniform response to the change in diet: the within-subject differences vary much less across subjects.
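Fig. 1 itself is not reproduced here. A minimal base-R sketch of comparable panels, assuming that a "spaghetti plot" (one line per subject connecting its Diet A and Diet B values) conveys the idea, might look like this. It regenerates the simulated values and pairs them either as simulated or by rank; the names `m_random`, `m_ranked` and `panels` are illustrative, and this is not necessarily how the original figure was drawn.

```r
set.seed(66)
x1 <- rnorm(n = 100, mean = 30, sd = 10)  # Diet A
x2 <- rnorm(n = 100, mean = 60, sd = 10)  # Diet B

# Each column is one subject; row 1 = Diet A, row 2 = Diet B
m_random <- rbind(x1, x2)              # pairing as simulated (like response)
m_ranked <- rbind(sort(x1), sort(x2))  # i-th lowest A paired with i-th lowest B

panels <- list("Random pairing" = m_random, "Rank-matched pairing" = m_ranked)

op <- par(mfrow = c(1, 2))
for (nm in names(panels)) {
  # matplot draws one line per column of the matrix, i.e. one line per subject
  matplot(1:2, panels[[nm]], type = "l", lty = 1, col = "grey50",
          xaxt = "n", xlab = "Diet", ylab = "Body fat (%)", main = nm)
  axis(1, at = 1:2, labels = c("A", "B"))
}
par(op)  # restore the previous plotting layout
```

In the right-hand panel the lines should run roughly parallel, which is the "uniform response" described for response2.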

If we test for a difference between diets without accounting for subject identity, the two-sample t-tests on response and response2 give exactly the same results, because each diet group contains the same values in both variables:

t.test(dd$response[which(dd$group=="A")],
		dd$response[which(dd$group=="B")],
		paired=FALSE, var.equal=TRUE
		)

#     Two Sample t-test
#
# data:  dd$response[which(dd$group == "A")] and dd$response[which(dd$group == "B")]
# t = -22.9, df = 198, p-value <2e-16
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  -32.549 -27.388
# sample estimates:
# mean of x mean of y 
#    30.403    60.371 

t.test(dd$response2[which(dd$group=="A")],
		dd$response2[which(dd$group=="B")],
		paired=FALSE, var.equal=TRUE
		)

#     Two Sample t-test
#
# data:  dd$response2[which(dd$group == "A")] and dd$response2[which(dd$group == "B")]
# t = -22.9, df = 198, p-value <2e-16
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  -32.549 -27.388
# sample estimates:
# mean of x mean of y 
#    30.403    60.371 
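The equal-variance (pooled) two-sample t statistic depends only on each group's size, mean and variance, and none of these change when values are reordered within a group. A minimal sketch checking this, where `pooled_t` is a hypothetical helper written for illustration and compared against R's own `t.test()`:

```r
set.seed(66)
x1 <- rnorm(n = 100, mean = 30, sd = 10)  # Diet A
x2 <- rnorm(n = 100, mean = 60, sd = 10)  # Diet B

# Hypothetical helper: pooled-variance two-sample t statistic
pooled_t <- function(a, b) {
  na <- length(a)
  nb <- length(b)
  sp2 <- ((na - 1) * var(a) + (nb - 1) * var(b)) / (na + nb - 2)  # pooled variance
  (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
}

t_as_simulated <- pooled_t(x1, x2)
t_reordered    <- pooled_t(sort(x1), sort(x2))  # same values, different order
t_builtin      <- unname(t.test(x1, x2, var.equal = TRUE)$statistic)

c(t_as_simulated, t_reordered, t_builtin)  # all three agree
```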

Running paired t-tests accounts for the non-independence of the observations under Diet A and Diet B. A paired test analyses the within-subject differences, so variability between subjects no longer counts as unexplained noise; this can greatly increase the signal-to-noise ratio:

t.test(dd$response[which(dd$group=="A")],
		dd$response[which(dd$group=="B")],
		paired=TRUE   # var.equal is ignored by a paired test, so it is omitted
		)

#     Paired t-test
#
# data:  dd$response[which(dd$group == "A")] and dd$response[which(dd$group == "B")]
# t = -22.5, df = 99, p-value <2e-16
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  -32.607 -27.331
# sample estimates:
# mean of the differences 
#                 -29.969 

t.test(dd$response2[which(dd$group=="A")],
		dd$response2[which(dd$group=="B")],
		paired=TRUE   # var.equal is ignored by a paired test, so it is omitted
		)

#     Paired t-test
#
# data:  dd$response2[which(dd$group == "A")] and dd$response2[which(dd$group == "B")]
# t = -177, df = 99, p-value <2e-16
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  -30.305 -29.632
# sample estimates:
# mean of the differences 
#                 -29.969 
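Both paired results can be traced back to the within-subject differences: a paired t-test is a one-sample t-test on those differences. The sketch below regenerates the simulated values, assumes rank-matched pairing for the uniform-response scenario, checks the equivalence with `t.test()`, and computes the statistic by hand; `paired_t_by_hand` is a hypothetical name used only for illustration.

```r
set.seed(66)
x1 <- rnorm(n = 100, mean = 30, sd = 10)  # Diet A
x2 <- rnorm(n = 100, mean = 60, sd = 10)  # Diet B

# Hypothetical helper: paired t statistic from within-subject differences
paired_t_by_hand <- function(a, b) {
  d <- a - b
  mean(d) / (sd(d) / sqrt(length(d)))
}

# Pairing as simulated vs. rank-matched pairing (same values in each group)
d_random <- x1 - x2
d_ranked <- sort(x1) - sort(x2)

t_paired  <- unname(t.test(x1, x2, paired = TRUE)$statistic)
t_one     <- unname(t.test(d_random)$statistic)  # one-sample test on differences
t_by_hand <- paired_t_by_hand(x1, x2)
t_ranked  <- paired_t_by_hand(sort(x1), sort(x2))

# Same mean difference, very different spread of the differences
c(mean_random = mean(d_random), mean_ranked = mean(d_ranked))
c(sd_random = sd(d_random), sd_ranked = sd(d_ranked))
c(t_random = t_paired, t_ranked = t_ranked)
```

Because the mean difference is identical under both pairings, any change in the t statistic comes entirely from the standard deviation of the differences.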

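The t statistic measures the size of the difference between the two groups relative to the variability in the data. The closer t is to zero, the weaker the evidence against the null hypothesis that the two means are equal (and the larger the p-value). Note how the absolute value of t is much larger in the latter case (177 vs 22.5), although the estimated mean difference is the same (-29.969): in response2 the within-subject differences are very consistent, so the standard error of the mean difference is small, as the much narrower 95% confidence interval also shows.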
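The comparison can be made explicit with the standard textbook formulas (not taken from the original post). With $n$ subjects per diet, group means $\bar{x}_A, \bar{x}_B$, sample variances $s_A^2, s_B^2$, sample covariance $s_{AB}$ between a subject's two values, and differences $d_i = x_{A,i} - x_{B,i}$, the pooled variance reduces to $(s_A^2 + s_B^2)/2$ for equal group sizes, and $s_d^2 = s_A^2 + s_B^2 - 2\,s_{AB}$:

```latex
t_{\text{unpaired}} = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\left(s_A^2 + s_B^2\right)/n}},
\qquad \text{df} = 2n - 2

t_{\text{paired}} = \frac{\bar{d}}{s_d/\sqrt{n}}
  = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\left(s_A^2 + s_B^2 - 2\,s_{AB}\right)/n}},
\qquad \text{df} = n - 1
```

Reordering values within a diet leaves $\bar{x}_A$, $\bar{x}_B$, $s_A^2$ and $s_B^2$ unchanged, so only $s_{AB}$ differs between response and response2. Matching values by rank makes $s_{AB}$ large and positive, which shrinks the denominator of the paired statistic. In response, the paired t (-22.5) is slightly smaller in absolute value than the unpaired one (-22.9), which by the same formulas implies that $s_{AB}$ is slightly negative there.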