
The advantage of a paired t-test

    The following is a graphical explanation of the advantage of accounting for non-independence of observations when testing for differences between groups.

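    rm(list = ls())
    set.seed(66)
    # simulate body fat (%) for 100 subjects under diet A and diet B
    x1 <- rnorm(n = 100, mean = 30, sd = 10)  # diet A
    x2 <- rnorm(n = 100, mean = 60, sd = 10)  # diet B
    
    # arrange the data in a long-format data frame:
    # rows i and 100 + i belong to the same subject (ID_i)
    dd <- data.frame(ID = rep(paste("ID", 1:100, sep = "_"), 2),
                     response = c(x1, x2),
                     diet = c(rep("A", 100), rep("B", 100))
                     )
    # response2: same values within each diet, re-paired by rank, so the subject
    # with the lowest value under diet A also has the lowest value under diet B
    dd$response2 <- c(sort(x1, decreasing = FALSE), sort(x2, decreasing = FALSE))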

    Diets A and B were given to the same 100 subjects. For example, the values may represent the body fat percentage of 100 pigs, measured once while fed diet A and once while fed diet B. Within each diet, response and response2 contain exactly the same values, only assigned to different subjects, so the body fat distribution for diet A is identical in the two columns; the same holds for diet B.
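A quick way to convince yourself that response and response2 only differ in how values are assigned to subjects is to compare the values diet by diet. The sketch below re-creates the simulated data with the same seed and draws; it assumes response2 is built by sorting both diets in increasing order (the pairing that matches the paired t-test output further down), and the name `same_values` is just an illustrative helper.

```r
# Sketch: re-create the simulated data (same seed and draws as above)
set.seed(66)
x1 <- rnorm(n = 100, mean = 30, sd = 10)  # body fat (%) under diet A
x2 <- rnorm(n = 100, mean = 60, sd = 10)  # body fat (%) under diet B

dd <- data.frame(ID = rep(paste("ID", 1:100, sep = "_"), 2),
                 response = c(x1, x2),
                 diet = rep(c("A", "B"), each = 100))
# assumption: response2 re-pairs the same values by rank (both sorted increasingly)
dd$response2 <- c(sort(x1), sort(x2))

# For each diet: do response and response2 hold exactly the same values?
same_values <- sapply(c("A", "B"), function(d)
  identical(sort(dd$response[dd$diet == d]), sort(dd$response2[dd$diet == d])))
print(same_values)

# Hence group means and SDs match as well (up to floating-point rounding)
print(aggregate(cbind(response, response2) ~ diet, data = dd, FUN = mean))
print(aggregate(cbind(response, response2) ~ diet, data = dd, FUN = sd))
```

Only the pairing changes, which is exactly the information an unpaired test throws away.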

    In response, a subject's value under diet A says nothing about its value under diet B (Fig.1: a;c): body fat percentage is higher on average under diet B, but the size of the change varies widely from subject to subject. In response2, body fat percentage is higher under diet B than under diet A for every subject (Fig.1: b;d). The scenario described by response2 therefore represents a more uniform response to the change in diet: the change from diet A to diet B varies much less from one subject to another.
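The difference between the two scenarios shows up directly in the per-subject changes (diet B minus diet A). The sketch below assumes, as above, that response2 pairs the sorted values of the two diets; `d_random` and `d_ranked` are illustrative names.

```r
set.seed(66)
x1 <- rnorm(n = 100, mean = 30, sd = 10)  # diet A
x2 <- rnorm(n = 100, mean = 60, sd = 10)  # diet B

# Per-subject change (diet B minus diet A) under the two pairings
d_random <- x2 - x1              # response: arbitrary pairing
d_ranked <- sort(x2) - sort(x1)  # response2 (assumed): pairing by rank

# Same average change...
print(round(c(mean_random = mean(d_random), mean_ranked = mean(d_ranked)), 3))
# ...but a very different spread of the change across subjects
print(round(c(sd_random = sd(d_random), sd_ranked = sd(d_ranked)), 3))
# Share of subjects whose body fat increases on diet B
print(c(random = mean(d_random > 0), ranked = mean(d_ranked > 0)))
```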

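    If we compare the two diets without accounting for subject identity, that is, with an unpaired two-sample t-test, response and response2 give exactly the same results. The unpaired test only uses the mean and variance of each group, and neither changes when values are reassigned to different subjects: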

    t.test(dd$response[dd$diet == "A"],
           dd$response[dd$diet == "B"],
           paired = FALSE, var.equal = TRUE
           )
    
    # Two Sample t-test
    
    # data:  dd$response[dd$diet == "A"] and dd$response[dd$diet == "B"]
    # t = -22.9, df = 198, p-value <2e-16
    # alternative hypothesis: true difference in means is not equal to 0
    # 95 percent confidence interval:
    #  -32.549 -27.388
    # sample estimates:
    # mean of x mean of y 
    #    30.403    60.371 
    
    t.test(dd$response2[dd$diet == "A"],
           dd$response2[dd$diet == "B"],
           paired = FALSE, var.equal = TRUE
           )
    
    # Two Sample t-test
    
    # data:  dd$response2[dd$diet == "A"] and dd$response2[dd$diet == "B"]
    # t = -22.9, df = 198, p-value <2e-16
    # alternative hypothesis: true difference in means is not equal to 0
    # 95 percent confidence interval:
    #  -32.549 -27.388
    # sample estimates:
    # mean of x mean of y 
    #    30.403    60.371 
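Why are the two unpaired results identical? Student's two-sample t statistic with pooled variance is the difference in group means divided by its standard error, s_p * sqrt(1/n_A + 1/n_B), where s_p^2 is the pooled variance. Nothing in it depends on which value belongs to which subject. A minimal sketch, where `pooled_t` is a hypothetical helper written for illustration and response2 is again assumed to pair the sorted values:

```r
set.seed(66)
x1 <- rnorm(n = 100, mean = 30, sd = 10)  # diet A
x2 <- rnorm(n = 100, mean = 60, sd = 10)  # diet B

# Hypothetical helper: Student's two-sample t with pooled variance
pooled_t <- function(a, b) {
  na <- length(a); nb <- length(b)
  sp2 <- ((na - 1) * var(a) + (nb - 1) * var(b)) / (na + nb - 2)  # pooled variance
  (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
}

t_response  <- pooled_t(x1, x2)              # random pairing
t_response2 <- pooled_t(sort(x1), sort(x2))  # pairing by rank

# Both agree with t.test(..., var.equal = TRUE): the order of the values
# within each group never enters the formula
print(c(manual_response = t_response, manual_response2 = t_response2,
        t_test = unname(t.test(x1, x2, var.equal = TRUE)$statistic)))
```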

    A paired t-test accounts for the non-independence of the observations under diet A and diet B: it analyses the within-subject differences, so any variation between subjects that affects both measurements cancels out. This reduces the unexplained variability and thus increases the signal-to-noise ratio:

    # paired = TRUE relies on both vectors listing subjects in the same order
    # (ID_1 ... ID_100); var.equal is ignored for paired tests, so it is omitted
    t.test(dd$response[dd$diet == "A"],
           dd$response[dd$diet == "B"],
           paired = TRUE
           )
    # Paired t-test
    
    # data:  dd$response[dd$diet == "A"] and dd$response[dd$diet == "B"]
    # t = -22.5, df = 99, p-value <2e-16
    # alternative hypothesis: true difference in means is not equal to 0
    # 95 percent confidence interval:
    #  -32.607 -27.331
    # sample estimates:
    # mean of the differences 
    #                 -29.969 
    
    t.test(dd$response2[dd$diet == "A"],
           dd$response2[dd$diet == "B"],
           paired = TRUE
           )
    # Paired t-test
    
    # data:  dd$response2[dd$diet == "A"] and dd$response2[dd$diet == "B"]
    # t = -177, df = 99, p-value <2e-16
    # alternative hypothesis: true difference in means is not equal to 0
    # 95 percent confidence interval:
    #  -30.305 -29.632
    # sample estimates:
    # mean of the differences 
    #                 -29.969 
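Under the hood, a paired t-test is a one-sample t-test on the per-subject differences, tested against zero; this is also why it has n - 1 = 99 degrees of freedom rather than 198. The sketch below checks that equivalence for both pairings (response2 again assumed to pair the sorted values; `pairings` and `res` are illustrative names):

```r
set.seed(66)
x1 <- rnorm(n = 100, mean = 30, sd = 10)  # diet A
x2 <- rnorm(n = 100, mean = 60, sd = 10)  # diet B

pairings <- list(response  = list(a = x1, b = x2),
                 response2 = list(a = sort(x1), b = sort(x2)))  # assumed rank pairing

res <- t(sapply(pairings, function(p) {
  paired     <- t.test(p$a, p$b, paired = TRUE)  # paired two-sample test
  one_sample <- t.test(p$a - p$b, mu = 0)        # one-sample test on differences
  c(t_paired     = unname(paired$statistic),
    t_one_sample = unname(one_sample$statistic),
    df           = unname(paired$parameter))
}))
print(res)
```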

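    The t-value measures the size of the difference between two groups relative to the variability in the data. The closer t is to zero, the weaker the evidence against the null hypothesis that the two groups do not differ. Note how the absolute value of t is much larger for response2 (177) than for response (22.5): the mean difference is the same (-29.969), but because every subject responds in nearly the same way, the standard error of the mean difference is far smaller. For response, where a subject's values under the two diets are unrelated, pairing brings no gain (compare t = -22.5 with the unpaired t = -22.9).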
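The paired t statistic can be taken apart as t = mean(d) / (sd(d) / sqrt(n)), where d holds the per-subject differences. The sketch below (with a hypothetical helper `paired_t_parts` and, as before, response2 assumed to pair the sorted values) shows that the mean difference is identical in the two scenarios, so the gap in t comes entirely from the spread of the differences.

```r
set.seed(66)
x1 <- rnorm(n = 100, mean = 30, sd = 10)  # diet A
x2 <- rnorm(n = 100, mean = 60, sd = 10)  # diet B

# Hypothetical helper: the ingredients of the paired t statistic
paired_t_parts <- function(a, b) {
  d  <- a - b                    # per-subject difference (A minus B)
  se <- sd(d) / sqrt(length(d))  # standard error of the mean difference
  c(mean_diff = mean(d), sd_diff = sd(d), se = se, t = mean(d) / se)
}

parts <- rbind(response  = paired_t_parts(x1, x2),
               response2 = paired_t_parts(sort(x1), sort(x2)))  # assumed rank pairing
print(round(parts, 3))
```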

    Did you enjoy this? Consider joining my on-line course “First steps in data analysis with R” and learn data analysis from zero to hero!