# Comparing two groups: t-test

Let’s compare the petal length of two Iris species, I. setosa and I. versicolor.

``````data(iris)
dd <- subset(iris, select=c(Species, Petal.Length), Species!="virginica")
dd\$Species <- as.character(dd\$Species)``````

Always start from plotting your data! Get a visual understanding of the data and patterns (or lack thereof) before running analyses.

`````` boxplot(Petal.Length ~ Species, data=dd)
# boxplot() creates a box-and-whiskers plot.``````

I could have just used function plot(). When plotting data versus a categorical variable, R uses a box-and-whiskers plot by default.

• The thick line is not the mean but the median
• the two bases of the box (calles “hinges” in the help file) represent the first and third quartile, so the box contains ~50% of the values
• the ends of the “whiskers” delimit the 95% confidence interval.
``` # there's a function for t tests: t.test()
t.test(Petal.Length ~ Species, data=dd) # Welch's t-test
t.test(Petal.Length ~ Species, data=dd, var.equal=TRUE) # Student's t-test
# spot the differences! ```

IMPORTANT! Whether it’s a t-test, ANOVA, ANCOVA, linear regression, multiple linear regression… Statistically speaking we are always testing what’s called a “linear model” – very roughly said, it is a model that estimates linear changes in the response variable in relationship with changes in the explanatory variables.

All general linear models can be performed in R using function lm(), and then tested/examined using functions anova() and summary().

`````` test1 <- lm(Petal.Length ~ Species, data=dd)
test0 <- lm(Petal.Length ~ 1, data=dd) # null model
anova(test1, test0) # anova() compares models testing whether their residual amounts of variability differ significantly
summary(test1) ``````
``````# Code for the figure above:
par(mfrow=c(2,1))

hist(subset(dd, Species!="versicolor")\$Petal.Length, freq=T, xlim=c(0,6), col=grey(0.3), breaks=c((1:20)/2), ylim=c(0, 40), main="", xlab="Petal length (cm)")
hist(subset(dd, Species=="versicolor")\$Petal.Length, freq=T, xlim=c(0,6), col=grey(0.75), breaks=c((1:20)/2), add=T)

legend("topright", bty="n",
legend=c("Iris setosa", "I. virginica"),
pch=22, pt.cex=3,
pt.bg=c(grey(0.3), grey(0.75))
)

plot(density(subset(dd, Species!="versicolor")\$Petal.Length, bw = "sj", adjust=2), xlim=c(0,6), main="Density distributions", xlab="Petal length (cm)")
lines(density(subset(dd, Species=="versicolor")\$Petal.Length, bw = "sj", adjust=2), col="red")
text(x=c(2.2, 4.5), y=c(1.5, 0.7),
labels=c("Iris setosa", "I. virginica"),
col=c("black", "red")
)

# adjust=... is an extra argument for density() lty="dashed" )
# note how I used the ash symbol to make a piece of code "silent".
# I find this useful when I am fiddling with code, to silence sections of it without deleting them.``````