Linear regression

    Linear regression models how one continuous variable (the response) changes as a linear function of another (the predictor), and tests whether that relationship is statistically significant.

    Example: is productivity related to employment? Let’s assess it using longley, one of the datasets that come with R for teaching purposes.

    data(longley) # a dataset on US economics and demographics (see ?longley)
    plot(GNP ~ Employed, data = longley)
    # make it fancier:
    par(mar = c(5, 5, 5, 3)) # adjusts the margins
    # per ?longley, Employed is the number of people employed, not a percentage
    plot(GNP ~ Employed, data = longley,
         xlab = "Number of people employed",
         ylab = "Gross National Product",
         main = "Money money money!"
    )

    # it looks like there is a correlation…
    # …And it looks linear, so it should obey the model:
    # GNP = a + b * Employed
    # how to find intercept (a) and slope (b)?
    # let R do it:
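    linmod1 <- lm(GNP ~ Employed, data = longley)
    anova(linmod1) # ANOVA table with the F-test for the Employed term
    summary(linmod1) # estimates of a and b, standard errors, p-values, R-squared
    abline(linmod1, lty = "solid") # adds the fitted line to the current plot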

    R also gives us standard errors, significance levels (p-values) and a bunch of other useful bits of information. And it does so with two lines of code. Not too shabby!
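
    If you want those numbers as objects rather than printed text, here is a minimal sketch of how they can be pulled out of the fitted model. The object names (coef_table, new_obs and so on) and the value 65 used for the prediction are only illustrative choices, not part of the original example.

```r
data(longley)
linmod1 <- lm(GNP ~ Employed, data = longley)

# coefficient table: one row per term, with estimate, SE, t value and p-value
coef_table <- coef(summary(linmod1))
print(coef_table)

a <- coef(linmod1)[["(Intercept)"]] # intercept of GNP = a + b * Employed
b <- coef(linmod1)[["Employed"]]    # slope

se_b <- coef_table["Employed", "Std. Error"] # standard error of the slope
p_b  <- coef_table["Employed", "Pr(>|t|)"]   # p-value of the slope

confint(linmod1, level = 0.95)   # 95% confidence intervals for a and b
r2 <- summary(linmod1)$r.squared # proportion of variance explained

# predicted GNP for an illustrative Employed value inside the observed range
new_obs <- data.frame(Employed = 65)
pred <- predict(linmod1, newdata = new_obs)
```

    The prediction is simply a + b * 65, so predict() is a convenience rather than a different calculation.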

    For an example showing how the least squares method works, visit here.
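
    As a rough sketch of the idea, assuming the usual closed-form solution for a single predictor, the slope and intercept that minimise the sum of squared residuals can also be computed by hand and compared with what lm() returns. The names x, y, a_hand, b_hand and rss are made up for this illustration.

```r
data(longley)
x <- longley$Employed
y <- longley$GNP

# closed-form least squares estimates for one predictor
b_hand <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
a_hand <- mean(y) - b_hand * mean(x)

# residual sum of squares for any candidate line y = a + b * x
rss <- function(a, b) sum((y - (a + b * x))^2)

# compare with lm(): the two sets of estimates should agree up to rounding
fit <- lm(GNP ~ Employed, data = longley)
all.equal(c(a_hand, b_hand), unname(coef(fit)))

# nudging the slope away from the estimate makes the fit worse
rss(a_hand, b_hand) < rss(a_hand, b_hand * 1.01)
```

    The last line is the whole point of the method: any other line through the data has a larger sum of squared residuals than the fitted one.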
