# The Least Squares Method

One way of estimating the parameters of the linear regression model that best approximates the data is the Least Squares method (hereafter LSM), which looks for the set of parameters for which the sum of the squared vertical distances between the data points and the regression line (the residuals) is smallest.

Finding the parameter estimates with the LSM can be done analytically; the formulae to do so (and how they are derived) can be found in most statistics handbooks. Here I want to show how the LSM works with a numerical implementation. It is a method that relies on computational power rather than on elegant formulae, but I think it is a good way to understand how a simple search algorithm works.

```r
# pacman is used later to load the interpolation package:
if (!require(pacman)) install.packages('pacman'); library(pacman)

# generate data:
set.seed(222)
x <- 0:100
y0 <- 1.45 + 2.98 * x               # true regression line
y <- y0 + rnorm(n = 101, sd = 10)   # add Gaussian noise
df <- data.frame(x = x, y = y)
plot(y ~ x, data = df)

# for reference, the model parameters can be calculated using lm():
coef(lm(y ~ x, data = df))
# (Intercept)           x
#    1.837641    2.976166
```
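As a point of comparison for the numerical search below, the textbook closed-form estimates can also be written down directly: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the sample means. A quick sketch, rebuilding the same simulated data so it runs on its own:

```r
# rebuild the same simulated data as above (same seed, same draws):
set.seed(222)
x <- 0:100
df <- data.frame(x = x, y = 1.45 + 2.98 * x + rnorm(101, sd = 10))

# closed-form (analytical) least-squares estimates:
b.hat <- cov(df$x, df$y) / var(df$x)      # slope = cov(x, y) / var(x)
a.hat <- mean(df$y) - b.hat * mean(df$x)  # intercept = mean(y) - slope * mean(x)
c(a.hat, b.hat)  # identical to coef(lm(y ~ x, data = df))
```

These formulae give exactly the values lm() reports, since lm() solves the same minimization analytically.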

To estimate the intercept (hereafter: a) and the slope (hereafter: b), we first need to define ranges of a and b where we think their true values lie.

```r
a.vals <- seq(1, 2, by = 0.01)
b.vals <- seq(2, 3, by = 0.01)
# all possible combinations of a and b:
ab <- expand.grid(a.vals = a.vals, b.vals = b.vals)

# calculate the Sum of Squared Errors (SSE) for each parameter combination:
ab.SSE <- ab
ab.SSE$SSE <- NA_real_
for (i in seq_len(nrow(ab.SSE))) {
  predicted.y <- ab.SSE$a.vals[i] + ab.SSE$b.vals[i] * df$x
  ab.SSE$SSE[i] <- sum((df$y - predicted.y)^2)
}

ab.SSE$a.vals[which.min(ab.SSE$SSE)] # 1.65
ab.SSE$b.vals[which.min(ab.SSE$SSE)] # 2.98
```
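As an aside, the explicit loop can be replaced by a single matrix multiplication: stacking a column of ones next to x gives the design matrix, and multiplying it by the transposed grid of candidate parameters produces every predicted line at once. A sketch that rebuilds the same data and grid so it runs on its own:

```r
# rebuild the same simulated data and parameter grid as above:
set.seed(222)
x <- 0:100
y <- 1.45 + 2.98 * x + rnorm(101, sd = 10)
ab <- expand.grid(a.vals = seq(1, 2, by = 0.01), b.vals = seq(2, 3, by = 0.01))

X    <- cbind(1, x)              # design matrix: 101 x 2
pred <- X %*% t(as.matrix(ab))   # predictions: 101 x nrow(ab), one column per (a, b) pair
SSE  <- colSums((y - pred)^2)    # one SSE per parameter combination
ab[which.min(SSE), ]             # same winner as the loop: a = 1.65, b = 2.98
```

This gives the same result as the loop, but lets R's vectorized arithmetic do the heavy lifting.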

Next, we can visualize the SSE surface with 3-D graphs:

```r
pacman::p_load(akima) # akima interpolates the values of a, b, and SSE to create a surface
x <- ab.SSE$a.vals
y <- ab.SSE$b.vals
z <- ab.SSE$SSE
s <- akima::interp(x, y, z)

# three viewing angles of the same surface:
par(mfrow = c(1, 3))
theta <- c(50, 0, 90)
phi <- c(10, 0, 0)
for (i in 1:3) {
  persp(x = s$x, y = s$y, z = s$z,
        theta = theta[i], phi = phi[i],
        xlab = "a", ylab = "b", zlab = "SSE",
        box = TRUE, cex.lab = 2)
}
```

This search algorithm works as a proof of concept, but it is not very efficient. For it to find a good approximation of the parameters, it needs to be "helped" by narrowing down the search space. We can do so by looking at the 3-D graphs above. They show that SSE is not affected much by changes in the intercept values, while the true value of the slope lies somewhere towards the high end of the range we provided. We can adjust the parameter ranges accordingly and increase the resolution of our search:

```r
a.vals <- seq(1.8, 1.85, by = 0.0001)
b.vals <- seq(2.975, 2.985, by = 0.0001)
# all possible combinations of a and b:
ab <- expand.grid(a.vals = a.vals, b.vals = b.vals)

# calculate the Sum of Squared Errors (SSE) for each parameter combination:
ab.SSE <- ab
ab.SSE$SSE <- NA_real_
for (i in seq_len(nrow(ab.SSE))) {
  predicted.y <- ab.SSE$a.vals[i] + ab.SSE$b.vals[i] * df$x
  ab.SSE$SSE[i] <- sum((df$y - predicted.y)^2)
}

ab.SSE$a.vals[which.min(ab.SSE$SSE)] # 1.8359
ab.SSE$b.vals[which.min(ab.SSE$SSE)] # 2.9762
```

The estimates thus obtained are close to those produced by lm(), which are 1.837641 and 2.976166, respectively.

More sophisticated search algorithms adjust their search effort dynamically based on how close they are to the best set of parameter estimates, thus reducing the time needed to find them.
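As a taste of such methods, R's built-in optim() performs exactly this kind of adaptive search: its default Nelder-Mead algorithm repeatedly reshapes a simplex of candidate points, taking big steps far from the minimum and small ones near it. A sketch on the same simulated data, rebuilt here so the snippet runs on its own:

```r
# rebuild the same simulated data as above (same seed, same draws):
set.seed(222)
x <- 0:100
y <- 1.45 + 2.98 * x + rnorm(101, sd = 10)

# the objective to minimize: SSE as a function of the parameter vector p = c(a, b)
sse <- function(p) sum((y - p[1] - p[2] * x)^2)

# Nelder-Mead search starting from an arbitrary guess:
fit <- optim(par = c(0, 0), fn = sse)
fit$par  # close to the lm() estimates (1.837641, 2.976166)
```

No grid, no manual narrowing: the optimizer homes in on the minimum on its own, at the cost of only a few hundred evaluations of the SSE function.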