Linear regression in general

Doing mathematics with R: Regression analysis in R

So far we have used linear regression to fit a straight line that describes measurement data as well as possible. Linear regression is not restricted to straight lines, however: the only requirement is that the parameters to be estimated occur linearly in the model formula. The explanatory variable itself may appear in a nonlinear way.

For example, you can also fit data with a formula such as $y = \beta_0 + \beta_1 \ln(x)$, or use more parameters, as in the cubic $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$. Both formulas are nonlinear in $x$ but linear in the parameters $\beta_i$. The randomised examples below illustrate this.
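Writing the model in matrix form shows why only linearity in the parameters matters. For $n$ data points $(x_i, y_i)$, the cubic model becomes a linear system in the unknowns $\beta_0, \dots, \beta_3$. Its design matrix $X$ has the basis functions $1, x, x^2, x^3$ evaluated at the data points as columns. The symbols $X$ and $\varepsilon$ are introduced here only for illustration.

$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
=
\begin{pmatrix}
1 & x_1 & x_1^2 & x_1^3 \\
1 & x_2 & x_2^2 & x_2^3 \\
\vdots & \vdots & \vdots & \vdots \\
1 & x_n & x_n^2 & x_n^3
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix},
\qquad\text{i.e.}\qquad y = X\beta + \varepsilon .
$$

Least squares minimises $\lVert y - X\beta \rVert^2$. If $X$ has full column rank, the minimiser solves the normal equations $X^\top X\,\hat\beta = X^\top y$, so $\hat\beta = (X^\top X)^{-1} X^\top y$. For the model $y = \beta_0 + \beta_1 \ln(x)$, the design matrix simply has the two columns $1$ and $\ln(x_i)$.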

y = a + b·ln(x)

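The following R script generates noisy random data around such a curve, fits the model with lm() and plots the data (red) together with the regression curve (blue).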

# natural logarithm (log() already uses base e by default, so this is an alias)
ln <- function(x) {
  value <- log(x, base=exp(1))
  return(value)
}

# generate random data with noise
x <- seq(from=1, to=20, by=1)
a <- runif(1, min=1, max=10)
b <- runif(1, min=0, max=5)
y <- a + b*ln(x) + rnorm(length(x), mean=0, sd=1)

# linear regression: estimate a and b
fit <- lm(formula = y ~ ln(x))

# visualisation of data + regression curve
plot(x, y, type = "p", pch = 16, col = "red", cex = 1.3,
     main = "regression curve y = a + b*ln(x)")
lines(x, predict(fit), type="l", col="blue", lwd=2)

[Figure: linear regression curve]
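The script draws new random parameters on every run, so the fit cannot be compared with fixed numbers. The following minimal sketch makes the example reproducible and compares the estimates with the parameters used to generate the data. The seed value is an arbitrary assumption. The sketch also uses the built-in log() directly in the formula and checks that lm() agrees with the normal equations $X^\top X\,\hat\beta = X^\top y$, using the base R functions model.matrix(), crossprod() and solve().

```r
# reproducible variant of the y = a + b*ln(x) example (seed chosen arbitrarily)
set.seed(1)
x <- 1:20
a <- runif(1, min = 1, max = 10)
b <- runif(1, min = 0, max = 5)
y <- a + b * log(x) + rnorm(length(x), mean = 0, sd = 1)

# log() without a base argument is the natural logarithm,
# so no helper function is needed inside the formula
fit <- lm(y ~ log(x))

# estimated coefficients, named "(Intercept)" and "log(x)"
est <- coef(fit)
comparison <- data.frame(true_value = c(a, b), estimate = unname(est),
                         row.names = c("a", "b"))
print(comparison)

# the same estimates from the normal equations (X'X) beta = X'y,
# where X is the design matrix with columns 1 and ln(x)
X <- model.matrix(fit)
beta <- drop(solve(crossprod(X), crossprod(X, y)))
print(beta)
```

The estimates will not match a and b exactly because of the added noise. The two ways of computing them should agree up to rounding error.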

y = a + b·x + c·x² + d·x³

The following R script generates noisy random data around a cubic, fits the model and plots the result. In the model formula, the regression function is a linear combination of the functions $1$, $x$, $x^2$ and $x^3$. The last two terms are written with the "as is" function, as I(x^2) and I(x^3). Inside an R formula the operator ^ denotes crossing (interactions) of terms rather than exponentiation, and I() ensures that the arithmetic is interpreted literally. Alternatively, we could have used poly(x, 3, raw=TRUE).

# generate random data around a cubic with noise
# (in this example the coefficient of x^3 is fixed at 1)
x <- seq(from=-2, to=2, by=0.2)
b0 <- runif(1, min=1, max=4)
b1 <- runif(1, min=-2, max=2)
b2 <- runif(1, min=1, max=2)
y <- b0 + b1*x + b2*x^2 + x^3 + rnorm(length(x), mean=0, sd=0.5)

# linear regression: estimate the four coefficients
fit <- lm(formula = y ~ 1 + x + I(x^2) + I(x^3))

# visualisation of data + regression curve
plot(x, y, type = "p", pch = 16, col = "red", cex = 1.3,
     main = "regression curve y = a + b*x + c*x^2 + d*x^3")
lines(x, predict(fit), type="l", col="blue", lwd=2)

[Figure: linear regression curve]
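The remark about I() and poly(x, 3, raw=TRUE) can be checked directly. The following minimal sketch fits the same noisy cubic in three ways. The seed and the coefficients 2, 0.5 and 1.5 are arbitrary illustrative choices. The sketch shows that I() and poly(raw = TRUE) give identical fits, whereas writing x^2 and x^3 without I() silently reduces the model to a straight line.

```r
# illustrative data: seed and coefficients are arbitrary choices
set.seed(42)
x <- seq(from = -2, to = 2, by = 0.2)
y <- 2 + 0.5 * x + 1.5 * x^2 + x^3 + rnorm(length(x), mean = 0, sd = 0.5)

fit_I    <- lm(y ~ 1 + x + I(x^2) + I(x^3))  # "as is" power terms
fit_poly <- lm(y ~ poly(x, 3, raw = TRUE))   # raw (non-orthogonal) polynomial basis
fit_bad  <- lm(y ~ x + x^2 + x^3)            # ^ is read as formula crossing

# I() and poly(raw = TRUE) use the same design matrix,
# so they estimate the same four coefficients
print(data.frame(I = coef(fit_I), poly = coef(fit_poly)))

# without I(), x^2 and x^3 collapse to x: only intercept and slope remain
print(names(coef(fit_bad)))
```

With raw = FALSE (the default), poly() uses orthogonal polynomials instead. The fitted curve is the same, but the individual coefficients then refer to that orthogonal basis rather than to $1, x, x^2, x^3$.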