Doing mathematics with R: Regression analysis in R
Multiple linear regression
Multiple linear regression involves establishing a linear relationship \[y=\beta_0+\beta_1 x_1+\beta_2 x_2+\cdots + \beta_p x_p\] between a dependent variable \(y\) and two or more independent variables \(x_1, x_2, \ldots, x_p\). The parameters \(\beta_0, \beta_1, \ldots, \beta_p\) should be estimated such that the hyperplane with equation \(y=\beta_0+\beta_1 x_1+\beta_2 x_2+\cdots + \beta_p x_p\) best fits the measurement data. The example below shows how this can be done in R.
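Concretely, "best fits" here means the ordinary least-squares criterion: for \(n\) measurements \((x_{i1},\ldots,x_{ip},y_i)\), the estimated parameters minimise the sum of squared residuals \[\min_{\beta_0,\beta_1,\ldots,\beta_p}\; \sum_{i=1}^{n}\left(y_i-\beta_0-\beta_1 x_{i1}-\cdots-\beta_p x_{ip}\right)^2.\] This is the criterion that R's `lm` function applies.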
The R script below
x <- c(1,2,3,4,5)
y <- c(3,5,1,2,4)
z <- x + 2*y + rnorm(5, mean=0, sd=0.5)
# multiple linear regression
fit <- lm(formula = z ~ x + y)
print(summary(fit))
# correlation between data and data generated by linear fit
cat("correlation coefficient = ", cor(z, predict(fit)))
produces the following output in the console window (up to randomisation):
Call:
lm(formula = z ~ x + y)
Residuals:
1 2 3 4 5
-0.01798 0.08453 -0.46586 0.75004 -0.35073
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.6424 1.0005 -0.642 0.58658
x 1.0420 0.2144 4.860 0.03982 *
y 2.2229 0.2144 10.369 0.00917 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6746 on 2 degrees of freedom
Multiple R-squared: 0.9839, Adjusted R-squared: 0.9678
F-statistic: 61.14 on 2 and 2 DF, p-value: 0.01609
correlation coefficient = 0.9962919
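Once the model has been fitted, the `fit` object can also be used to extract the estimated coefficients and to predict the dependent variable for new measurements. The sketch below refits the model from the script above; the seed value and the new data points are illustrative assumptions, not part of the original example.

```r
# Reproducible sketch of the regression above; the seed is an assumption
set.seed(42)
x <- c(1, 2, 3, 4, 5)
y <- c(3, 5, 1, 2, 4)
z <- x + 2 * y + rnorm(5, mean = 0, sd = 0.5)

fit <- lm(z ~ x + y)

# estimated intercept and slopes (should be close to 0, 1 and 2)
print(coef(fit))

# predict z for new, hypothetical (x, y) pairs
newdata <- data.frame(x = c(1.5, 3.5), y = c(2, 4))
print(predict(fit, newdata = newdata))
```

Because the noise term `rnorm(5, mean = 0, sd = 0.5)` is random, the exact coefficient values differ from run to run unless a seed is set, which is why the output above is qualified "up to randomisation".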