9. Simple Linear Regression
Assessing the Quality of a Regression Model
The quality of a regression model depends on the accuracy of its predictions. A well-fitting model produces accurate predictions of the outcome variable, whereas a poorly fitting model produces inaccurate ones.
To assess the predictive power of a regression model, we divide the total variation in the outcome variable $y$ into two parts:
- The variation in $y$ that can be explained by the regression model
- The variation in $y$ that cannot be explained by the regression model
The more of the variation in the outcome variable the regression model is able to explain, the greater its predictive power.
Division of Variation
A key property of linear regression models is that they divide the total amount of variation in the outcome variable into two parts: the variation that can be explained by the regression model and that which cannot.
Three Sums of Squares
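To measure the total variation in the outcome variable $y$, we calculate the total sum of squares, denoted $SS_T$. This measure represents all the variation in $y$ that could possibly be explained by our regression model. It is the sum of the squared differences between each observed value $y_i$ and the mean $\bar{y}$ of the $n$ observations:
$$SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
To measure the amount of variation in the outcome variable that can be explained by the regression model, we calculate the model sum of squares, denoted $SS_M$. It is the sum of the squared differences between each value $\hat{y}_i$ predicted by the regression model and the mean $\bar{y}$:
$$SS_M = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$
To measure the amount of variation in the outcome variable that cannot be explained by the regression model, we calculate the residual sum of squares, denoted $SS_R$. It is the sum of the squared differences between each observed value $y_i$ and the corresponding predicted value $\hat{y}_i$ (the residuals):
$$SS_R = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
For a least-squares regression line with an intercept, the explained and unexplained parts add up to the total: $SS_T = SS_M + SS_R$.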
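The three formulas can be checked numerically. The Python sketch below is a minimal illustration that assumes an ordinary least-squares line fitted with the standard closed-form slope and intercept. The function names `fit_line` and `sums_of_squares` and the five data points are hypothetical and were chosen only for illustration. The sketch computes the total ($SS_T$), model ($SS_M$) and residual ($SS_R$) sums of squares. It then confirms that the explained and unexplained parts add up to the total.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of the ordinary least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sums_of_squares(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (SS_T, SS_M, SS_R) for the least-squares line fitted to (x, y)."""
    intercept, slope = fit_line(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [intercept + slope * xi for xi in x]  # predicted values
    ss_t = sum((yi - y_bar) ** 2 for yi in y)  # total variation around the mean
    ss_m = sum((yh - y_bar) ** 2 for yh in y_hat)  # variation explained by the model
    ss_r = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained (residual) variation
    return ss_t, ss_m, ss_r


# Hypothetical example data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
ss_t, ss_m, ss_r = sums_of_squares(x, y)
print(ss_t, ss_m, ss_r)
# Explained + unexplained should equal total (up to floating-point rounding)
print(abs(ss_t - (ss_m + ss_r)) < 1e-9)
```

For this data the fitted line is $\hat{y} = 2.2 + 0.6x$. That gives $SS_T = 6$, $SS_M = 3.6$ and $SS_R = 2.4$, up to floating-point rounding. The identity check depends on the line being the least-squares fit with an intercept. For an arbitrary line it generally does not hold.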
Once we have divided the total variation in the outcome variable into the part that can be explained by the regression model and that which cannot, we can calculate the coefficient of determination to get a single measure of the predictive power of the regression model.
Coefficient of Determination
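The coefficient of determination, denoted $R^2$, is the proportion of the total variation in the outcome variable that can be explained by the regression model.
There are two equivalent ways to calculate the coefficient of determination from the model ($SS_M$), residual ($SS_R$) and total ($SS_T$) sums of squares:
$$R^2 = \frac{SS_M}{SS_T}$$
$$R^2 = 1 - \frac{SS_R}{SS_T}$$
The two expressions agree because $SS_T = SS_M + SS_R$ for a least-squares model with an intercept.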
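Both formulas should give the same value for a least-squares fit. The following Python sketch assumes an ordinary least-squares line with an intercept. The function name `r_squared_both_ways` and the data are hypothetical.

```python
from typing import Sequence, Tuple


def r_squared_both_ways(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return R^2 computed as SS_M / SS_T and as 1 - SS_R / SS_T for an OLS line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # Closed-form least-squares slope and intercept
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    intercept = y_bar - slope * x_bar
    y_hat = [intercept + slope * xi for xi in x]

    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    ss_m = sum((yh - y_bar) ** 2 for yh in y_hat)
    ss_r = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return ss_m / ss_t, 1 - ss_r / ss_t


# Hypothetical example data
r2_model, r2_residual = r_squared_both_ways([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])
print(r2_model, r2_residual)
```

For this sample data both expressions evaluate to $0.6$, up to rounding. Both forms divide by $SS_T$, so $R^2$ is undefined when every observed $y_i$ is identical.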
Interpreting the Coefficient of Determination
For a least-squares regression model that includes an intercept, the coefficient of determination always takes on a value between $0$ and $1$:
- An $R^2$ of $0$ indicates that none of the variation in the outcome variable can be explained by the regression model.
- An $R^2$ of $1$ indicates that all of the variation in the outcome variable can be explained by the regression model, i.e. the model fits the data perfectly.
- An $R^2$ between $0$ and $1$ indicates that some, but not all, of the variation in the outcome variable can be explained by the regression model.
Multiplying $R^2$ by 100 expresses it as a percentage: a coefficient of determination of $R^2$ means that $100 \times R^2$ percent of the total variation in the outcome variable can be explained by the regression model.
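To illustrate the interpretation, the sketch below computes $R^2$ for three small hypothetical data sets:

- one that lies exactly on a line,
- one whose best-fitting slope is zero,
- one in between.

It assumes an ordinary least-squares fit with an intercept. `r_squared` is an illustrative helper, not a library function.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 = 1 - SS_R / SS_T for the least-squares line; undefined if all y are equal."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    intercept = y_bar - slope * x_bar
    ss_r = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_r / ss_t


# Hypothetical data sets illustrating the three cases
x = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(x, [3.0, 5.0, 7.0, 9.0])  # y = 2x + 1 exactly
no_fit = r_squared(x, [1.0, 3.0, 3.0, 1.0])  # best-fitting slope is 0
partial = r_squared([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])

for label, r2 in [("perfect", perfect), ("no fit", no_fit), ("partial", partial)]:
    # Multiply by 100 (the % format does this) to read R^2 as explained variation
    print(f"{label}: R^2 = {r2:.2f} ({r2:.0%} of the variation explained)")
```

The first data set gives $R^2 = 1$ (100%) and the second gives $R^2 = 0$ (0%). The third gives $R^2 = 0.6$, meaning that 60% of the variation in that sample's outcome variable is explained by the fitted line.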