9. Simple Linear Regression
Statistical Inference in Regression
Performing a simple linear regression analysis produces a regression model of the form:
\[Y_i = b_0 + b_1 \cdot X_i + e_i\]
where the outcome variable #Y# is predicted from the predictor variable #X#, #b_0# and #b_1# are the sample intercept and slope, and #e_i# is the residual for observation #i#.
Performing such an analysis can be seen as a form of parameter estimation. Specifically, we are using the data collected from our sample to estimate two population parameters: the population intercept #\beta_0# and the population slope #\beta_1#. The relationship between the variables #X# and #Y# in the population can thus be expressed with the following equation:
\[Y_i = \beta_0 + \beta_1 \cdot X_i + \epsilon_i\]
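As a brief sketch of this estimation, the least-squares estimates #b_0# and #b_1# can be computed directly from sample data. The data values below are made up purely for illustration:

```python
import numpy as np

# Hypothetical sample data: predictor X and outcome Y
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares estimate of the slope: b1 = S_xy / S_xx
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)

# Least-squares estimate of the intercept: b0 = mean(Y) - b1 * mean(X)
b0 = Y.mean() - b1 * X.mean()

# Residuals e_i = Y_i - (b0 + b1 * X_i), the sample analogue of epsilon_i
e = Y - (b0 + b1 * X)
```

For these data the estimates come out to #b_0 = 0.14# and #b_1 = 1.96#; the residuals #e_i# sum to zero, as least-squares residuals always do.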
Constructing a linear model of the relationship between two variables does not require us to make any strong assumptions about the population. Performing statistical inference about the population parameters #\beta_0# and #\beta_1#, however, does.
In order to perform statistical inference, we will need some basic understanding of the probability distribution of the population being sampled, specifically the population of errors #\epsilon_i#.
#\phantom{0}#
Assumptions for Linear Regression Models
The standard assumptions for performing valid statistical inference with a linear regression model are that the errors are independently sampled from a normal distribution with constant variance:
\[\epsilon_i \sim N(0, \sigma^2_E)\]
That is to say, the errors in the population #\epsilon_i# are normally distributed with a mean of #0# and some unknown error variance #\sigma^2_E#.
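To make this assumption concrete, the following sketch simulates data that satisfy it exactly. The parameter values (#\beta_0 = 1#, #\beta_1 = 0.5#, #\sigma_E = 2#) are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

beta_0, beta_1 = 1.0, 0.5   # hypothetical population intercept and slope
sigma_E = 2.0               # hypothetical error standard deviation

X = np.linspace(0.0, 10.0, 200)

# Errors drawn independently from N(0, sigma_E^2), as the assumption requires
eps = rng.normal(loc=0.0, scale=sigma_E, size=X.size)

# Population model: Y_i = beta_0 + beta_1 * X_i + epsilon_i
Y = beta_0 + beta_1 * X + eps
```

In real data the errors are not observed, so this assumption is typically checked indirectly by inspecting the residuals #e_i#.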
Mean Square Error and the Standard Error of the Estimate
Although there is no way of determining the true value of the error variance #\sigma^2_E#, we can estimate it by calculating the mean square error #MS_{error}#:
\[s^2_E = MS_{error}=\cfrac{SS_{residual}}{n-2} = \cfrac{\displaystyle \sum_{i=1}^{n} (e_i)^2}{n-2} \]
We divide by #n - 2# rather than #n# because two degrees of freedom are lost in estimating the intercept #b_0# and the slope #b_1#.
The square root of the mean square error is an estimate of the standard deviation of the errors #\sigma_E# and is called the standard error of the estimate:
\[s_E = \sqrt{MS_{error}}=\sqrt{\cfrac{SS_{residual}}{n-2}} =\sqrt{ \cfrac{\displaystyle \sum_{i=1}^{n} (e_i)^2}{n-2}} \]
The standard error of the estimate thus provides insight into the amount of error we can reasonably expect when predicting #Y# from #X#.
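The two quantities above can be sketched in a few lines. The sample data are made up for illustration; the calculation simply applies the formulas for #MS_{error}# and #s_E#:

```python
import numpy as np

# Hypothetical sample data: predictor X and outcome Y
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = X.size

# Least-squares estimates and residuals
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
e = Y - (b0 + b1 * X)

# Mean square error: SS_residual divided by n - 2 degrees of freedom
SS_residual = np.sum(e ** 2)
MS_error = SS_residual / (n - 2)   # s^2_E, estimate of the error variance

# Standard error of the estimate: square root of the mean square error
s_E = np.sqrt(MS_error)
```

Roughly speaking, predictions of #Y# from #X# for these data would typically be off by about #s_E# units.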