9. Simple Linear Regression
Inference about the Slope of a Linear Model
The linear relationship between two variables #X# and #Y# in the population can be expressed with the following equation:
\[Y_i = \beta_0 + \beta_1 \cdot X_i + \epsilon_i\]
To perform statistical inference about the slope #\beta_1# of a linear model, we first need to determine the standard error of the slope.
#\phantom{0}#
Standard Error of the Slope
The standard error of the slope #s_{b_1}# measures the amount of error we can reasonably expect when using sample data to estimate the slope #\beta_1# of a linear regression model. It is calculated with the following formula:
\[s_{b_1} = \cfrac{s_E}{\sqrt{SS_X}} = \cfrac{s_E}{\displaystyle \sqrt{ \sum_{i=1}^{n} (X_i - \bar{X})^2 }}\]
where #s_E# is the standard error of the estimate.
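As a sketch of how this formula could be applied to raw data, the following Python function (the function name is illustrative, not part of any standard library) computes #s_{b_1}# by first fitting the least-squares line and then dividing the standard error of the estimate by #\sqrt{SS_X}#:

```python
import numpy as np

def slope_standard_error(x, y):
    """Estimate the standard error of the slope, s_b1 = s_E / sqrt(SS_X),
    where s_E is the standard error of the estimate (residual standard
    deviation with n - 2 degrees of freedom) and SS_X is the sum of
    squared deviations of X from its mean."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Least-squares estimates of the slope b1 and intercept b0
    ss_x = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / ss_x
    b0 = y.mean() - b1 * x.mean()
    # Standard error of the estimate: s_E = sqrt(SSE / (n - 2))
    residuals = y - (b0 + b1 * x)
    s_e = np.sqrt(np.sum(residuals ** 2) / (n - 2))
    return s_e / np.sqrt(ss_x)
```

Note that when the data lie exactly on a line, the residuals are zero and the function returns #s_{b_1} = 0#, as expected.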
#\phantom{0}#
We will now take a look at two ways we can use the standard error of the slope to perform statistical inference about the slope coefficient #\beta_1# of a linear regression model.
The calculation of #s_{b_1}# assumes that the errors #\epsilon_i# in the population are independent and normally distributed with a constant but unknown variance #\sigma^2_\epsilon#. This variance is estimated by #s^2_E#. Because two parameters, the intercept and the slope, are estimated from the data, we are left with #n-2# free data points on which the inference can be based. That is why inference for simple regression uses the #t#-distribution with #df=n-2# (#df# = degrees of freedom).
#\phantom{0}#
Confidence Interval for the Slope of a Linear Model
The general formula for computing a #C\%\,CI# for the slope #\beta_1# is:
\[CI_{\beta_1}=\bigg(b_1 - t^*\cdot s_{b_1},\,\,\,\, b_1 + t^*\cdot s_{b_1} \bigg)\]
where #t^*# is the critical value of the #t_{n-2}# distribution such that #\mathbb{P}(-t^* \leq t \leq t^*)=\frac{C}{100}#.
Calculating t* with Statistical Software
Let #C# be the confidence level in #\%#.
To calculate the critical value #t^*# in Excel, make use of the function T.INV():
\[=\text{T.INV}((100+C)/200, n-2)\]
To calculate the critical value #t^*# in R, make use of the function qt():
\[\text{qt}(p=(100+C)/200,\ df=n-2,\ lower.tail = \text{TRUE})\]
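The same computation can be sketched in Python with scipy; the helper below (its name is illustrative) converts the confidence level #C# to the upper-tail probability #(100+C)/200#, looks up #t^*#, and returns the interval #(b_1 - t^*\cdot s_{b_1},\, b_1 + t^*\cdot s_{b_1})#:

```python
from scipy import stats

def slope_confidence_interval(b1, s_b1, n, C=95):
    """C% confidence interval for the slope beta_1.

    (100 + C) / 200 converts the confidence level C (in %) to the
    required cumulative probability, e.g. C = 95 -> 0.975, so that
    t* is the critical value of the t-distribution with n - 2 df."""
    t_star = stats.t.ppf((100 + C) / 200, df=n - 2)
    margin = t_star * s_b1
    return (b1 - margin, b1 + margin)
```

For example, with #b_1 = 2#, #s_{b_1} = 0.5#, and #n = 22#, the interval is centered at #2# with half-width #t^* \cdot 0.5#, where #t^*# is taken from the #t_{20}# distribution.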
#\phantom{0}#
We can also use the standard error of the slope to perform a hypothesis test for the value of the slope #\beta_1# of a linear regression model.
#\phantom{0}#
Hypothesis Test for the Slope of a Linear Model
The hypotheses of a two-sided test for the slope #\beta_1# of a linear model are:
\[\begin{array}{rcl}
H_0: \beta_1 = 0 & (\text{There is no linear relationship between }X \text{ and } Y \text{ in the population})\\
H_a: \beta_1 \neq 0 & (\text{There is a linear relationship between }X \text{ and } Y \text{ in the population})\,\,
\end{array}\]
The relevant test statistic for the null hypothesis #H_0: \beta_1 = 0# is:
\[t_{b_1} = \cfrac{b_1 - 0}{s_{b_1}} = \cfrac{b_1}{s_{b_1}}\]
Under the null hypothesis of the test, #t_{b_1}# follows a #t#-distribution with #df = n-2# degrees of freedom:
\[t_{b_1} \sim t_{n-2}\]
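As a minimal sketch of the two-sided test, the following Python function (illustrative name, using scipy) computes the test statistic #t_{b_1} = b_1 / s_{b_1}# and its two-sided #p#-value from the #t_{n-2}# distribution:

```python
from scipy import stats

def slope_t_test(b1, s_b1, n):
    """Two-sided t-test for H0: beta_1 = 0.

    Returns the test statistic t_b1 = b1 / s_b1 and the two-sided
    p-value from the t-distribution with n - 2 degrees of freedom."""
    t_b1 = b1 / s_b1
    # Two-sided p-value: 2 * P(T >= |t_b1|) under H0
    p_value = 2 * stats.t.sf(abs(t_b1), df=n - 2)
    return t_b1, p_value
```

For instance, #b_1 = 2#, #s_{b_1} = 0.5#, and #n = 22# give #t_{b_1} = 4# with #df = 20#, a value far in the tail of the #t_{20}# distribution, so the #p#-value is very small and #H_0# would be rejected at any conventional #\alpha#.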
Calculating the p-value of a Hypothesis Test for the Slope of a Linear Model
The calculation of the #p#-value of a #t#-test for #\beta_1# depends on the direction of the test and can be performed using either Excel or R.
To calculate the #p#-value of a #t#-test for #\beta_1# in Excel, make use of one of the following commands:
\[\begin{array}{llll}
\phantom{0}\text{Direction}&\phantom{0000}H_0&\phantom{0000}H_a&\phantom{000000000}\text{Excel Command}\\
\hline
\text{Two-tailed}&H_0:\beta_1 = 0&H_a:\beta_1 \neq 0&=2 \text{ * }(1 \text{ - } \text{T.DIST}(\text{ABS}(t),n\text{ - }2,1))\\
\text{Left-tailed}&H_0:\beta_1 \geq 0&H_a:\beta_1 \lt 0&=\text{T.DIST}(t,n\text{ - }2,1)\\
\text{Right-tailed}&H_0:\beta_1 \leq 0&H_a:\beta_1 \gt 0&=1\text{ - }\text{T.DIST}(t,n\text{ - }2,1)\\
\end{array}\]
To calculate the #p#-value of a #t#-test for #\beta_1# in R, make use of one of the following commands:
\[\begin{array}{llll}
\phantom{0}\text{Direction}&\phantom{0000}H_0&\phantom{0000}H_a&\phantom{00000000000}\text{R Command}\\
\hline
\text{Two-tailed}&H_0:\beta_1 = 0&H_a:\beta_1 \neq 0&2 \text{ * }\text{pt}(\text{abs}(t),n\text{ - }2,lower.tail=\text{FALSE})\\
\text{Left-tailed}&H_0:\beta_1 \geq 0&H_a:\beta_1 \lt 0&\text{pt}(t,n\text{ - }2, lower.tail=\text{TRUE})\\
\text{Right-tailed}&H_0:\beta_1 \leq 0&H_a:\beta_1 \gt 0&\text{pt}(t,n\text{ - }2, lower.tail=\text{FALSE})\\
\end{array}\]
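The three directional #p#-value calculations in the table above can be mirrored in Python with scipy, where `stats.t.cdf` plays the role of `pt(..., lower.tail=TRUE)` and `stats.t.sf` the role of `pt(..., lower.tail=FALSE)` (the function below is an illustrative sketch, not a standard API):

```python
from scipy import stats

def slope_p_value(t, n, direction="two-tailed"):
    """p-value of a t-test for beta_1 with df = n - 2.

    direction: "two-tailed", "left-tailed", or "right-tailed".
    """
    df = n - 2
    if direction == "two-tailed":
        return 2 * stats.t.sf(abs(t), df)  # 2 * pt(abs(t), df, lower.tail=FALSE)
    if direction == "left-tailed":
        return stats.t.cdf(t, df)          # pt(t, df, lower.tail=TRUE)
    if direction == "right-tailed":
        return stats.t.sf(t, df)           # pt(t, df, lower.tail=FALSE)
    raise ValueError("unknown direction")
```

As a sanity check, at #t = 0# the two-tailed #p#-value is #1# and each one-tailed #p#-value is #0.5#, and for any #t# the left- and right-tailed #p#-values sum to #1#.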
If #p \lt \alpha#, reject #H_0# and conclude #H_a#. Otherwise, do not reject #H_0#.