VVA Formula sheet VI (Regression)
Simple Linear Regression Equation
Performing a simple linear regression analysis results in a regression equation of the form:
\[\hat{Y}=b_0 + b_1 \cdot X\]
To calculate the slope coefficient #b_1# of the regression line, use the following formula:
\[b_1 =\cfrac{\sum\limits_{i=1}^n{(X_i-\bar{X})(Y_i-\bar{Y})}}{\sum\limits_{i=1}^n{(X_i-\bar{X})^2}}=r_{\small{XY}}\bigg(\cfrac{s_{\small{Y}}}{s_{\small{X}}}\bigg)\]
Once the slope is known, it is possible to calculate the intercept coefficient #b_0# with the following formula:
\[b_0 = \bar{Y} - b_1 \cdot \bar{X}\]
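As a minimal R sketch of these two formulas, assuming a small made-up dataset (the vectors x and y are purely illustrative):

x <- c(1, 2, 3, 4, 5)              # made-up example data
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)
cor(x, y) * (sd(y) / sd(x))        # same slope via the correlation form
coef(lm(y ~ x))                    # R's built-in fit returns the same b0 and b1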
#\phantom{0}#
Residuals
The residual of the #i^{th}# observation is calculated as follows:
\[e_i = Y_i - \hat{Y}_i\]
where #Y_i# is the observed value and #\hat{Y}_i# is the predicted value for observation #i#.
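A short R sketch, reusing the same made-up x and y:

x <- c(1, 2, 3, 4, 5)              # made-up example data
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- lm(y ~ x)
e <- y - fitted(fit)               # observed minus predicted values
resid(fit)                         # R's built-in residuals, identical to e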
#\phantom{0}#
Regression Sum of Squares
The total sum of squares, denoted #SS_{total}#, represents the total amount of variation in the outcome variable #Y#, that is, all the variation the regression model could possibly explain.
\[SS_{total}=\sum_{i=1}^{n} (Y_i - \bar{Y})^2\]
The model sum of squares, denoted #SS_{model}#, represents the amount of variation in the outcome variable #Y# that can be explained by the regression model.
\[SS_{model}=\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2\]
The residual sum of squares, denoted #SS_{residual}#, represents the amount of variation in the outcome variable #Y# that cannot be explained by the regression model.
\[SS_{residual}=\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2=\sum_{i=1}^{n} (e_i)^2\]
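In R, the three sums of squares can be computed directly from a fitted model (same illustrative data as above); note that #SS_{model}# and #SS_{residual}# add up to #SS_{total}#:

x <- c(1, 2, 3, 4, 5)              # made-up example data
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- lm(y ~ x)
SS_total    <- sum((y - mean(y))^2)
SS_model    <- sum((fitted(fit) - mean(y))^2)
SS_residual <- sum(resid(fit)^2)
SS_model + SS_residual             # equals SS_total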
#\phantom{0}#
Coefficient of Determination
The coefficient of determination, denoted #R^2#, is the proportion of the total variation in the outcome variable #Y# that can be explained by the regression model.
\[R^2=\cfrac{SS_{model}}{SS_{total}}\,\,\,\,\,\,\,\,\,\, \text{or} \,\,\,\,\,\,\,\,\,\, R^2=1-\cfrac{SS_{residual}}{SS_{total}} \]
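A minimal R sketch of both versions of the formula (same illustrative data), compared with the value reported by summary():

x <- c(1, 2, 3, 4, 5)              # made-up example data
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- lm(y ~ x)
SS_total    <- sum((y - mean(y))^2)
SS_model    <- sum((fitted(fit) - mean(y))^2)
SS_residual <- sum(resid(fit)^2)
SS_model / SS_total                # first form
1 - SS_residual / SS_total         # second form, same value
summary(fit)$r.squared             # R's built-in R-squared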
#\phantom{0}#
Standard Error of the Estimate
The standard deviation of the errors is called the standard error of the estimate and is calculated as follows:
\[s_E = \sqrt{MS_{error}}=\sqrt{\cfrac{SS_{residual}}{n-2}} =\sqrt{ \cfrac{\displaystyle \sum_{i=1}^{n} (e_i)^2}{n-2}} \]
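In R this quantity is reported as the residual standard error; a sketch with the same illustrative data:

x <- c(1, 2, 3, 4, 5)              # made-up example data
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- lm(y ~ x)
n <- length(y)
sqrt(sum(resid(fit)^2) / (n - 2))  # s_E computed by hand
sigma(fit)                         # R's built-in residual standard error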
#\phantom{0}#
Standard Error of the Slope
The standard error of the slope, #s_{b_1}#, is a measure of the amount of error we can reasonably expect when using sample data to estimate the slope #\beta_1# of a linear regression model. It is calculated with the following formula:
\[s_{b_1} = \cfrac{s_E}{\sqrt{SS_X}} = \cfrac{s_E}{ \sqrt{ \sum_{i=1}^{n} (X_i - \bar{X})^2 } } \]
where #s_E# is the standard error of the estimate.
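A minimal R sketch (same illustrative data), checked against the standard error in the coefficient table:

x <- c(1, 2, 3, 4, 5)              # made-up example data
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- lm(y ~ x)
s_E <- sigma(fit)                  # standard error of the estimate
s_E / sqrt(sum((x - mean(x))^2))   # s_b1 computed by hand
summary(fit)$coefficients["x", "Std. Error"]   # same value from the summary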
#\phantom{0}#
Confidence Interval for the Slope of a Linear Model
The general formula for computing a #C\%\,CI# for the slope #\beta_1# is:
\[CI_{\beta_1}=\bigg(b_1 - t^*\cdot s_{b_1},\,\,\,\, b_1 + t^*\cdot s_{b_1} \bigg)\]
where #t^*# is the critical value of the #t_{n-2}# distribution such that #\mathbb{P}(-t^* \leq t \leq t^*)=\frac{C}{100}#.
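A sketch in R for a 95% confidence interval, using the same illustrative data; here #t^*# comes from qt():

x <- c(1, 2, 3, 4, 5)              # made-up example data
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- lm(y ~ x)
n <- length(y)
b1   <- coef(fit)["x"]
s_b1 <- summary(fit)$coefficients["x", "Std. Error"]
t_star <- qt(0.975, df = n - 2)    # critical value for C = 95
c(b1 - t_star * s_b1, b1 + t_star * s_b1)   # interval computed by hand
confint(fit, "x", level = 0.95)    # R's built-in equivalent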
#\phantom{0}#
Hypothesis Test for the Slope of a Linear Model
The hypotheses of a two-sided test for the slope #\beta_1# of a linear model are:
\[\begin{array}{rcl}
H_0: \beta_1 &=& 0\\
H_a: \beta_1 &\neq& 0
\end{array}\]
The relevant test statistic for the null hypothesis #H_0: \beta_1 = 0# is:
\[t_{b_1} = \cfrac{b_1}{s_{b_1}}\]
Under the null hypothesis of the test, #t_{b_1}# follows a #t#-distribution with #df = n-2# degrees of freedom:
\[t_{b_1} \sim t_{n-2}\]
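A sketch in R (same illustrative data); the hand-computed statistic and p-value match the slope row of the summary() coefficient table:

x <- c(1, 2, 3, 4, 5)              # made-up example data
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- lm(y ~ x)
n <- length(y)
t_b1 <- coef(fit)["x"] / summary(fit)$coefficients["x", "Std. Error"]
2 * pt(-abs(t_b1), df = n - 2)     # two-sided p-value under t with n - 2 df
summary(fit)$coefficients["x", c("t value", "Pr(>|t|)")]   # same values from the summary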