VVA Formula sheet VI (Regression)
Simple Linear Regression Equation
Performing a simple linear regression analysis results in a regression equation of the form:
\[\hat{Y}=b_0 + b_1 \cdot X\]
To calculate the slope coefficient #b_1# of the regression line, use the following formula:
\[b_1 =\cfrac{\sum\limits_{i=1}^n{(X_i-\bar{X})(Y_i-\bar{Y})}}{\sum\limits_{i=1}^n{(X_i-\bar{X})^2}}=r_{\small{XY}}\bigg(\cfrac{s_{\small{Y}}}{s_{\small{X}}}\bigg)\]
Once the slope is known, it is possible to calculate the intercept coefficient #b_0# with the following formula:
\[b_0 = \bar{Y} - b_1 \cdot \bar{X}\]
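As a minimal R sketch of these two formulas, assuming a small made-up dataset (the vectors x and y are purely illustrative):

x <- c(1, 2, 3, 4, 5)              # made-up example data
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)
cor(x, y) * (sd(y) / sd(x))        # same slope via the correlation form
coef(lm(y ~ x))                    # R's built-in fit returns the same b0 and b1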
#\phantom{0}#
Residuals
The residual of the #i^{th}# observation is calculated as follows:
\[e_i = Y_i - \hat{Y}_i\]
where #Y_i# is the observed value and #\hat{Y}_i# is the predicted value for observation #i#.
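A short R sketch, reusing the same made-up x and y:

x <- c(1, 2, 3, 4, 5)              # made-up example data
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- lm(y ~ x)
e <- y - fitted(fit)               # observed minus predicted values
resid(fit)                         # R's built-in residuals, identical to e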
#\phantom{0}#
Regression Sum of Squares
The total sum of squares, denoted #SS_{total}#, represents the total amount of variation in the outcome variable #Y#, that is, all the variation the regression model could possibly explain.
\[SS_{total}=\sum_{i=1}^{n} (Y_i - \bar{Y})^2\]
The model sum of squares, denoted #SS_{model}#, represents the amount of variation in the outcome variable #Y# that can be explained by the regression model.
\[SS_{model}=\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2\]
The residual sum of squares, denoted #SS_{residual}#, represents the amount of variation in the outcome variable #Y# that cannot be explained by the regression model.
\[SS_{residual}=\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2=\sum_{i=1}^{n} (e_i)^2\]
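In R, the three sums of squares can be computed directly from a fitted model (same illustrative data as above); note that #SS_{model}# and #SS_{residual}# add up to #SS_{total}#:

x <- c(1, 2, 3, 4, 5)              # made-up example data
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- lm(y ~ x)
SS_total    <- sum((y - mean(y))^2)
SS_model    <- sum((fitted(fit) - mean(y))^2)
SS_residual <- sum(resid(fit)^2)
SS_model + SS_residual             # equals SS_total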
#\phantom{0}#
Coefficient of Determination
The coefficient of determination, denoted #R^2#, is the proportion of the total variation in the outcome variable #Y# that can be explained by the regression model.
\[R^2=\cfrac{SS_{model}}{SS_{total}}\,\,\,\,\,\,\,\,\,\, \text{or} \,\,\,\,\,\,\,\,\,\, R^2=1-\cfrac{SS_{residual}}{SS_{total}} \]
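A minimal R sketch of both versions of the formula (same illustrative data), compared with the value reported by summary():

x <- c(1, 2, 3, 4, 5)              # made-up example data
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- lm(y ~ x)
SS_total    <- sum((y - mean(y))^2)
SS_model    <- sum((fitted(fit) - mean(y))^2)
SS_residual <- sum(resid(fit)^2)
SS_model / SS_total                # first form
1 - SS_residual / SS_total         # second form, same value
summary(fit)$r.squared             # R's built-in R-squared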
#\phantom{0}#
Standard Error of the Estimate
The standard deviation of the errors is called the standard error of the estimate and is calculated as follows:
\[s_E = \sqrt{MS_{error}}=\sqrt{\cfrac{SS_{residual}}{n-2}} =\sqrt{ \cfrac{\displaystyle \sum_{i=1}^{n} (e_i)^2}{n-2}} \]
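In R this quantity is reported as the residual standard error; a sketch with the same illustrative data:

x <- c(1, 2, 3, 4, 5)              # made-up example data
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- lm(y ~ x)
n <- length(y)
sqrt(sum(resid(fit)^2) / (n - 2))  # s_E computed by hand
sigma(fit)                         # R's built-in residual standard error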
#\phantom{0}#
Standard Error of the Slope
The standard error of the slope, #s_{b_1}#, is a measure of the amount of error we can reasonably expect when using sample data to estimate the slope #\beta_1# of a linear regression model. It is calculated with the following formula:
\[s_{b_1} = \cfrac{s_E}{\sqrt{SS_X}} = \cfrac{s_E}{ \sqrt{ \sum_{i=1}^{n} (X_i - \bar{X})^2 } } \]
where #s_E# is the standard error of the estimate.
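A minimal R sketch (same illustrative data), checked against the standard error in the coefficient table:

x <- c(1, 2, 3, 4, 5)              # made-up example data
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- lm(y ~ x)
s_E <- sigma(fit)                  # standard error of the estimate
s_E / sqrt(sum((x - mean(x))^2))   # s_b1 computed by hand
summary(fit)$coefficients["x", "Std. Error"]   # same value from the summary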
#\phantom{0}#
Confidence Interval for the Slope of a Linear Model
The general formula for computing a #C\%\,CI# for the slope #\beta_1# is:
\[CI_{\beta_1}=\bigg(b_1 - t^*\cdot s_{b_1},\,\,\,\, b_1 + t^*\cdot s_{b_1} \bigg)\]
where #t^*# is the critical value of the #t_{n-2}# distribution such that #\mathbb{P}(-t^* \leq t \leq t^*)=\frac{C}{100}#.
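A sketch in R for a 95% confidence interval, using the same illustrative data; here #t^*# comes from qt():

x <- c(1, 2, 3, 4, 5)              # made-up example data
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- lm(y ~ x)
n <- length(y)
b1   <- coef(fit)["x"]
s_b1 <- summary(fit)$coefficients["x", "Std. Error"]
t_star <- qt(0.975, df = n - 2)    # critical value for C = 95
c(b1 - t_star * s_b1, b1 + t_star * s_b1)   # interval computed by hand
confint(fit, "x", level = 0.95)    # R's built-in equivalent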
#\phantom{0}#
Hypothesis Test for the Slope of a Linear Model
The hypotheses of a two-sided test for the slope #\beta_1# of a linear model are:
\[\begin{array}{rcl}
H_0: \beta_1 &=& 0\\
H_a: \beta_1 &\neq& 0
\end{array}\]
The relevant test statistic for the null hypothesis #H_0: \beta_1 = 0# is:
\[t_{b_1} = \cfrac{b_1}{s_{b_1}}\]
Under the null hypothesis of the test, #t_{b_1}# follows a #t#-distribution with #df = n-2# degrees of freedom:
\[t_{b_1} \sim t_{n-2}\]
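A sketch in R (same illustrative data); the hand-computed statistic and p-value match the slope row of the summary() coefficient table:

x <- c(1, 2, 3, 4, 5)              # made-up example data
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit <- lm(y ~ x)
n <- length(y)
t_b1 <- coef(fit)["x"] / summary(fit)$coefficients["x", "Std. Error"]
2 * pt(-abs(t_b1), df = n - 2)     # two-sided p-value under t with n - 2 df
summary(fit)$coefficients["x", c("t value", "Pr(>|t|)")]   # same values from the summary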