### Formulas, Statistical Tables and R Commands: VVA Formula sheet

### VVA Formula sheet VI (Regression)

Simple Linear Regression Equation

Performing a *simple linear regression* analysis results in a **regression equation** of the form:

\[\hat{Y}=b_0 + b_1 \cdot X\]

To calculate the *slope coefficient* #b_1# of the regression line, use the following formula:

\[b_1 =\cfrac{\sum\limits_{i=1}^n{(X_i-\bar{X})(Y_i-\bar{Y})}}{\sum\limits_{i=1}^n{(X_i-\bar{X})^2}}=r_{\small{XY}}\bigg(\cfrac{s_{\small{Y}}}{s_{\small{X}}}\bigg)\]

Once the slope is known, it is possible to calculate the *intercept coefficient* #b_0# with the following formula:

\[b_0 = \bar{Y} - b_1 \cdot \bar{X}\]
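The two formulas above can be applied directly. A minimal sketch in Python (the small data set is invented purely for illustration and does not come from this sheet):

```python
# Fit a simple linear regression "by hand" with the slope and
# intercept formulas above. The data set is invented for illustration.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# b1 = sum of cross-deviations divided by sum of squared X-deviations
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
     sum((x - x_bar) ** 2 for x in X)

# b0 = Ybar - b1 * Xbar
b0 = y_bar - b1 * x_bar

print(f"Y-hat = {b0:.1f} + {b1:.1f} * X")
```

For this data the fitted line is #\hat{Y} = 2.2 + 0.6 \cdot X#.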

#\phantom{0}#

Residuals

The **residual** of the #i^{th}# observation is calculated as follows:

\[e_i = Y_i - \hat{Y}_i\]

where #Y_i# is the observed value and #\hat{Y}_i# the predicted value for observation #i#.
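A short sketch of the residual calculation, using an invented data set whose least-squares fit (obtained from the slope and intercept formulas on this sheet) is #\hat{Y} = 2.2 + 0.6 \cdot X#:

```python
# Residuals e_i = Y_i - Yhat_i for an invented data set whose
# least-squares fit is Yhat = 2.2 + 0.6*X.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

Y_hat = [b0 + b1 * x for x in X]                  # predicted values
residuals = [y - yh for y, yh in zip(Y, Y_hat)]   # e_i = Y_i - Yhat_i

print([round(e, 2) for e in residuals])
```

For this data the residuals are #(-0.8,\, 0.6,\, 1.0,\, -0.6,\, -0.2)#. They sum to zero, which is a general property of least-squares residuals when the model includes an intercept.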

#\phantom{0}#

Regression Sum of Squares

The **total sum of squares**, denoted #SS_{total}#, represents the total variation in the outcome variable #Y#, i.e. the maximum amount of variation a regression model could possibly explain.

\[SS_{total}=\sum_{i=1}^{n} (Y_i - \bar{Y})^2\]

The **model sum of squares**, denoted #SS_{model}#, represents the amount of variation in the outcome variable #Y# that *can be explained* by the regression model.

\[SS_{model}=\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2\]

The **residual sum of squares**, denoted #SS_{residual}#, represents the amount of variation in the outcome variable #Y# that *cannot be explained* by the regression model.

\[SS_{residual}=\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2=\sum_{i=1}^{n} (e_i)^2\]
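The three sums of squares can be sketched as follows, using an invented data set with least-squares fit #\hat{Y} = 2.2 + 0.6 \cdot X#:

```python
# The three regression sums of squares for an invented data set
# whose least-squares fit is Yhat = 2.2 + 0.6*X.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

y_bar = sum(Y) / len(Y)
Y_hat = [b0 + b1 * x for x in X]

ss_total = sum((y - y_bar) ** 2 for y in Y)
ss_model = sum((yh - y_bar) ** 2 for yh in Y_hat)
ss_residual = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))

# For a least-squares fit: SS_total = SS_model + SS_residual
print(ss_total, ss_model, ss_residual)
```

Here #SS_{total} = 6#, #SS_{model} = 3.6# and #SS_{residual} = 2.4#, illustrating the decomposition #SS_{total} = SS_{model} + SS_{residual}#.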

#\phantom{0}#

Coefficient of Determination

The **coefficient of determination**, denoted #R^2#, is the proportion of the total variation in the outcome variable #Y# that can be explained by the regression model.

\[R^2=\cfrac{SS_{model}}{SS_{total}}\,\,\,\,\,\,\,\,\,\, \text{or} \,\,\,\,\,\,\,\,\,\, R^2=1-\cfrac{SS_{residual}}{SS_{total}} \]
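Both forms give the same value. A minimal check, using sums of squares from an invented example (#SS_{total} = 6#, #SS_{model} = 3.6#, #SS_{residual} = 2.4#):

```python
# Both forms of R^2; the sums of squares are taken from an
# invented example (SS_total = 6, SS_model = 3.6, SS_residual = 2.4).
ss_total, ss_model, ss_residual = 6.0, 3.6, 2.4

r2_from_model = ss_model / ss_total
r2_from_residual = 1 - ss_residual / ss_total

print(r2_from_model, r2_from_residual)
```

Here #R^2 = 0.6#: the model explains 60% of the variation in #Y#.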

#\phantom{0}#

Standard Error of the Estimate

The standard deviation of the errors is called the **standard error of the estimate** and is calculated as follows:

\[s_E = \sqrt{MS_{error}}=\sqrt{\cfrac{SS_{residual}}{n-2}} =\sqrt{ \cfrac{\displaystyle \sum_{i=1}^{n} (e_i)^2}{n-2}} \]
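A quick sketch of this calculation, with #SS_{residual} = 2.4# and #n = 5# taken from an invented example:

```python
import math

# Standard error of the estimate for an invented example
# with n = 5 observations and SS_residual = 2.4.
n = 5
ss_residual = 2.4

s_e = math.sqrt(ss_residual / (n - 2))
print(round(s_e, 4))
```

Here #s_E = \sqrt{2.4 / 3} \approx 0.894#.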

#\phantom{0}#

Standard Error of the Slope

The **standard error of the slope** #s_{b_1}# is a measure of the amount of error we can reasonably expect when using sample data to estimate the slope #\beta_1# of a linear regression model and is calculated with the following formula:

\[s_{b_1} = \cfrac{s_E}{\sqrt{SS_X}} = \cfrac{s_E}{ \sqrt{ \sum_{i=1}^{n} (X_i - \bar{X})^2 } } \]

where #s_E# is the *standard error of the estimate*.
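A sketch of this formula, using values from an invented example (#s_E = \sqrt{0.8}# and #SS_X = 10#):

```python
import math

# Standard error of the slope for an invented example:
# s_E = sqrt(0.8) and SS_X = sum of (Xi - Xbar)^2 = 10.
s_e = math.sqrt(0.8)
ss_x = 10.0

s_b1 = s_e / math.sqrt(ss_x)
print(round(s_b1, 4))
```

Here #s_{b_1} = \sqrt{0.08} \approx 0.283#.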

#\phantom{0}#

Confidence Interval for the Slope of a Linear Model

The general formula for computing a #C\%\,CI# for the slope #\beta_1# is:

\[CI_{\beta_1}=\bigg(b_1 - t^*\cdot s_{b_1},\,\,\,\, b_1 + t^*\cdot s_{b_1} \bigg)\]

where #t^*# is the *critical value* of the #t_{n-2}# distribution such that #\mathbb{P}(-t^* \leq t \leq t^*)=\frac{C}{100}#.
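A sketch of a 95% confidence interval for the slope, using invented example values (#b_1 = 0.6#, #s_{b_1} = \sqrt{0.08}#, #n = 5#, so #df = 3# and the two-sided critical value from a #t#-table is #t^* = 3.182#):

```python
import math

# 95% CI for the slope of an invented example: b1 = 0.6,
# s_b1 = sqrt(0.08), n = 5, so df = n - 2 = 3 and the
# two-sided critical value from a t-table is t* = 3.182.
b1 = 0.6
s_b1 = math.sqrt(0.08)
t_star = 3.182

lower = b1 - t_star * s_b1
upper = b1 + t_star * s_b1
print(round(lower, 3), round(upper, 3))
```

The interval #(-0.300,\, 1.500)# contains #0#, so at the 5% level these data are consistent with a slope of zero.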

#\phantom{0}#

Hypothesis Test for the Slope of a Linear Model

The *hypotheses* of a two-sided test for the slope #\beta_1# of a linear model are:

\[\begin{array}{l}
H_0: \beta_1 = 0\\
H_a: \beta_1 \neq 0
\end{array}\]

The relevant *test statistic* for the null hypothesis #H_0: \beta_1 = 0# is:

\[t_{b_1} = \cfrac{b_1}{s_{b_1}}\]

Under the null hypothesis of the test, #t_{b_1}# follows a #t#-distribution with #df = n-2# degrees of freedom:

\[t_{b_1} \sim t_{n-2}\]
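The full test can be sketched as follows, using invented example values (#b_1 = 0.6#, #s_{b_1} = \sqrt{0.08}#, #n = 5#):

```python
import math

# Two-sided t-test of H0: beta1 = 0 for an invented example:
# b1 = 0.6, s_b1 = sqrt(0.08), n = 5, so df = n - 2 = 3.
b1 = 0.6
s_b1 = math.sqrt(0.08)

t_b1 = b1 / s_b1
t_star = 3.182  # two-sided 5% critical value of t with df = 3 (t-table)

reject_h0 = abs(t_b1) > t_star
print(round(t_b1, 3), reject_h0)
```

Since #|t_{b_1}| \approx 2.121 < 3.182#, #H_0# is not rejected at the 5% significance level for this example.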