Formulas, Statistical Tables and R Commands: Formulas
Formulas for descriptive statistics
Mean
For #n# observed values #x_i# of variable #X#, the mean equals
\begin{equation*} \overline{x}=\frac{ \sum_{i=1}^n\limits x_i}{n}
\end{equation*}
Mean of a frequency distribution
For #n# observed values of variable #X# with #k# distinct outcomes #x_i#, each occurring with frequency #f_i#, the mean equals
\begin{equation*} \overline{x}=\frac{ \sum_{i=1}^k\limits f_i x_i}{n}
\end{equation*} For a dichotomous (binary) variable #X# with outcomes #x=0# and #x=1#, the mean equals the proportion of outcomes #x=1#, denoted #p_x#.
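As a minimal sketch, the two mean formulas and the binary case can be checked numerically. The data below are hypothetical and chosen only for illustration; the names `mean_raw`, `mean_freq` and `p_b` are not part of the formula sheet.

```python
from collections import Counter

# Hypothetical sample of n = 8 observations.
x = [2, 3, 3, 5, 5, 5, 7, 10]
n = len(x)

# Mean from the raw observations: sum of x_i divided by n.
mean_raw = sum(x) / n

# Mean from the frequency distribution: sum of f_i * x_i over the
# k distinct outcomes, divided by n.
freq = Counter(x)  # maps each distinct outcome to its frequency f_i
mean_freq = sum(f * value for value, f in freq.items()) / n

# Hypothetical binary variable: its mean equals the proportion of ones.
b = [1, 0, 0, 1, 1, 0, 1, 1]
mean_b = sum(b) / len(b)
p_b = b.count(1) / len(b)

print(mean_raw, mean_freq, mean_b, p_b)
```

Both routes to the mean give the same value because the frequency form only groups identical terms of the raw sum.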
Median
The median is the middle value when all observations are ordered by size. The median corresponds to the #50#th percentile, #P_{50}# (see `Percentiles' below).
Mode
The mode is the most frequently observed value.
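The following sketch illustrates the median and the mode with Python's standard `statistics` module on hypothetical data. Note one assumption: for an even number of observations, `statistics.median` averages the two middle values, a common convention that the definition above does not spell out.

```python
from statistics import median, multimode

# Hypothetical samples with an odd and an even number of observations.
odd_sample = [7, 1, 4, 9, 3]       # ordered: 1, 3, 4, 7, 9
even_sample = [7, 1, 4, 9, 3, 6]   # ordered: 1, 3, 4, 6, 7, 9

median_odd = median(odd_sample)    # the single middle value
median_even = median(even_sample)  # average of the two middle values

# multimode returns every value that shares the highest frequency,
# so ties are reported rather than hidden.
single_mode = multimode([2, 3, 3, 5, 5, 5, 7])
tied_modes = multimode([1, 1, 2, 2, 3])

print(median_odd, median_even, single_mode, tied_modes)
```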
Standard deviation
The standard deviation (as estimator for the population value #\sigma#) is
\begin{equation*} s_x = \sqrt{\frac{\sum_{i=1}^n (x_i-\overline{x})^2}{n-1}}.
\end{equation*}
The population standard deviation of a dichotomous (binary) variable is
\begin{equation*} \sigma_x = \sqrt{p_x(1-p_x)}.\end{equation*}
Variance
The variance (as estimator for the population value #\sigma^2#) is
\begin{equation*} s_x^2 = \displaystyle\frac{\sum_{i=1}^n (x_i-\overline{x})^2}{n-1}.\end{equation*}
The population variance of a dichotomous (binary) variable is
\begin{equation*} \sigma_x^2 = p_x(1-p_x).
\end{equation*}
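A short sketch, using hypothetical data, of how the sample variance and standard deviation (dividing by #n-1#) and the binary population values can be computed. The standard-library functions `variance`, `stdev` and `pvariance` are used only as a cross-check of the hand-written formulas.

```python
from math import sqrt
from statistics import pvariance, stdev, variance

# Hypothetical sample.
x = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(x)
x_bar = sum(x) / n

# Sample variance: sum of squared deviations divided by n - 1.
s2_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
s_x = sqrt(s2_x)

# Cross-check against the standard library (also uses n - 1).
lib_s2_x = variance(x)
lib_s_x = stdev(x)

# Hypothetical binary variable: population variance is p(1 - p).
b = [1, 0, 0, 1, 1, 0, 1, 1]
p = sum(b) / len(b)
sigma2_b = p * (1 - p)
sigma_b = sqrt(sigma2_b)

# pvariance divides by n, so it matches p(1 - p) for these data.
lib_sigma2_b = pvariance(b)

print(s2_x, s_x, sigma2_b, sigma_b)
```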
Percentiles
The #p#th percentile is the value at or below which #p# percent of the observations fall. For example, the #50#th percentile is the value at or below which half of all observations fall. It is denoted #P_{50}# and is equivalent to the median.
Interquartile range
The interquartile distance is \begin{equation*} IQR=Q_3-Q_1, \end{equation*} where #Q_3# corresponds to #P_{75}# and #Q_1# corresponds to #P_{25}#.
Range
The range is the distance between the largest and the smallest observed value. It is calculated by \begin{equation*} \text{range} = \text{maximum} - \text{minimum}. \end{equation*}
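Percentiles, the interquartile range and the range can be sketched as follows. Several percentile conventions exist; this example assumes the nearest-rank rule (the smallest observation such that at least #p# percent of the observations are smaller or equal), which may differ from the rule used by a particular statistics package. The function name `percentile` and the data are hypothetical.

```python
from math import ceil


def percentile(values, p):
    """Nearest-rank percentile (an assumed convention): the smallest
    observation such that at least p percent of values are <= it."""
    ordered = sorted(values)
    rank = max(1, ceil(p * len(ordered) / 100))  # 1-based position
    return ordered[rank - 1]


# Hypothetical sample of n = 11 observations.
x = [12, 1, 7, 20, 4, 15, 8, 3, 13, 6, 10]

q1 = percentile(x, 25)          # P25
p50 = percentile(x, 50)         # P50, the median for this odd n
q3 = percentile(x, 75)          # P75
iqr = q3 - q1                   # IQR = Q3 - Q1
value_range = max(x) - min(x)   # range = maximum - minimum

print(q1, p50, q3, iqr, value_range)
```

With an even number of observations, the nearest-rank #P_{50}# can differ from a median that averages the two middle values, so it is worth checking which convention a given tool applies.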
Z-score
The z-score, or standardized score, is \begin{equation*}
z_{x_i}=\frac{x_i-\overline{x}}{s_x}.
\end{equation*} (This is a linear transformation with #a=-\overline{x}/s_x# and #b=1/s_x#, see `Linear transformation' below).
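As an illustration on hypothetical data, standardizing a sample yields scores with mean #0# and standard deviation #1#. The sketch below uses the sample standard deviation (dividing by #n-1#), matching the definition of #s_x# above.

```python
from statistics import mean, stdev

# Hypothetical sample.
x = [2, 4, 4, 4, 5, 5, 7, 9]
x_bar = mean(x)
s_x = stdev(x)  # sample standard deviation (n - 1 in the denominator)

# z-score of each observation: distance from the mean in units of s_x.
z = [(xi - x_bar) / s_x for xi in x]

# After standardizing, the mean is 0 and the standard deviation is 1
# (up to floating-point rounding).
z_mean = mean(z)
z_sd = stdev(z)

print([round(zi, 3) for zi in z], round(z_mean, 12), round(z_sd, 12))
```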
Linear transformation
For a linear transformation #y_i=a+bx_i# the following holds
\begin{equation*} \overline{y} = a+b\cdot \overline{x}
\end{equation*} and \begin{eqnarray*} s_y^2 & = & b^2\cdot s_x^2 \\
s_y & = & |b|\cdot s_x. \end{eqnarray*}
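The rules for a linear transformation can be verified numerically. The sketch below assumes a hypothetical conversion from degrees Celsius to degrees Fahrenheit, so #a=32# and #b=1.8#. The code takes the absolute value of #b# for the standard deviation, since a standard deviation cannot be negative; for the positive #b# used here this makes no difference.

```python
from statistics import mean, stdev, variance

# Hypothetical linear transformation y = a + b * x (Celsius to Fahrenheit).
a, b = 32.0, 1.8
x = [10.0, 15.0, 20.0, 25.0, 30.0]
y = [a + b * xi for xi in x]

# Left-hand sides: statistics computed directly on the transformed data.
y_mean, y_var, y_sd = mean(y), variance(y), stdev(y)

# Right-hand sides: the same quantities predicted from the rules.
pred_mean = a + b * mean(x)
pred_var = b ** 2 * variance(x)
pred_sd = abs(b) * stdev(x)  # abs() keeps the standard deviation non-negative

print(y_mean, pred_mean)
print(y_var, pred_var)
print(y_sd, pred_sd)
```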
Covariance
The covariance between #x# and #y# is
\begin{equation*} s_{xy}=\frac{1}{n-1}\sum_{i=1}^n\limits
(x_i-\overline{x})(y_i-\overline{y}).
\end{equation*} The following rules apply with respect to the variance and covariance: \begin{eqnarray*} s_{xx} & = &
s_x^2 \\
s_{x+y}^2 & = & s_x^2+s_y^2+2s_{xy} \\
s_{x-y}^2 & = & s_x^2+s_y^2-2s_{xy}.
\end{eqnarray*}
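A minimal sketch that checks the covariance formula and the three rules on hypothetical paired data. The helper `covariance` is written out here for illustration and is not a reference to any particular library function.

```python
from statistics import mean, variance


def covariance(x, y):
    """Sample covariance: sum of cross-products of deviations over n - 1."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

s_xy = covariance(x, y)
s_xx = covariance(x, x)  # covariance of x with itself

# Variances of the sum and the difference, computed directly.
var_sum = variance([xi + yi for xi, yi in zip(x, y)])
var_diff = variance([xi - yi for xi, yi in zip(x, y)])

# The same variances predicted from the rules.
pred_sum = variance(x) + variance(y) + 2 * s_xy
pred_diff = variance(x) + variance(y) - 2 * s_xy

print(s_xy, s_xx, var_sum, pred_sum, var_diff, pred_diff)
```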
For two dichotomous (binary) variables #X# and #Y#, where #p_{xy}# equals the probability of a score of #1# on both #X# and #Y#, the population covariance equals
\begin{equation*} \sigma_{xy}=p_{xy}-p_xp_y.
\end{equation*}
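The binary covariance formula can be illustrated by treating a small hypothetical data set as the whole population, so proportions play the role of probabilities and the direct calculation divides by #n# rather than #n-1#.

```python
# Hypothetical paired binary scores, treated as a complete population.
x = [1, 1, 0, 0, 1, 0, 1, 1]
y = [1, 0, 0, 1, 1, 0, 1, 0]
n = len(x)

p_x = sum(x) / n                                    # proportion with x = 1
p_y = sum(y) / n                                    # proportion with y = 1
p_xy = sum(xi * yi for xi, yi in zip(x, y)) / n     # proportion with both = 1

# Covariance from the binary shortcut formula.
sigma_xy = p_xy - p_x * p_y

# Covariance from the general definition (population form, divide by n).
sigma_xy_direct = sum((xi - p_x) * (yi - p_y) for xi, yi in zip(x, y)) / n

print(p_x, p_y, p_xy, sigma_xy, sigma_xy_direct)
```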
Pearson's (product-moment) correlation coefficient
The correlation between #x# and #y# is
\begin{eqnarray*}
r_{xy}& =& \frac{s_{xy}}{s_xs_y}\\
& =& \frac{1}{n-1}\sum_{i=1}^n\limits z_{x_i}z_{y_i}\\
& = & \frac{1}{n-1}\sum_{i=1}^n\limits
\left(\frac{x_i-\overline{x}}{s_x}\right)\left(\frac{y_i-\overline{y}}{s_y}\right).
\end{eqnarray*}
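The equivalence of the covariance form and the z-score form of the correlation can be checked with a short sketch on hypothetical paired data. Variable names such as `r_from_cov` and `r_from_z` are illustrative only.

```python
from statistics import mean, stdev

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (n - 1)

# Form 1: covariance divided by the product of the standard deviations.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
r_from_cov = s_xy / (s_x * s_y)

# Form 2: sum of products of z-scores divided by n - 1.
z_x = [(xi - x_bar) / s_x for xi in x]
z_y = [(yi - y_bar) / s_y for yi in y]
r_from_z = sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

print(round(r_from_cov, 4), round(r_from_z, 4))
```

Both forms agree up to floating-point rounding, and the result always lies between #-1# and #1#.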