1. Descriptive Statistics: Measures of Variability
Variance and Standard Deviation
The previous section introduced the sum of squares, which serves as the basis for two of the most important measures of variability: the variance and the standard deviation.
Variance
Definition
Variance is the average of the squared deviation scores.
The population and sample variances are denoted as #\sigma^2# and #s^2#, respectively.
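The sample variance is calculated slightly differently from the population variance: the sum of squares is divided by #n - 1# rather than by the number of scores, which makes it an unbiased estimator of the population variance.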
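To see why the divisor matters, the following Python sketch enumerates every possible sample of size 2 drawn with replacement from a tiny made-up population and averages the resulting variance estimates. The population values and the helper name `sum_of_squares` are assumptions chosen for illustration and do not come from the text above.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population, chosen only to keep the arithmetic small.
population = [1, 2, 3]
N = len(population)
mu = Fraction(sum(population), N)
pop_variance = sum((x - mu) ** 2 for x in population) / N  # SS / N


def sum_of_squares(sample):
    """Sum of squared deviations from the sample mean (illustrative helper)."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample)


n = 2
# All ordered samples of size n drawn with replacement (each equally likely).
samples = list(product(population, repeat=n))

# Average estimate when dividing the sum of squares by n - 1 versus by n.
avg_div_n_minus_1 = sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)
avg_div_n = sum(sum_of_squares(s) / n for s in samples) / len(samples)

print(pop_variance)       # 2/3
print(avg_div_n_minus_1)  # 2/3
print(avg_div_n)          # 1/3
```

In this small example, dividing by #n - 1# recovers the population variance on average, whereas dividing by #n# underestimates it.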
Formulas
\[\begin{array}{rcl}
\sigma^2 &=&\dfrac{SS}{N}= \dfrac{\sum{(X - \mu)^2}}{N}\\
\\
s^2&=&\dfrac{SS}{n-1}= \dfrac{\sum{(X - \bar{X})^2}}{n - 1}\\
\end{array}\]
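As a worked illustration of the two formulas, the sketch below computes the sum of squares for a small set of scores and then divides it by #N# and by #n - 1#. The scores are hypothetical and the variable names are illustrative only.

```python
# Hypothetical scores, used only for illustration.
scores = [2, 4, 4, 6, 9]
n = len(scores)
mean = sum(scores) / n  # 25 / 5 = 5.0

# Sum of squares: the squared deviations from the mean, added up.
# Deviations are -3, -1, -1, 1, 4, so SS = 9 + 1 + 1 + 1 + 16 = 28.
ss = sum((x - mean) ** 2 for x in scores)

# Treating the scores as a complete population: divide by N.
population_variance = ss / n

# Treating the scores as a sample: divide by n - 1.
sample_variance = ss / (n - 1)

print(population_variance)  # 5.6
print(sample_variance)      # 7.0
```

The same sum of squares yields a larger value when it is treated as coming from a sample, because the divisor is smaller.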
Calculating the Sample Variance with Statistical Software
To calculate the sample variance in Excel, make use of the following function (Excel 2010 and later also offer it under the newer name VAR.S):
VAR(array)
- array: The array or cell range of numeric values for which you want to calculate the sample variance.
To calculate the sample variance in R, make use of the following function:
var(x)
- x: The numeric vector whose sample variance you wish to calculate.
One downside of the variance as a measure of variability is that it is not expressed in the same units as the original measurement. As a side effect of squaring the deviation scores, the units of measurement are squared as well. For example, if the original measurement is taken in meters, then the variance is expressed in square meters.
Squared measurement units make it difficult to compare the variance to other statistical measures that do have the same units as the original scores, such as the mean or the interquartile range. For this reason, the standard deviation is generally the preferred measure of variability.
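The effect of squared units can be illustrated by expressing the same measurements in two different units. The sketch below uses Python's standard `statistics` module on hypothetical lengths; the data values are assumptions made purely for demonstration.

```python
import statistics

# Hypothetical lengths in meters (illustrative values only).
lengths_m = [1.5, 2.0, 2.5, 3.0]
# The same lengths expressed in centimeters.
lengths_cm = [100 * x for x in lengths_m]

var_m = statistics.variance(lengths_m)    # in square meters
var_cm = statistics.variance(lengths_cm)  # in square centimeters
sd_m = statistics.stdev(lengths_m)        # in meters
sd_cm = statistics.stdev(lengths_cm)      # in centimeters

# The variance scales with the square of the conversion factor (100^2),
# while the standard deviation scales with the factor itself (100).
print(var_cm / var_m)  # approximately 10000
print(sd_cm / sd_m)    # approximately 100
```

Because the standard deviation changes by the same factor as the scores themselves, it can be compared directly with measures such as the mean.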
Standard Deviation
Definition
The standard deviation is a measure of variability that is expressed in the same units as the original measurement. It is calculated by taking the square root of the variance.
The population and sample standard deviations are denoted as #\sigma# and #s#, respectively.
Formulas
\[\begin{array}{rcl}
\sigma &=& \sqrt{\sigma^2} = \sqrt{\dfrac{\sum{(X-\mu)^2}}{N}}
\\
s &=& \sqrt{s^2} = \sqrt{\dfrac{\sum{(X-\bar{X})^2}}{n-1}}
\end{array}\]
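A minimal sketch of these formulas in Python takes the square root of each variance and compares the result with the ready-made functions in the standard `statistics` module. The scores are hypothetical, and Python is used here only as an illustration.

```python
import math
import statistics

# Hypothetical scores, used only for illustration.
scores = [2, 4, 4, 6, 9]

# Standard deviation as the square root of the variance.
sigma = math.sqrt(statistics.pvariance(scores))  # population: divides SS by N
s = math.sqrt(statistics.variance(scores))       # sample: divides SS by n - 1

print(round(sigma, 3))  # 2.366
print(round(s, 3))      # 2.646

# The statistics module also provides both quantities directly.
print(math.isclose(sigma, statistics.pstdev(scores)))  # True
print(math.isclose(s, statistics.stdev(scores)))       # True
```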
Calculating the Sample Standard Deviation with Statistical Software
To calculate the sample standard deviation in Excel, make use of the following function (Excel 2010 and later also offer it under the newer name STDEV.S):
STDEV(array)
- array: The array or cell range of numeric values for which you want to calculate the sample standard deviation.
To calculate the sample standard deviation in R, make use of the following function:
sd(x)
- x: The numeric vector whose sample standard deviation you wish to calculate.
For frequency distributions that are approximately bell-shaped, a useful rule of thumb is that roughly two-thirds of all scores (about 68%) lie within one standard deviation on either side of the mean.
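This rule of thumb can be checked against an idealized bell curve. The sketch below assumes an exactly normal distribution, which real data only approximate, and uses `NormalDist` from Python's standard `statistics` module to compute the proportion of scores within one standard deviation of the mean.

```python
from statistics import NormalDist

# Idealized bell curve: a standard normal distribution (mean 0, sd 1).
# Real frequency distributions will only approximate this shape.
standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of scores between one sd below and one sd above the mean.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

print(round(within_one_sd, 4))  # 0.6827
```

The proportion does not depend on the particular mean or standard deviation chosen, since any normal distribution can be rescaled to the standard one.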