Variance and Standard Deviation

1. Descriptive Statistics: Measures of Variability

Variance and Standard Deviation

The previous section introduced the sum of squares which serves as the basis for two of the most important measures of variability, namely the variance and standard deviation.

Variance

Definition

Variance is the average of the squared deviation scores.

The population and sample standard variance are denoted as $\sigma^2$ and $s^2$ , respectively.

The sample variance is calculated slightly differently than the population variance in order to make it an unbiased estimator*.

Formulas

$\begin{array}{rcl} \sigma^2 &=&\dfrac{SS}{N}= \dfrac{\sum{(X - \mu)^2}}{N}\\ \\ s^2&=&\dfrac{SS}{n-1}= \dfrac{\sum{(X - \bar{X})^2}}{n - 1}\\ \end{array}$

Bias when estimating

It turns out that, if one would calculate the sample variance using the same formula as the population variance, it would consistently underestimate the amount of variance in the population. Whenever a sample statistic consistently over- or underestimates the true value of the corresponding population parameter, that statistic is called a biased estimator.

Luckily, it's possible to calculate an unbiased estimator of the population variance by dividing the sum of squares by $n-1$ instead of by $n$ . A sample statistic is an unbiased estimator when it, on average, does not over- or underestimate its corresponding population parameter.

Calculating the Sample Variance with Statistical Software

To calculate the sample variance in Excel, make use of the following function:

VAR(array)

array: The array or cell range of numeric values for which you want to calculate the sample variance.

To calculate the sample variance in R, make use of the following function:

var(x)

x: The numeric vector whose sample variance you wish to calculate.

$\phantom{x}$
One downside of the variance as a measure of variability is that it is not expressed in the same units as the original measurement. As a side effect of squaring the deviation scores, the units of measurement have also been squared. For example, if the original measurement is taken in meters, then the variance would produce a value expressed in square meters.

Squared measurement units make it difficult to compare the variance to other statistical measures that do have the same units as the original scores, such as the mean or the interquartile range. For this reason, the standard deviation is generally the preferred measure of variability.
$\phantom{x}$

Standard Deviation

Definition

The standard deviation is a measure of variability that is expressed in the same units as the original measurement. It is calculated by taking the square root of the variance.

The population and sample standard deviation are denoted as $\sigma$ and $s$ , respectively.

Formulas

$\begin{array}{rcl} \sigma &=& \sqrt{\sigma^2} = \sqrt{\dfrac{\sum{(X-\mu)^2}}{N}} \\ s &=& \sqrt{s^2} = \sqrt{\dfrac{\sum{(X-\bar{X})^2}}{n-1}} \end{array}$

Calculating the Sample Standard Deviation with Statistical Software

To calculate the sample standard deviation in Excel, make use of the following function:

STDEV(array)

array: The array or cell range of numeric values for which you want to calculate the sample standard deviation.

To calculate the sample standard deviation in R, make use of the following function:

sd(x)

x: The numeric vector whose sample standard deviation you wish to calculate.

$\phantom{x}$
For frequency distributions that are approximately bell-shaped, a useful rule of thumb is that the majority of all scores are within one standard deviation on either side of the mean.