1. Descriptive Statistics: Measures of Central Tendency
Sensitivity to Outliers
An important consideration when comparing the mode, median, and mean as measures of centrality is their sensitivity to outliers.
#\phantom{0}#
An outlier is an exceptionally high or low score that does not conform to the pattern observed for the majority of the data.
Outliers can be the result of a measurement error or a mistake made during data entry, but more often the score is a legitimate, if exceptional, case.
#\phantom{0}#
A statistical measure is said to be sensitive to outliers when it is influenced by the presence of outliers in the dataset.
#\phantom{0}#
Both the mode and the median are measures of centrality that are not sensitive to the presence of outliers in the dataset. The mean, on the other hand, is very sensitive to the presence of outliers.
The following example helps illustrate the concept of sensitivity to outliers in the context of central tendency.
#\phantom{0}#
Measures of Centrality and Sensitivity
#\phantom{000000000000}# Dataset
Consider the following set of #n=13# scores:
#1,\, 1,\, 2,\, 4,\, 5,\, 5,\, 6,\, 8,\, 8,\, 8,\, 9,\, 10,\, 11#
Measures of Centrality
- Mode #= 8#
- Median #= 6#
- Mean #= \dfrac{78}{13} = 6#
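The three measures above can be checked with Python's standard `statistics` module. This is a minimal sketch, assuming only the 13 scores listed above. `mode` returns the most frequent value, `median` returns the middle (7th) value of the sorted scores, and `mean` divides the sum of the scores by #n#.

```python
from statistics import mean, median, mode

# The n = 13 scores from the example
scores = [1, 1, 2, 4, 5, 5, 6, 8, 8, 8, 9, 10, 11]

print("Mode:  ", mode(scores))    # most frequent value -> 8
print("Median:", median(scores))  # 7th of the 13 sorted scores -> 6
print("Mean:  ", mean(scores))    # 78 / 13 -> 6
```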
#\phantom{0}#
Now, consider what happens when the score #X = 11# is replaced by an outlier, such as #X=76#.
#\phantom{0}#
#\phantom{000000000000}# Dataset
The new set of #n=13# scores is:
#1,\, 1,\, 2,\, 4,\, 5,\, 5,\, 6,\, 8,\, 8,\, 8,\, 9,\, 10,\, \boldsymbol{76}#
Measures of Centrality
- Mode #= 8#
- Median #= 6#
- Mean #= \dfrac{143}{13} = 11#
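To see the comparison side by side, the sketch below (again using the standard `statistics` module) computes all three measures for the original scores and for the version in which #X=11# has been replaced by #X=76#. The variable names `original` and `with_outlier` are illustrative only.

```python
from statistics import mean, median, mode

original = [1, 1, 2, 4, 5, 5, 6, 8, 8, 8, 9, 10, 11]
# Replace the highest score (11, the last element) with the outlier 76
with_outlier = original[:-1] + [76]

for label, data in [("original", original), ("with outlier", with_outlier)]:
    print(f"{label:>12}: mode={mode(data)}, median={median(data)}, mean={mean(data)}")
```

Only the mean moves: it goes from #6# to #11#, while the mode and median stay at #8# and #6#.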
#\phantom{0}#
Both the mode and the median remain unchanged as a result of changing a score into an outlier:
- The most frequently occurring value in the dataset stays the same, namely #X=8#.
- Likewise, the middlemost score stays the same, namely #X_7=6#.
This demonstrates that both the mode and the median are examples of measures that are insensitive to the presence of outliers in the dataset.
The mean, however, changes considerably as a result of the outlier: it rises from #6# to #11#. Since every score in the dataset contributes equally to the mean, a single high or low score can have a drastic effect on the mean, especially if the dataset is relatively small.
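Why the size of the dataset matters follows from the definition of the mean: replacing one score #X_{\text{old}}# by #X_{\text{new}}# changes the sum by #X_{\text{new}} - X_{\text{old}}#, so the mean shifts by #\dfrac{X_{\text{new}} - X_{\text{old}}}{n}#. In the example this is #\dfrac{76 - 11}{13} = 5#. The sketch below illustrates this with a hypothetical helper `mean_shift` and a hypothetical larger dataset built by repeating the same 13 scores 10 times (#n = 130#); neither comes from the example itself.

```python
from statistics import mean

def mean_shift(data, old, new):
    """Hypothetical helper: change in the mean when one occurrence of `old` is replaced by `new`."""
    changed = list(data)
    changed[changed.index(old)] = new
    return mean(changed) - mean(data)

small = [1, 1, 2, 4, 5, 5, 6, 8, 8, 8, 9, 10, 11]
# Hypothetical larger dataset: the same 13 scores repeated 10 times (n = 130)
large = small * 10

print(mean_shift(small, 11, 76))  # (76 - 11) / 13 = 5
print(mean_shift(large, 11, 76))  # (76 - 11) / 130 = 0.5
```

The same outlier that moves the mean by #5# in a dataset of #13# scores moves it by only #0.5# when #n = 130#.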