1. Descriptive Statistics: Frequency Distributions
Measures of Location I: Quantiles
Besides describing the characteristics of a distribution as a whole, descriptive statistics can also be used to provide more information about individual scores. One particularly useful piece of information is the location of a score relative to all other scores within the distribution.
Knowing a score's location relative to the other scores can, for instance, help you judge whether a particular score should be considered high, low, or average. Raw scores are, by themselves, not very informative in this regard.
One way to express a score's location within a distribution is to calculate its percentile rank.
Percentile Rank
The percentile rank of a score is the percentage of scores in the distribution that are equal to or lower than it.
A student scored #70# points on an exam and would like to know how well she did compared to her classmates. The scores of the entire class are as follows:
\[34\,\,\,\,42\,\,\,\,53\,\,\,\,56\,\,\,\,57\,\,\,\,60\,\,\,\,62\,\,\,\,64\,\,\,\,64\,\,\,\,67\,\,\,\,70\,\,\,\,70\,\,\,\,72\,\,\,\,78\,\,\,\,84\,\,\,\,89\]
To calculate the percentile rank of #X=70#, first count the number of scores that are equal to or lower than #70#, which in this case is #12#.
Next, divide that number by the total number of scores, which in this case is #16#, and multiply by #100\%#.
\[\cfrac{12}{16}\cdot 100\% = 75\%\]
So the percentile rank of #X=70# is #75#.
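The two steps above (count the scores equal to or lower than #X#, then divide by the total number of scores and multiply by #100\%#) can be sketched in Python. This is a minimal illustration: the function name percentile_rank is hypothetical, and the list holds the class scores from the example.

```python
def percentile_rank(scores, x):
    """Percentage of scores that are equal to or lower than x (hypothetical helper)."""
    at_or_below = sum(1 for s in scores if s <= x)  # count scores <= x
    return at_or_below / len(scores) * 100          # divide by n, express as a percentage

class_scores = [34, 42, 53, 56, 57, 60, 62, 64, 64, 67, 70, 70, 72, 78, 84, 89]
print(percentile_rank(class_scores, 70))  # 75.0
```

The scores do not need to be sorted for this count, because only the number of scores at or below #X# matters.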
When a score is identified by its percentile rank, the score is called a percentile.
Percentiles
Definition
Percentiles are the values that divide a distribution of scores into one hundred equal parts.
The #P^{th}# percentile of a distribution is the value such that #P# percent of the scores are equal to or below it.
Formula
The index of the #P^{th}# percentile of a distribution is:
\[i = \dfrac{P}{100}(n-1)+1\]
where #n# is the number of scores and #P# is a value between 1 and 99.
Percentile Calculation
The calculation of the #P^{th}# percentile starts by ordering the scores in the distribution from smallest to largest. Next, to find the index #i# of the #P^{th}# percentile, use the following formula:
\[i = \dfrac{P}{100}(n-1)+1\]
where #n# is the total number of scores in the distribution.
It is important to note that the formula above gives the location (index) of the #P^{th}# percentile, not the value associated with it.
If #i# is an integer, then the #P^{th}# percentile is the score located at the #i^{th}# position of the ordered data.
Whenever #i# is not an integer, linear interpolation is used to calculate the percentile:
- Find the two integers closest to #i# by rounding #i# up and down. These indices are denoted by #i_{above}# and #i_{below}#, respectively.
- Determine the values located at these positions. These values are denoted by #X_{above}# and #X_{below}#, respectively.
- Calculate the #P^{th}# percentile with the following formula:\[P^{th}\text{ percentile}=X_{below} + (i - i_{below}) \cdot (X_{above} - X_{below})\]
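As an illustration of the non-integer case, the steps can be applied to the #16# class scores from the percentile-rank example to find the #30^{th}# percentile. Since #i=5.5#, the percentile lies halfway between the #5^{th}# score (#57#) and the #6^{th}# score (#60#):
\[\begin{array}{rcl}
i &=& \cfrac{30}{100}(16 - 1) + 1 = 5.5\\
i_{below} &=& 5, \quad i_{above} = 6\\
X_{below} &=& X_{5} = 57, \quad X_{above} = X_{6} = 60\\
P_{30} &=& 57 + (5.5 - 5) \cdot (60 - 57) = 58.5
\end{array}\]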
As an example, consider the #20^{th}# percentile of a sample of #21# scores. It can be calculated by hand, with Excel, or with R. To calculate it by hand, first order the scores from smallest to largest:
\[1,\,\,\,1,\,\,\,1,\,\,\,2,\,\,\,4,\,\,\,4,\,\,\,5,\,\,\,6,\,\,\,7,\,\,\,11,\,\,\,16,\,\,\,17,\,\,\,17,\,\,\,18,\,\,\,19,\,\,\,19,\,\,\,20,\,\,\,23,\,\,\,23,\,\,\,25,\,\,\,25\]
Next, to find the index #i# of the #20^{th}# percentile (#P=20#), use the following formula:
\[\begin{array}{rcl}
i &=& \cfrac{P}{100}(n-1)+1\\
&=& \cfrac{20}{100}(21 - 1) + 1=5
\end{array}\]
Since #i=5# is an integer, the #20^{th}# percentile is the score located at the #5^{th}# position of the ordered data:
\[P_{20}= X_{5} = 4\]
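The hand calculation can be sketched in Python as a short function. This is a minimal sketch assuming the index formula and interpolation rule described above; the function name percentile is hypothetical. It computes #P(n-1)/100# before adding #1# so that integer indices such as #i=5# are not disturbed by floating-point rounding.

```python
import math

def percentile(scores, p):
    """P-th percentile via i = P/100 * (n - 1) + 1 with linear interpolation."""
    data = sorted(scores)                  # step 1: order the scores
    n = len(data)
    i = p * (n - 1) / 100 + 1              # 1-based index of the percentile
    i_below, i_above = math.floor(i), math.ceil(i)
    x_below = data[i_below - 1]            # convert 1-based index to 0-based
    x_above = data[i_above - 1]
    # If i is an integer, i_below == i_above and the score at position i is returned
    return x_below + (i - i_below) * (x_above - x_below)

scores = [25, 20, 23, 19, 16, 1, 18, 17, 2, 6, 17, 23, 4, 1, 25, 7, 11, 5, 1, 19, 4]
print(percentile(scores, 20))  # 4.0

class_scores = [34, 42, 53, 56, 57, 60, 62, 64, 64, 67, 70, 70, 72, 78, 84, 89]
print(percentile(class_scores, 30))  # 58.5 (non-integer index i = 5.5)
```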
In Excel, percentiles are calculated with the PERCENTILE function, which locates and interpolates the percentile in the same way as the hand calculation:
\[\text{PERCENTILE(array, k)}\]
- array: The array or cell range of numeric values for which you want the percentile value.
- k: The percentile value in the range #[0, 1]#, inclusive.
Assuming the sample scores are located in cells A1 through A21, the command to calculate the #20^{th}# percentile is:
\[= \text{PERCENTILE(A1:A21, 0.2)}\]
This gives:
\[P_{20} = 4\]
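In R, percentiles are calculated with the quantile() function, whose default method (type 7) uses the same index formula and linear interpolation as the hand calculation:
\[\text{quantile(x, probs)}\]
- x: The numeric vector whose sample quantiles are wanted.
- probs: The numeric vector of probabilities with values in the range #[0, 1]#.
To calculate the #20^{th}# percentile, run the following command (the scores do not need to be sorted first):
\[\text{quantile(x = c(25,20,23,19,16,1,18,17,2,6,17,23,4,1,25,7,11,5,1,19,4), probs = 0.2)}\]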
This gives:
\[P_{20}= 4\]
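For readers working in Python, the standard library offers a comparable built-in: statistics.quantiles with method="inclusive" treats the smallest score as the #0^{th}# and the largest as the #100^{th}# percentile, which corresponds to the index formula used here (R's type 7). The sketch below assumes Python 3.8 or later. Note that the default method, "exclusive", uses a different rule and can return different values.

```python
import statistics

scores = [25, 20, 23, 19, 16, 1, 18, 17, 2, 6, 17, 23, 4, 1, 25, 7, 11, 5, 1, 19, 4]

# n=100 returns the 99 cut points P_1 ... P_99; the data are sorted internally
percentiles = statistics.quantiles(scores, n=100, method="inclusive")
p20 = percentiles[20 - 1]   # list index 19 holds the 20th percentile
print(p20)  # 4.0
```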
Percentiles divide a distribution into #100# equal parts, but a distribution can be divided into any number of equal parts. In general, the values that divide a distribution of scores into equal parts are called quantiles.
Quantiles
If you divide a distribution into #k# equal parts, the dividing values are called #k#-quantiles, and there are always #k-1# of them. Percentiles, for example, are the #99# values that divide a distribution into #100# parts.
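A minimal sketch of general #k#-quantiles, assuming the same index-and-interpolation rule used for percentiles applies to any #k#: the #j^{th}# dividing value is taken as the #(100j/k)^{th}# percentile, so its index is #i = \frac{j}{k}(n-1)+1#. The function name k_quantiles is hypothetical.

```python
import math

def k_quantiles(scores, k):
    """Return the k - 1 values that divide the scores into k equal parts (hypothetical helper)."""
    data = sorted(scores)
    n = len(data)
    cuts = []
    for j in range(1, k):                         # j = 1, ..., k - 1
        i = j * (n - 1) / k + 1                   # 1-based index of the j-th cut point
        i_below, i_above = math.floor(i), math.ceil(i)
        x_below, x_above = data[i_below - 1], data[i_above - 1]
        cuts.append(x_below + (i - i_below) * (x_above - x_below))
    return cuts

# k = 4 gives the three quartiles
print(k_quantiles([2, 6, 17, 23, 4, 1, 25, 7, 11, 5, 1, 19, 4], 4))  # [4.0, 6.0, 17.0]
```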
If you divide a distribution of scores into four equal parts, the dividing values are referred to as quartiles.
Quartiles
Definition
Quartiles are the values that divide a distribution of scores into four equal parts.
The first (#Q_1#), second (#Q_2#), and third (#Q_3#) quartiles are equal to the 25th, 50th, and 75th percentiles, respectively.
The second quartile is also called the median.
Formula
The index of the #Q^{th}# quartile of a distribution is:
\[i=\dfrac{Q}{4}(n-1)+1\]
where #n# is the number of scores and #Q# is a value between 1 and 3.
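Because the #Q^{th}# quartile is the #(25Q)^{th}# percentile, this index follows directly from the percentile index by substituting #P = 25Q#:
\[i = \dfrac{25Q}{100}(n-1)+1 = \dfrac{Q}{4}(n-1)+1\]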
Quartile Calculation
The calculation of quartiles starts by ordering the scores in the distribution from smallest to largest. Next, to find the index #i# of the #Q^{th}# quartile, use the following formula:
\[i=\dfrac{Q}{4}(n-1)+1\]
where #n# is the total number of scores in the dataset.
It is important to note that the formula above gives the location (index) of the #Q^{th}# quartile, not the value associated with it.
If #i# is an integer, then the #Q^{th}# quartile is the score located at the #i^{th}# position of the ordered data.
Whenever #i# is not an integer, linear interpolation is used to calculate the quartile:
- Find the two integers closest to #i# by rounding #i# up and down. These indices are denoted by #i_{above}# and #i_{below}#, respectively.
- Determine the values located at these positions. These values are denoted by #X_{above}# and #X_{below}#, respectively.
- Calculate the #Q^{th}# quartile with the following formula:\[Q^{th}\text{ quartile}=X_{below} + (i - i_{below}) \cdot (X_{above} - X_{below})\]
As an example, consider the #3^{rd}# quartile of a sample of #13# scores. It can be calculated by hand, with Excel, or with R. To calculate it by hand, first order the scores from smallest to largest:
\[1,\,\,\,1,\,\,\,2,\,\,\,4,\,\,\,4,\,\,\,5,\,\,\,6,\,\,\,7,\,\,\,11,\,\,\,17,\,\,\,19,\,\,\,23,\,\,\,25\]
Next, to find the index #i# of the #3^{rd}# quartile (#Q=3#), use the following formula:
\[\begin{array}{rcl}
i &=& \cfrac{Q}{4}(n-1)+1\\
&=& \cfrac{3}{4}(13 - 1) + 1=10
\end{array}\]
Since #i=10# is an integer, the #3^{rd}# quartile is the score located at the #10^{th}# position of the ordered data:
\[Q_{3}=X_{10} = 17\]
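The hand calculation of #Q_3# can be reproduced in Python. This is a minimal sketch assuming the quartile index formula and interpolation rule above; the function name quartile is hypothetical.

```python
import math

def quartile(scores, q):
    """Q-th quartile (q = 1, 2, 3) via i = q/4 * (n - 1) + 1 with linear interpolation."""
    data = sorted(scores)                    # order the scores first
    n = len(data)
    i = q * (n - 1) / 4 + 1                  # 1-based index of the quartile
    i_below, i_above = math.floor(i), math.ceil(i)
    x_below, x_above = data[i_below - 1], data[i_above - 1]
    return x_below + (i - i_below) * (x_above - x_below)

scores = [2, 6, 17, 23, 4, 1, 25, 7, 11, 5, 1, 19, 4]
print(quartile(scores, 3))  # 17.0
```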
In Excel, quartiles are calculated with the QUARTILE function:
\[\text{QUARTILE(array, quart)}\]
- array: The array or cell range of numeric values for which you want the quartile value.
- quart: Indicates which value to return: #0# (minimum), #1# (#Q_1#), #2# (median), #3# (#Q_3#), or #4# (maximum).
Assuming the sample scores are located in cells A1 through A13, the command to calculate the #3^{rd}# quartile is:
\[= \text{QUARTILE(A1:A13, 3)}\]
This gives:
\[Q_{3} = 17\]
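In R, quartiles can be calculated with the quantile() function:
\[\text{quantile(x)}\]
- x: The numeric vector whose sample quantiles are wanted.
When probs is omitted, quantile() returns the #0\%#, #25\%#, #50\%#, #75\%#, and #100\%# quantiles, that is, the minimum, the three quartiles, and the maximum. To calculate the #3^{rd}# quartile, run the following command:
\[\text{quantile(x = c(2,6,17,23,4,1,25,7,11,5,1,19,4))}\]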
Looking at the output generated by R, under #75\%# we find:
\[Q_{3}= 17\]
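The equivalent in Python's standard library (Python 3.8 or later) is statistics.quantiles with n=4 and method="inclusive", which returns the three quartiles using the same interpolation as the index formula #i=\frac{Q}{4}(n-1)+1#. The default method, "exclusive", follows a different rule and may not match the values in this section.

```python
import statistics

scores = [2, 6, 17, 23, 4, 1, 25, 7, 11, 5, 1, 19, 4]

# n=4 returns the three cut points [Q1, Q2, Q3]; the data are sorted internally
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
print(q1, q2, q3)  # 4.0 6.0 17.0
```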