2. Association and Correlation: Correlation
Monotonic Relationship: Spearman Correlation Coefficient
Spearman correlation coefficient
Definition
The Spearman correlation coefficient is the nonparametric version of the Pearson correlation. It is the linear correlation coefficient computed on the ranks of the data.
The Spearman correlation is used to determine the direction and strength of the monotonic relationship between two variables.
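The sample Spearman correlation is denoted with the symbol $r_s$.
Substitute $X$ and $Y$ for their ranks ($rg_X$ and $rg_Y$, respectively) in the Pearson correlation:
$$r_s = \frac{\operatorname{cov}(rg_X, rg_Y)}{s_{rg_X}\, s_{rg_Y}}$$
No tied ranks:
$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$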
The Spearman correlation coefficient, or Spearman's rank correlation coefficient, is the nonparametric version of the Pearson correlation. The Spearman correlation is used when the variables of interest are measured on the ordinal scale or when there are clear signs that the assumptions of the Pearson correlation do not hold.
Spearman's correlation works by calculating the Pearson correlation between the ranks of the values in the dataset.
The two variables must be ranked separately (from low to high) by assigning a rank of 1 to the lowest value, 2 to the next lowest, and so on. If there are ties (i.e. observations with equal values), midranks are assigned: each tied observation receives the average of the ranks that the set of tied values would otherwise occupy.
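The ranking step can be sketched in R as follows. The sample values are made up for illustration, and the variable names x and rg_x are assumptions of this sketch; rank() assigns average ranks (midranks) to ties by default.

```r
# Hypothetical sample values; 20 occurs twice, so both observations are tied.
x <- c(10, 20, 20, 30, 5)

# rank() orders from low to high. ties.method = "average" (the default)
# gives each tied value the mean of the ranks the tie occupies (3 and 4).
rg_x <- rank(x, ties.method = "average")

print(rg_x)  # 2.0 3.5 3.5 5.0 1.0
```

The two values of 20 would occupy ranks 3 and 4, so each receives the midrank 3.5.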
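To calculate the Spearman correlation, the general formula is that of the Pearson correlation using the ranked variables:
$$r_s = \frac{\operatorname{cov}(rg_X, rg_Y)}{s_{rg_X}\, s_{rg_Y}}$$
where $\operatorname{cov}(rg_X, rg_Y)$ is the covariance of the ranks and $s_{rg_X}$ and $s_{rg_Y}$ are their standard deviations.
If there are no tied ranks, the following simplified formula can also be used:
$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
where $d_i$ is the difference in paired ranks for the $i$-th pair and $n$ is the number of pairs of scores.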
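As a small check, the sketch below computes the coefficient in two ways on made-up data without ties: once as the Pearson correlation of the ranks, and once with the usual no-ties shortcut 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d holds the rank differences. All variable names and values are assumptions chosen for illustration.

```r
# Hypothetical paired observations without ties.
x <- c(3, 8, 1, 10, 6)
y <- c(2, 9, 4, 7, 5)

# Rank each variable separately, from low to high.
rg_x <- rank(x)  # 2 4 1 5 3
rg_y <- rank(y)  # 1 5 2 4 3

# General formula: Pearson correlation computed on the ranks.
r_general <- cor(rg_x, rg_y)

# Simplified formula, valid only when there are no tied ranks.
d <- rg_x - rg_y  # differences in paired ranks
n <- length(x)    # number of pairs
r_simplified <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

print(c(general = r_general, simplified = r_simplified))  # both 0.8
```

Here the squared rank differences sum to 4, so the shortcut gives 1 - 24/120 = 0.8, matching the Pearson correlation of the ranks.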
The interpretation of Spearman's correlation coefficient is similar to Pearson's correlation coefficient:
- A value of $+1$ indicates a perfect positive monotonic relationship between two variables.
- A value of $-1$ indicates a perfect negative monotonic relationship between two variables.
- A value of $0$ indicates no monotonic relationship between the two variables.
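The difference between a monotonic and a linear relationship can be illustrated with a short R sketch. The data are hypothetical: y is an exponential function of x, which is strictly increasing but not linear.

```r
# Hypothetical data: y grows exponentially with x, so the relationship is
# strictly increasing (monotonic) but clearly not linear.
x <- 1:6
y <- exp(x)

r_spearman_pos <- cor(x, y, method = "spearman")   # perfect positive monotonic
r_spearman_neg <- cor(x, -y, method = "spearman")  # perfect negative monotonic
r_pearson      <- cor(x, y)                        # linear correlation, below 1

print(c(spearman_pos = r_spearman_pos,
        spearman_neg = r_spearman_neg,
        pearson      = r_pearson))
```

The Spearman coefficient reaches 1 and -1 because the ranks agree (or are reversed) exactly, while the Pearson coefficient stays below 1 because the points do not lie on a straight line.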
Computation of the Spearman Correlation Coefficient in Excel and R
To compute the Spearman correlation coefficient between two variables $X$ and $Y$ in Excel, you have to use two functions. First, use the RANK.AVG function to calculate the ranks of the original values of $X$ and $Y$, which are stored in the vectors x and y; RANK.AVG assigns tied values their average rank, i.e. the midrank. Second, use the CORREL function to calculate the Spearman correlation from these ranks, rgx and rgy.
RANK.AVG(xi, x, 1) and RANK.AVG(yi, y, 1)
CORREL(rgx, rgy)
- xi: The value of variable $X$ that has to be ranked
- x: The numeric vector that contains the values for variable $X$
- yi: The value of variable $Y$ that has to be ranked
- y: The numeric vector that contains the values for variable $Y$
- 1: This third argument sets the order in which the values are ranked; the number 1 indicates ascending order.
- rgx: The numeric vector that contains the ranks for variable $X$
- rgy: The numeric vector that contains the ranks for variable $Y$
To compute the sample Spearman correlation coefficient between two variables $X$ and $Y$ in R, use the cor() function and specify "spearman" as the method:
cor(x, y, method = "spearman")
- x: The numeric vector that contains the values for variable $X$
- y: The numeric vector that contains the values for variable $Y$
- method: A string indicating which method you want to use (the default method is "pearson")
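A minimal usage sketch, on made-up data that include ties, compares the direct call with the two-step route of ranking first and then correlating the ranks. The variable names and values are assumptions for illustration only.

```r
# Hypothetical paired observations; 15 is tied in x and 2.5 is tied in y.
x <- c(12, 15, 15, 20, 9, 30)
y <- c(1.2, 2.5, 2.0, 2.5, 0.7, 4.1)

# Direct computation with the method argument.
r_direct <- cor(x, y, method = "spearman")

# Two-step computation: midranks via rank(), then Pearson on the ranks.
rg_x <- rank(x)
rg_y <- rank(y)
r_two_step <- cor(rg_x, rg_y, method = "pearson")

# Both routes should agree up to floating-point rounding.
print(all.equal(r_direct, r_two_step))
```

Because the data contain ties, the simplified no-ties formula is not appropriate here; both calls rely on the Pearson correlation of the midranks instead.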