2. Association and Correlation: Correlation
Direction of a Linear Relationship: Covariance
To determine the direction of the linear relationship between two variables, calculate the covariance.
Covariance
Definition
The covariance measures the direction of the linear relationship between two quantitative variables.
The sample covariance between two variables $x$ and $y$ is denoted $s_{xy}$.
A positive covariance indicates that the variables have a positive linear relationship. A negative covariance indicates that the variables have a negative linear relationship.
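Formulas
$$s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
where $n$ is the number of cases and $\bar{x}$ and $\bar{y}$ are the sample means of $x$ and $y$.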
Computation of the Sample Covariance with Statistical Software
To compute the sample covariance between two variables $x$ and $y$ in Excel, use the following function:
COVARIANCE.S(x, y)
- x: The range of cells that contains the values of variable $x$
- y: The range of cells that contains the values of variable $y$
To compute the sample covariance between two variables $x$ and $y$ in R, use the following function:
cov(x, y)
- x: The numeric vector that contains the values of variable $x$
- y: The numeric vector that contains the values of variable $y$
To calculate the covariance between two variables $x$ and $y$, multiply the deviation score with respect to $x$, $x_i - \bar{x}$, by the deviation score with respect to $y$, $y_i - \bar{y}$, for each case $i$ in the dataset.
If $x_i$ and $y_i$ lie on the same side of their respective means, the resulting product is positive, specifically:
- If both scores $(x_i, y_i)$ lie below their respective means, then both deviation scores are negative, but their product is positive.
- If both scores $(x_i, y_i)$ lie above their respective means, then both deviation scores are positive, and so is their product.
If the scores lie on opposite sides of their respective means, then one deviation score is negative and the other is positive (for example, $x_i - \bar{x} < 0$ and $y_i - \bar{y} > 0$), so the resulting product is negative.
These products are then summed and divided by $n - 1$, where $n$ is the number of cases. The resulting measure is the sample covariance $s_{xy}$.
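The case-by-case reasoning above can be sketched in a few lines. This is a minimal illustration on hypothetical data: the helper `classify_cases` is an illustrative name, not a library function. It labels each case by the sign of its deviation product and then forms $s_{xy}$ by summing the products and dividing by $n - 1$.

```python
def classify_cases(x, y):
    """Return (x_i, y_i, deviation product, position) for every case."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cases = []
    for xi, yi in zip(x, y):
        product = (xi - x_bar) * (yi - y_bar)
        if product > 0:
            position = "same side"       # both below or both above their means
        elif product < 0:
            position = "opposite sides"  # one deviation score negative, one positive
        else:
            position = "on a mean"       # at least one deviation score is zero
        cases.append((xi, yi, product, position))
    return cases

# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]
cases = classify_cases(x, y)
for xi, yi, product, position in cases:
    print(xi, yi, product, position)

# Sample covariance: sum of the deviation products divided by n - 1
s_xy = sum(c[2] for c in cases) / (len(x) - 1)
print("s_xy =", s_xy)
```

Here both means equal 3; the products are $0, 2, 0, -1, 4$, so $s_{xy} = 5 / 4 = 1.25$. The single "opposite sides" case pulls the covariance down, but the "same side" cases dominate.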
Interpreting the sign of the covariance
The sign of the covariance indicates the direction of the linear relationship:
- If $s_{xy} > 0$, then $x$ and $y$ are said to have a positive linear relationship.
- If $s_{xy} < 0$, then $x$ and $y$ are said to have a negative linear relationship.
- If $s_{xy} = 0$, then $x$ and $y$ are said to be linearly unrelated.
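The three sign rules can be expressed as a small lookup. This sketch uses hypothetical data and a hypothetical helper `direction`. The third dataset, $y = x^2$ on values symmetric around 0, shows why the rule says "linearly" unrelated: $y$ depends perfectly on $x$, yet the covariance is exactly 0 because the relationship is not linear. With real data a computed covariance is rarely exactly 0, so this exact comparison is only for illustration.

```python
def sample_covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def direction(s_xy):
    """Map the sign of the covariance to the direction of the linear relationship."""
    if s_xy > 0:
        return "positive"
    if s_xy < 0:
        return "negative"
    return "none (linearly unrelated)"

# Hypothetical x values, symmetric around 0
x = [-2, -1, 0, 1, 2]
print(direction(sample_covariance(x, [2 * v for v in x])))  # y increases with x
print(direction(sample_covariance(x, [-v for v in x])))     # y decreases with x
print(direction(sample_covariance(x, [v * v for v in x])))  # U-shaped, not linear
```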
Interpreting the magnitude of the covariance
Although the sign of the covariance is a good indicator of the direction of the linear relationship between two variables, the magnitude of the covariance is not a good measure of the strength of the relationship. This is because the magnitude of the covariance depends on the units in which the variables are measured.
Suppose we have a dataset containing measurements of two variables $x$ and $y$. Both variables were originally measured in meters. We calculate the covariance between these two variables and find a value of $s_{xy}$.
Now suppose we decide to express the measurements of $x$ and $y$ in centimeters instead. To do so, we multiply all the values in the dataset by 100. We then recalculate the covariance and find a value of $10{,}000 \cdot s_{xy}$.
By multiplying each value in the dataset by a factor of 100, the covariance increased by a factor of $100 \times 100 = 10{,}000$, because each deviation product contains one deviation score from $x$ and one from $y$. This illustrates why the covariance is a poor measure of the strength of the relationship between two variables: changing units, that is, multiplying or dividing all values in the dataset by some positive constant, should not affect our measurement of the strength of the relationship.
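A minimal sketch of the meters-to-centimeters argument, assuming hypothetical measurements (the text gives no data). Converting both variables multiplies every deviation score by 100, so every deviation product, and therefore the covariance, is multiplied by 10,000. More generally, multiplying $x$ by $a$ and $y$ by $b$ multiplies the covariance by $ab$.

```python
def sample_covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical measurements in meters (not from the text)
x_m = [1.2, 1.5, 1.7, 2.0]
y_m = [0.8, 1.1, 1.0, 1.4]

# The same measurements expressed in centimeters: every value times 100
x_cm = [100 * v for v in x_m]
y_cm = [100 * v for v in y_m]

cov_m = sample_covariance(x_m, y_m)
cov_cm = sample_covariance(x_cm, y_cm)

# Only the units changed, yet the covariance grows by about 100 * 100 = 10,000
print(cov_m, cov_cm, cov_cm / cov_m)
```

The ratio is 10,000 up to floating-point rounding, while the sign (the direction) is unchanged.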
Consider the following pairs of data points:
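Calculate the sample covariance between $x$ and $y$.
First calculate the means $\bar{x}$ and $\bar{y}$ of variables $x$ and $y$:
Now that the means are known, the values of $x_i - \bar{x}$, $y_i - \bar{y}$ and $(x_i - \bar{x})(y_i - \bar{y})$ can be calculated: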
With this information, the sample covariance can be calculated:
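The three steps of this exercise (means, table of deviation scores and products, covariance) can be reproduced with the sketch below. The pairs used here are hypothetical and are not the data of the exercise; substitute the exercise's pairs for `x` and `y`. The helper `covariance_table` is an illustrative name.

```python
def covariance_table(x, y):
    """Return the means, one row per case, and the sample covariance."""
    n = len(x)
    x_bar = sum(x) / n  # step 1: means
    y_bar = sum(y) / n
    rows = []
    for xi, yi in zip(x, y):  # step 2: deviation scores and their product
        dx = xi - x_bar
        dy = yi - y_bar
        rows.append((xi, yi, dx, dy, dx * dy))
    s_xy = sum(r[4] for r in rows) / (n - 1)  # step 3: sum / (n - 1)
    return x_bar, y_bar, rows, s_xy

# Hypothetical pairs, NOT the data of the exercise above
x = [3, 5, 7, 9]
y = [10, 9, 7, 6]
x_bar, y_bar, rows, s_xy = covariance_table(x, y)
print(f"x_bar = {x_bar}, y_bar = {y_bar}")
print(f"{'x':>4} {'y':>4} {'x-x_bar':>8} {'y-y_bar':>8} {'product':>8}")
for xi, yi, dx, dy, p in rows:
    print(f"{xi:>4} {yi:>4} {dx:>8.2f} {dy:>8.2f} {p:>8.2f}")
print(f"s_xy = {s_xy:.4f}")
```

For these hypothetical pairs the products are $-6, -1, -1, -6$, giving $s_{xy} = -14 / 3 \approx -4.67$, a negative linear relationship.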