2. Association and Correlation: Practical 2
Pearson Correlation Coefficient
The Pearson correlation coefficient measures the direction and strength of a linear relationship between two variables. It is a 'scaled' version of the covariance, such that it ranges from -1 to 1. The sign gives the direction of the relationship, and the greater the absolute value of the coefficient, the stronger the linear relationship.
The scatterplot of two variables gives not only an indication about the form and direction of the relationship; with a little practice, it will also give you an idea about its strength.
[Figures: example scatterplots showing a perfect positive, a strong positive, a moderate positive, a weak positive, and no linear relationship.]
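To build that intuition, it can help to simulate pairs of variables with a chosen correlation and look at the resulting scatterplots. The following is only a sketch: the helper `simulate_xy()`, the sample size, the seed, and the target correlations are illustrative assumptions and not part of the practical.

```r
# Hypothetical helper: draw n pairs (x, y) whose population correlation is rho.
simulate_xy <- function(n, rho) {
  x <- rnorm(n)
  noise <- rnorm(n)
  # Mixing x with independent noise gives Var(y) = 1 and Cov(x, y) = rho
  y <- rho * x + sqrt(1 - rho^2) * noise
  data.frame(x = x, y = y)
}

set.seed(1)                      # arbitrary seed, only for reproducibility
rhos <- c(1, 0.9, 0.6, 0.3, 0)   # assumed values, one per example relationship above

op <- par(mfrow = c(2, 3))       # grid of panels, one per target correlation
for (rho in rhos) {
  d <- simulate_xy(200, rho)
  plot(d$x, d$y, xlab = "x", ylab = "y", main = paste("rho =", rho))
}
par(op)                          # restore the previous plotting layout
```

Because the data are random, the sample correlation in each panel will be close to, but usually not exactly, the target value (except for rho = 1, where y equals x).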
Calculate Pearson's correlation coefficient in R
Of course, we can also calculate Pearson's correlation coefficient instead of only estimating its strength from the scatterplots. The equation is as follows:
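$r_{xy} = \frac{\mathrm{cov}(x, y)}{s_x s_y}$,
in which $r_{xy}$ is Pearson's correlation coefficient for the variables $x$ and $y$, $\mathrm{cov}(x, y)$ is the covariance between $x$ and $y$, $s_x$ is the standard deviation of $x$ and $s_y$ is the standard deviation of $y$. In R that would translate to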
cor_xy <- cov(x,y)/(sd(x)*sd(y))
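As a worked example, the same calculation can be spelled out step by step without `cov()` and `sd()`. The vectors below are made-up illustration data, not data from the practical, and the variable names are assumptions.

```r
# Made-up example data (not from the practical)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
n <- length(x)

# Sample covariance: sum of cross-products of deviations, divided by n - 1
cov_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Sample standard deviations, also with n - 1 in the denominator
sd_x <- sqrt(sum((x - mean(x))^2) / (n - 1))
sd_y <- sqrt(sum((y - mean(y))^2) / (n - 1))

r_manual <- cov_xy / (sd_x * sd_y)
r_manual                                          # approximately 0.775
all.equal(r_manual, cov(x, y) / (sd(x) * sd(y)))  # TRUE
```

Here the covariance is 1.5 and the product of the standard deviations is about 1.94, which gives a fairly strong positive correlation.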
Alternatively, the correlation coefficient can be calculated by first applying a z-transformation to $x$ and $y$ and then calculating the covariance. Recall that when we calculate the z-score, we subtract the mean and divide by the standard deviation. We should do that for each observation. Luckily, R has a function that does this for us: scale().
?scale
So $r_{xy} = \mathrm{cov}(z_x, z_y)$,
where $z_x$ and $z_y$ are the z-transformed $x$ and $y$. In R this can be calculated with
cor_xy <- cov(scale(x), scale(y))
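To see what `scale()` is doing here, the z-scores can also be computed by hand and compared. This is a minimal sketch on made-up data; the names `z_x` and `z_y` are assumptions chosen for illustration.

```r
x <- c(1, 2, 3, 4, 5)   # made-up example data
y <- c(2, 4, 5, 4, 5)

# z-scores by hand: subtract the mean, divide by the standard deviation
z_x <- (x - mean(x)) / sd(x)
z_y <- (y - mean(y)) / sd(y)

# scale() does the same, but returns a one-column matrix with attributes
all.equal(z_x, as.numeric(scale(x)))   # TRUE

cov(z_x, z_y)             # a plain number
cov(scale(x), scale(y))   # the same value, returned as a 1 x 1 matrix
```

Note that because `scale()` returns a matrix, `cov(scale(x), scale(y))` is a 1 x 1 matrix rather than a single number; wrapping it in `as.numeric()` drops the matrix structure if a plain value is needed.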
Lastly, and most conveniently, we can calculate the correlation coefficient directly with the cor() function.
cor_xy <- cor(x, y)
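As a quick check, the three approaches can be run side by side to confirm that they give the same result. The simulated data, the seed, and the variable names below are assumptions made for this sketch.

```r
set.seed(42)                 # arbitrary seed, only for reproducibility
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)    # y depends linearly on x, plus noise

r_formula <- cov(x, y) / (sd(x) * sd(y))   # covariance divided by both sds
r_scaled  <- cov(scale(x), scale(y))       # covariance of z-scores (1 x 1 matrix)
r_direct  <- cor(x, y)                     # built-in function

all.equal(r_formula, r_direct)              # TRUE
all.equal(as.numeric(r_scaled), r_direct)   # TRUE
```

In practice `cor()` is the one to use; the other two are mainly useful for understanding where the number comes from.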