2. Association and Correlation: Practical 2
Pearson Correlation Coefficient
The Pearson correlation coefficient measures the direction and strength of a linear relationship between two variables. It is a 'scaled' version of the covariance, such that it ranges from #-1# to #+1#. The sign of the coefficient gives the direction of the relationship, and the greater its absolute value, the stronger the linear relationship.
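To see what 'scaled' means in practice, the sketch below uses made-up height and weight values (the vectors are illustrative assumptions, not data from this practical). Changing the units of a variable changes the covariance, but it should leave the correlation untouched. The correlation is computed here with R's built-in cor() function.

```r
# Hypothetical example data: heights in metres and weights in kilograms
height_m  <- c(1.60, 1.65, 1.70, 1.75, 1.80, 1.85)
weight_kg <- c(55, 61, 64, 72, 75, 83)

# The same heights expressed in centimetres
height_cm <- height_m * 100

# The covariance depends on the units of measurement
cov_m  <- cov(height_m, weight_kg)
cov_cm <- cov(height_cm, weight_kg)

# The correlation is unit-free
cor_m  <- cor(height_m, weight_kg)
cor_cm <- cor(height_cm, weight_kg)

print(c(cov_m = cov_m, cov_cm = cov_cm))  # cov_cm is 100 times cov_m
print(c(cor_m = cor_m, cor_cm = cor_cm))  # both values are the same
```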
The scatterplot of two variables gives not only an indication about the form and direction of the relationship; with a little practice, it will also give you an idea about its strength.
Perfect positive linear relationship
Strong positive linear relationship
Moderate positive linear relationship
Weak positive linear relationship
No linear relationship
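Scatterplots for the relationship strengths listed above can be approximated with simulated data. The following is only a sketch: the helper simulate_pair() is a hypothetical function written for this illustration, and the target correlations (1, 0.9, 0.6, 0.3 and 0) are assumed values chosen to stand in for "perfect", "strong", "moderate", "weak" and "none". They are not official cut-offs.

```r
set.seed(1)  # make the simulation reproducible

# Hypothetical helper: draw n pairs (x, y) whose population correlation is rho
simulate_pair <- function(n, rho) {
  x <- rnorm(n)
  e <- rnorm(n)                        # independent noise
  y <- rho * x + sqrt(1 - rho^2) * e   # mix signal and noise
  data.frame(x = x, y = y)
}

# Assumed target correlations and matching labels (illustrative only)
rhos   <- c(1, 0.9, 0.6, 0.3, 0)
labels <- c("Perfect", "Strong", "Moderate", "Weak", "None")

sample_r <- numeric(length(rhos))
old_par <- par(mfrow = c(2, 3))        # arrange the plots in a 2 x 3 grid
for (i in seq_along(rhos)) {
  d <- simulate_pair(200, rhos[i])
  sample_r[i] <- cor(d$x, d$y)         # sample correlation of the simulated data
  plot(d$x, d$y,
       main = sprintf("%s (r = %.2f)", labels[i], sample_r[i]),
       xlab = "x", ylab = "y")
}
par(old_par)                           # restore the previous plot layout
```

Because the data are random, the sample correlation shown in each plot title will usually be close to, but not exactly equal to, the target value.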
Calculate Pearson's correlation coefficient in R
Of course, we can also calculate Pearson's correlation coefficient, instead of only estimating its strength from the scatterplots. The equation is as follows:
#r(X,Y)=\dfrac{s_{X,Y}}{s_Xs_Y}#,
in which #r(X,Y)# is Pearson's correlation coefficient for the variables #X# and #Y#, #s_{X,Y}# is the covariance between #X# and #Y#, #s_X# is the standard deviation of #X# and #s_Y# is the standard deviation of #Y#. In R that would translate to
cor_xy <- cov(x,y)/(sd(x)*sd(y))
Alternatively, the correlation coefficient can be calculated by first applying a z-transformation to #X# and #Y# and then calculating the covariance. Recall that when we calculate the z-score, we subtract the mean and divide by the standard deviation. We should do that for each observation. Luckily, R has a function that does this for us: scale().
?scale
So #cor(X,Y) = cov(X_z,Y_z)#,
where #X_z# is the z-transformed #X# and #Y_z# is the z-transformed #Y#. In R, this can be calculated with
cor_xy <- cov(scale(x), scale(y))
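One detail worth knowing is that scale() returns a one-column matrix rather than a plain vector, so the covariance of two scaled vectors comes back as a 1-by-1 matrix. The short sketch below uses made-up vectors x and y (illustrative values only, not data from this practical) to show this, to check scale() against the manual z-score, and to show one way of getting a plain number back.

```r
# Hypothetical example data
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)

# z-transformation with scale(): the result is a one-column matrix
x_z <- scale(x)
print(is.matrix(x_z))

# The same z-scores computed by hand: subtract the mean, divide by the sd
x_z_manual <- (x - mean(x)) / sd(x)
print(all.equal(as.numeric(x_z), x_z_manual))

# Covariance of the z-transformed variables: a 1 x 1 matrix
cov_z <- cov(scale(x), scale(y))
print(dim(cov_z))

# Drop the matrix structure to get a single number
cor_xy <- as.numeric(cov_z)
print(cor_xy)  # 0.8 for these values
```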
Lastly, and most conveniently, we can calculate the correlation coefficient directly with the cor() function.
cor_xy <- cor(x, y)
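As a quick check, the three approaches can be compared on the same data. The sketch below assumes R's built-in mtcars dataset (car weight and fuel consumption), which is not part of this practical and is used purely for illustration. The results are compared with all.equal() rather than ==, because floating-point rounding can make mathematically equal results differ in the last digits.

```r
# Illustrative data from R's built-in mtcars dataset (an assumption for this example)
x <- mtcars$wt   # car weight
y <- mtcars$mpg  # miles per gallon

# 1. Covariance divided by the product of the standard deviations
cor_formula <- cov(x, y) / (sd(x) * sd(y))

# 2. Covariance of the z-transformed variables (converted from a 1 x 1 matrix)
cor_scaled <- as.numeric(cov(scale(x), scale(y)))

# 3. The built-in function
cor_builtin <- cor(x, y)

print(c(formula = cor_formula, scaled = cor_scaled, builtin = cor_builtin))

# All three should agree up to floating-point tolerance
print(all.equal(cor_formula, cor_builtin))
print(all.equal(cor_scaled, cor_builtin))
```

The coefficient is negative here, since heavier cars in this dataset tend to travel fewer miles per gallon.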