2. Association and Correlation: Correlation
Displaying the Relationship Between Two Variables
Contingency table
To inspect the relationship between two categorical/qualitative variables, we can construct a contingency table (also called a cross tabulation).
A contingency table is very similar to a frequency table. The difference is that a frequency table only concerns a single variable while a contingency table concerns two or more variables.
A contingency table displays the distribution of one variable in the rows and the distribution of a second variable in the columns of the table.
Below is a contingency table displaying the relationship between gender and pet preference:
| | Prefers dogs | Prefers cats | Total |
|---|---|---|---|
| Male | 25 | 15 | 40 |
| Female | 20 | 40 | 60 |
| Total | 45 | 55 | 100 |
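As a sketch of how such a table is built from raw data, the Python snippet below counts each (gender, preference) combination and then appends row and column totals. The individual responses are hypothetical. They are reconstructed only so that they reproduce the counts in the table above. The helper name `build_contingency_table` is illustrative and does not come from any library. In practice, `pandas.crosstab(..., margins=True)` produces the same kind of table.

```python
from collections import Counter

def build_contingency_table(pairs):
    """Cross-tabulate (row, column) pairs and add row and column totals."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Absolute frequency for every combination of categories (0 if absent).
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    # Row totals: sum across the columns of each row.
    for r in rows:
        table[r]["Total"] = sum(table[r][c] for c in cols)
    # Column totals, plus the grand total (number of respondents).
    table["Total"] = {c: sum(table[r][c] for r in rows) for c in cols}
    table["Total"]["Total"] = len(pairs)
    return table

# Hypothetical raw data: one (gender, preference) pair per respondent,
# chosen to match the counts in the table above.
pairs = (
    [("Male", "Dogs")] * 25 + [("Male", "Cats")] * 15
    + [("Female", "Dogs")] * 20 + [("Female", "Cats")] * 40
)

table = build_contingency_table(pairs)
for row, cells in table.items():
    print(row, cells)
```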
Because the rows and columns contain different numbers of cases (for example, 40 men versus 60 women), the relationship between the two variables is not immediately obvious. To make the groups comparable, we can convert the absolute frequencies in the table into either column or row percentages:
- To calculate the column percentage of a cell, we divide the absolute frequency in the cell by the corresponding column total and multiply by 100
- To calculate the row percentage of a cell, we divide the absolute frequency in the cell by the corresponding row total and multiply by 100
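Both conversions can be sketched in a few lines of Python. The snippet uses the absolute frequencies from the gender and pet-preference table, with the column labels shortened to "Dogs" and "Cats". The function names are illustrative and do not come from any library.

```python
# Absolute frequencies from the gender x pet-preference table (totals omitted).
counts = {
    "Male": {"Dogs": 25, "Cats": 15},
    "Female": {"Dogs": 20, "Cats": 40},
}

def row_percentages(table):
    """Divide each cell by its row total and multiply by 100."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * n / total for col, n in cells.items()}
    return result

def column_percentages(table):
    """Divide each cell by its column total and multiply by 100."""
    cols = list(next(iter(table.values())).keys())
    col_totals = {c: sum(cells[c] for cells in table.values()) for c in cols}
    return {
        row: {c: 100 * cells[c] / col_totals[c] for c in cols}
        for row, cells in table.items()
    }

rows = row_percentages(counts)
cols = column_percentages(counts)

# Round to one decimal place for display.
print({r: {c: round(p, 1) for c, p in cells.items()} for r, cells in rows.items()})
print({r: {c: round(p, 1) for c, p in cells.items()} for r, cells in cols.items()})
```

The row percentages match the 62.5%/37.5% and 33.3%/66.7% figures in the row-percentage table. The column percentages answer a different question. For example, among respondents who prefer dogs, about 55.6% are male and 44.4% are female.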
The table below is the result of converting the absolute frequencies into row percentages:
| | Prefers dogs | Prefers cats | Total |
|---|---|---|---|
| Male | 62.5% | 37.5% | 100% |
| Female | 33.3% | 66.7% | 100% |
| Total | 45% | 55% | 100% |
These data suggest that there is a relationship between gender and pet preference.
Specifically, men tend to prefer dogs over cats (62.5% vs 37.5%), whereas women tend to prefer cats over dogs (66.7% vs 33.3%).
Scatterplot
To visually inspect the relationship between two numerical/quantitative variables, we can construct a scatterplot.
A scatterplot is an x-y graph, with one variable plotted along each axis. Each individual is represented by a single dot, positioned according to that individual's pair of scores on the two variables.
To get an impression of the relationship between the two variables, we can draw a 'cloud' around the dots in the scatterplot.
In the example of study time plotted against grades, the cloud has the shape of an ellipse pointing from the bottom-left to the top-right, indicating a positive linear relationship.
This suggests that students who study longer also tend to get better grades.
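The orientation of the ellipse can also be checked numerically. Suppose most points lie either above both means or below both means. Then the products of their deviations from the means are mostly positive, so the sample covariance is positive. A cloud pointing from the top-left to the bottom-right gives a negative value instead. The data behind the original scatterplot are not given, so the sketch below uses made-up study-time and grade values. The helper `sample_covariance` is a hypothetical name; Python 3.10+ also offers `statistics.covariance` for the same quantity.

```python
def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical data: hours studied and exam grade for six students.
hours = [2, 4, 5, 7, 8, 10]
grades = [5.0, 5.5, 6.5, 6.0, 7.5, 8.0]

cov = sample_covariance(hours, grades)
# A positive sign corresponds to a cloud rising from bottom-left to top-right.
direction = "positive" if cov > 0 else "negative" if cov < 0 else "no linear"
print(f"covariance = {cov:.2f} -> {direction} relationship")
```

The sign only indicates the direction of a linear trend. Its size depends on the units of both variables, so it does not by itself say how tightly the dots cluster around the ellipse.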