Contingency Tables

3. Probability: Practical 3

To apply the probability formulas to real data, we have to do quite a bit of counting, and the command table() is handy for that. Until now we have only used it to make a frequency distribution for a single variable. It can also count over two variables and show the result in a contingency table (i.e. a frequency table with two dimensions).

For example, to look at the distribution of the passenger class for the different sexes, you can create a contingency table for the two variables sex and pclass with the command:

table(titanic$sex,titanic$pclass)

Notice that the categories from the first argument in the table() command (titanic$sex) become the rows in the contingency table, and the categories from the second argument (titanic$pclass) become the columns.

If you would like the rows and columns switched, simply change the order of the input arguments:

table(titanic$pclass,titanic$sex)

You can also add the variable names to the table to remember which is which and make the interpretation easier:

table(pclass = titanic$pclass, sex = titanic$sex)
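
The row and column behaviour described above can be tried on a small made-up example. The sketch below assumes a hypothetical data frame called toy with six invented passengers; it is not the titanic data set, and its only purpose is to show where each argument of table() ends up.

```r
# Hypothetical toy data (NOT the titanic data set): six invented passengers.
toy <- data.frame(
  sex    = c("female", "male", "male", "female", "male", "female"),
  pclass = c(1, 3, 3, 2, 1, 1)
)

# First argument -> rows, second argument -> columns.
tab <- table(sex = toy$sex, pclass = toy$pclass)
print(tab)

# The names given in the call are stored with the table.
names(dimnames(tab))  # "sex" "pclass"
rownames(tab)         # "female" "male"
colnames(tab)         # "1" "2" "3"
```

Because the arguments are named in the call, the printed table is labelled with sex and pclass, which makes it harder to mix up rows and columns.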

How many of the female passengers survived?

$339$

To find the answer, use the contingency table of the two variables (sex and survived):

table(titanic$sex, titanic$survived)

This command results in the following table:

	0	1
female	127	339
male	682	161

which shows that $339$ of the female passengers survived (row female, column 1).
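
Instead of reading the number off the printed table, a single cell can also be selected by its row and column label. The sketch below rebuilds the table from the four counts shown above, so that it runs without the titanic data; the name sex_survived is chosen here for illustration only.

```r
# Counts copied from the table above (rows: sex, columns: survived).
sex_survived <- as.table(matrix(
  c(127, 339,
    682, 161),
  nrow = 2, byrow = TRUE,
  dimnames = list(sex = c("female", "male"), survived = c("0", "1"))
))

# Select one cell by row label and column label.
sex_survived["female", "1"]  # 339

# Row totals: the number of passengers of each sex.
rowSums(sex_survived)        # female 466, male 843
```

With the real data, the same indexing should work directly on the result of table(titanic$sex, titanic$survived).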

To turn a table of frequencies into a table of probabilities, divide each cell in the table by the total number of passengers (i.e. the total number of observations: $1309$). This can be done with the following commands:

gender_pclass <- table(titanic$sex,titanic$pclass)
gender_pclass/sum(gender_pclass)

You can also use the command prop.table() to do the same in a slightly easier way.

prop.table( table(titanic$sex,titanic$pclass) )
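
To check that the two approaches really give the same result, they can be compared cell by cell. The sketch below does not load the titanic data. It assumes the six counts that follow from multiplying the proportions reported further down in this practical by $1309$, so treat those counts as a reconstruction.

```r
# Counts reconstructed from the reported proportions times 1309 (an assumption).
sex_pclass <- as.table(matrix(
  c(144, 106, 216,
    179, 171, 493),
  nrow = 2, byrow = TRUE,
  dimnames = list(sex = c("female", "male"), pclass = c("1", "2", "3"))
))

by_hand     <- sex_pclass / sum(sex_pclass)  # divide every cell by the total
by_function <- prop.table(sex_pclass)        # same calculation in one call

all.equal(by_hand, by_function)  # TRUE
sum(by_function)                 # all cells together sum to 1
round(by_function, 3)
```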

What is the probability that a passenger traveled in the first class and was female? Round your solution to 3 decimal places.

$P(\text{first class} \cap \text{female}) = 0.110$

To find the answer, use the proportion table of the two variables (pclass and sex). Because you need the joint probability, you don't have to specify the margin argument:

prop.table(table(titanic$pclass, titanic$sex))

This command results in the following table:

	female	male
1	0.110008	0.136746
2	0.080978	0.130634
3	0.165012	0.376623

which shows that the probability is $0.110$.

prop.table() has an additional useful option: it can calculate proportions per row or per column. This is controlled by a second input argument, margin. For example, to calculate the distribution over the three classes within each gender, use:

prop.table( table(titanic$sex,titanic$pclass), margin=1 )

And to calculate the distribution over gender within each passenger class, the following command should be used:

prop.table( table(titanic$sex,titanic$pclass), margin=2 )

In the last two examples, you can see that for margin=1 the proportions in each row sum to 1, and for margin=2 the proportions in each column sum to 1.
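
The row and column sums can be verified with rowSums() and colSums(). As a rough sketch, the code below again assumes counts reconstructed from the proportions reported in this practical (proportion times $1309$) rather than the titanic data itself.

```r
# Counts reconstructed from the reported proportions times 1309 (an assumption).
sex_pclass <- as.table(matrix(
  c(144, 106, 216,
    179, 171, 493),
  nrow = 2, byrow = TRUE,
  dimnames = list(sex = c("female", "male"), pclass = c("1", "2", "3"))
))

within_sex   <- prop.table(sex_pclass, margin = 1)  # proportions per row
within_class <- prop.table(sex_pclass, margin = 2)  # proportions per column

rowSums(within_sex)    # every row sums to 1
colSums(within_class)  # every column sums to 1
```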

Note that if you change the order of the input arguments (titanic$sex and titanic$pclass in the example above), the value of margin must change as well to get the same result. In other words, the proportions in the following two tables are the same (only the rows and columns are interchanged).

prop.table( table(titanic$pclass,titanic$sex), margin=1 )

prop.table( table(titanic$sex,titanic$pclass), margin=2 )
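
That these two tables hold the same proportions can be checked by transposing one of them with t() and comparing the cells. The sketch below uses a hypothetical data frame toy with six invented passengers instead of the titanic data, so the numbers themselves are meaningless; only the equality matters.

```r
# Hypothetical toy data (NOT the titanic data set): six invented passengers.
toy <- data.frame(
  sex    = c("female", "male", "male", "female", "male", "female"),
  pclass = c(1, 3, 3, 2, 1, 1)
)

# pclass in the rows, proportions per row.
class_rows <- prop.table(table(toy$pclass, toy$sex), margin = 1)

# pclass in the columns, proportions per column.
class_cols <- prop.table(table(toy$sex, toy$pclass), margin = 2)

# After transposing, the two tables agree cell by cell.
all.equal(as.vector(class_rows), as.vector(t(class_cols)))  # TRUE
```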

What is the probability that a passenger who died traveled in the second class? Round your solution to 3 decimal places.

$P(\text{second class} \mid \text{died}) = 0.195$

To find the answer, use the proportion table of the two variables (survived and pclass). In this exercise, you need the probability that a passenger traveled in the second class, given that the passenger died. This is a conditional probability, which means you need to specify the margin argument so that the proportions within each category of survived sum to $1$. If you put survived in the rows (listed first), you should specify margin = 1:

prop.table(table(titanic$survived, titanic$pclass), margin = 1)

This command results in the following table:

	1	2	3
0	0.152040	0.195303	0.652658
1	0.400000	0.238000	0.362000

which shows that the probability is $0.195$.
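
The link between margin = 1 and the definition of conditional probability, P(A | B) = P(A and B) / P(B), can be made explicit. The sketch below assumes counts reconstructed from the proportions above: the row for survived = 0 multiplied by the 809 passengers who died, and the row for survived = 1 multiplied by the 500 who survived. The names used are illustrative only.

```r
# Counts reconstructed from the reported proportions (an assumption).
survived_pclass <- as.table(matrix(
  c(123, 158, 528,
    200, 119, 181),
  nrow = 2, byrow = TRUE,
  dimnames = list(survived = c("0", "1"), pclass = c("1", "2", "3"))
))

# By definition: P(second class | died) = P(second class and died) / P(died).
joint         <- prop.table(survived_pclass)  # joint probabilities
p_died        <- sum(joint["0", ])            # marginal probability of dying
by_definition <- joint["0", "2"] / p_died

# The same number taken from the row-wise proportion table.
with_margin <- prop.table(survived_pclass, margin = 1)["0", "2"]

round(c(by_definition, with_margin), 3)  # 0.195 0.195
```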
