3. Probability: Practical 3
Extension of Contingency Tables
Until now, the command table() was only applied to two variables, but it can be applied to more variables as well. Let's extend it to three:
table(titanic$survived, titanic$pclass, titanic$sex)
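To see what a three-way table looks like without loading the titanic data, here is a minimal sketch on a small made-up data frame. The data frame toy and its eight rows are hypothetical and only mimic the three columns used above; naming the arguments of table() is optional, but it labels the dimensions of the result.

```r
# Hypothetical toy data (NOT the real titanic data set), same three columns
toy <- data.frame(
  survived = c(0, 0, 0, 0, 1, 1, 0, 1),
  sex      = c("female", "female", "female", "male",
               "female", "male", "male", "female"),
  pclass   = c(1, 3, 3, 3, 1, 2, 1, 2)
)

# Three-way contingency table: one count per combination of the three variables
tab3 <- table(survived = toy$survived, pclass = toy$pclass, sex = toy$sex)

print(tab3)       # printed as one survived-by-pclass table per level of sex
print(dim(tab3))  # number of levels of survived, pclass and sex
```

R prints a three-way table as a series of two-way tables, one for each level of the third variable.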
You can also turn the resulting three-way contingency table into probabilities via prop.table(), and in addition calculate conditional probabilities with the margin= argument. You can apply table() to three variables to answer the following question (but that is not the only way to do it).
What is the probability that a passenger who died and was a woman traveled in first class? Round your answer to 3 decimal places.
$P(\text{first class} \mid \text{died} \cap \text{woman}) = 0.039$
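As a reminder, the quantity asked for follows from the standard definition of conditional probability: it is the share of first-class passengers among the women who died.

```latex
P(\text{first class} \mid \text{died} \cap \text{woman})
  = \frac{P(\text{first class} \cap \text{died} \cap \text{woman})}
         {P(\text{died} \cap \text{woman})}
```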
To find the answer, use the proportion table of the three variables (survived, sex and pclass). You know that the passenger died and was a woman, and you want the probability that she traveled in first class. You therefore condition on both survived and sex, i.e. P(pclass | survived, sex), so that within each combination of survived and sex the proportions over pclass sum to 1. With the variables in the order used below, this means taking the margin over the 1st and 2nd dimensions. Note that the two dimensions are passed to margin= together via the c() command.
prop.table(table(titanic$survived, titanic$sex, titanic$pclass), margin = c(1, 2))
You find the answer in the cell of the resulting table that combines all three specified values (i.e. died, was a woman, traveled in first class), which shows that the probability is 0.039.
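The effect of margin = c(1, 2) can be checked on a small example. The sketch below uses a hypothetical toy data frame (not the titanic data) and confirms that, for every combination of survived and sex, the proportions over pclass add up to 1.

```r
# Hypothetical toy data (NOT the real titanic data set)
toy <- data.frame(
  survived = c(0, 0, 0, 0, 1, 1, 0, 1),
  sex      = c("female", "female", "female", "male",
               "female", "male", "male", "female"),
  pclass   = c(1, 3, 3, 3, 1, 2, 1, 2)
)

# Same variable order as in the text: survived (1st), sex (2nd), pclass (3rd)
tab <- table(survived = toy$survived, sex = toy$sex, pclass = toy$pclass)

# Condition on dimensions 1 and 2, i.e. P(pclass | survived, sex)
cond <- prop.table(tab, margin = c(1, 2))

# Summing over pclass within each survived-by-sex cell gives 1 everywhere
print(apply(cond, c(1, 2), sum))

# P(first class | died, woman): 1 of the 3 toy women who died was in first class
print(cond["0", "female", "1"])
```

Indexing the cell by its labels ("0", "female", "1") avoids having to count positions in the printed output.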
Alternatively, you could first make a subset of all passengers with the characteristics "died" and "was a woman", by specifying these conditions for the rows and selecting all columns, like:
died_women <- titanic[titanic$survived == 0 & titanic$sex == "female", ]
From this subset you can calculate the probability that a passenger traveled in first class.
prop.table(table(died_women$pclass))
In the resulting table you can find the same answer.
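That both routes give the same number can be verified on a small example. This is a sketch on a hypothetical toy data frame (not the titanic data); the names toy, p_table and p_subset are made up for illustration.

```r
# Hypothetical toy data (NOT the real titanic data set)
toy <- data.frame(
  survived = c(0, 0, 0, 0, 1, 1, 0, 1),
  sex      = c("female", "female", "female", "male",
               "female", "male", "male", "female"),
  pclass   = c(1, 3, 3, 3, 1, 2, 1, 2)
)

# Route 1: three-way proportion table, conditional on survived and sex
cond <- prop.table(table(toy$survived, toy$sex, toy$pclass), margin = c(1, 2))
p_table <- cond["0", "female", "1"]

# Route 2: keep only the women who died, then tabulate pclass in that subset
died_women <- toy[toy$survived == 0 & toy$sex == "female", ]
p_subset <- as.numeric(prop.table(table(died_women$pclass))["1"])

# Both routes yield the same conditional probability
print(c(p_table, p_subset))
print(all.equal(p_table, p_subset))
```

The subset route only shows the classes that actually occur among the selected passengers, whereas the three-way table keeps a cell for every class.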