8. Testing for Differences in Means and Proportions: Practical 8
Testing for Differences Between Proportions
Two sample proportion test
As with the hypothesis test on means, the proportions test can be used both to test whether a proportion differs from the value stated under the null hypothesis and to test whether proportions differ between groups. The concepts are again similar, and you can use the same commands for both the one-sample and the two-sample test. Isn't that nice? However, a few points need attention when calculating the number of successes for the two-sample proportions test. Let's take a look at an example.
Test whether the proportion of PM10 observations exceeding the critical value of 50 differs between the Vondelpark station and the Stadhouderskade station. The hypotheses are:
- #H_0#: #\pi_{vp} = \pi_{shk}#
- #H_a#: #\pi_{vp} \neq \pi_{shk}#
This time you will use a table with TRUE and FALSE counts as input to the prop.test() function. To create this table, first add a column "exceeds" to the PM10 dataframe. Recall that this dataframe contains all PM10 observations, that is, the combined observations from both stations. The "exceeds" column states for every observation whether or not it exceeds the threshold value of #50#.
PM10$exceeds <- PM10$value > 50
You can now create a table with the counts of observations exceeding the threshold, per location.
tab <- table(PM10$location, PM10$exceeds)
tab
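To see what such a table looks like, here is a minimal sketch on a small made-up dataframe. The name toy, the station labels A and B and the values are assumptions for illustration, not the course data. It repeats the two steps above and then prints the column names, so you can check which logical value ends up in which column.

```r
# Hypothetical toy data: station labels and values are made up for illustration
toy <- data.frame(
  location = c("A", "A", "A", "B", "B", "B"),
  value    = c(12, 55, 30, 61, 48, 70)
)

# Flag every observation that exceeds the threshold of 50
toy$exceeds <- toy$value > 50

# Cross-tabulate location (rows) against the TRUE/FALSE flag (columns)
toy_tab <- table(toy$location, toy$exceeds)
toy_tab

# Inspect the order in which the logical values appear as columns
colnames(toy_tab)
```

Checking the column names before passing a table to a test is a cheap habit that shows you which outcome the function will treat as the first column.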
Performing the two-sample proportion test is now easy: tab can be used directly as input to prop.test(). Note that this means you use the defaults for the other inputs (two-sided alternative and 95% confidence level).
prop.test(tab)
2-sample test for equality of proportions with continuity correction

data:  tab
X-squared = 0.63801, df = 1, p-value = 0.4244
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.013334148  0.005141108
sample estimates:
   prop 1    prop 2
0.9802260 0.9843225
However, the estimated proportions (the very last line of the output) are very high and express the opposite of what we had in mind: the proportion of observations that do not exceed the threshold. For example, prop 1 is given as #1735/(1735+35)# (#= 0.9802260#) instead of the intended #35/(1735+35)#.
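The two fractions can be checked by hand. The short sketch below uses only the two counts for the first station quoted above; the variable names non_exceed and exceed are chosen here for illustration.

```r
# Counts for the first station, taken from the text above
non_exceed <- 1735  # observations at or below the threshold
exceed     <- 35    # observations above the threshold

# Proportion that prop.test(tab) reports as prop 1
non_exceed / (non_exceed + exceed)

# Proportion we actually want: the share of exceedances
exceed / (non_exceed + exceed)
```

The two values add up to 1, which is why the reported estimate is the complement of the proportion of interest.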
The function prop.test() expects the first column of the input table to hold the number of successes and the second the number of failures. Because table() orders the logical values as FALSE, then TRUE, the first column of tab holds the non-exceedances. So we can get the desired test by switching the columns in table tab:
tab2 <- tab[, c(2, 1)] # swap columns 1 and 2 so that TRUE comes first
prop.test(tab2)
2-sample test for equality of proportions with continuity correction

data:  tab2
X-squared = 0.63801, df = 1, p-value = 0.4244
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.005141108  0.013334148
sample estimates:
    prop 1     prop 2
0.01977401 0.01567749
Notice that the #\chi#-squared statistic, #df# and #p#-value are exactly the same as in the previous case, where the columns of the input table were in the other order; only the bounds of the confidence interval have swapped sign. The estimated proportions, however, now represent the proportions of observations that exceed the threshold. Looking at the #p#-value, we see that #p > 0.05#.
The null hypothesis is therefore not rejected: we have no evidence that the proportions differ between the two locations.
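If you prefer not to rely on the column order at all, prop.test() also accepts the number of successes x and the group sizes n as separate vectors. The sketch below illustrates this on hypothetical counts. The matrix counts, the station labels and all numbers are assumptions for illustration, not the PM10 data. It also checks that swapping the columns changes the estimates but not the p-value.

```r
# Hypothetical counts, not the course data. Rows are stations; columns are
# in the order table() gives for a logical variable: FALSE first, then TRUE.
counts <- matrix(c(950,  50,
                   900, 100),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("station_a", "station_b"),
                                 c("FALSE", "TRUE")))

# First column is treated as the successes, so here "success" means FALSE
res_false_first <- prop.test(counts)

# Select the columns by name so that TRUE (exceedance) is the success
res_true_first <- prop.test(counts[, c("TRUE", "FALSE")])

# Same test with explicit successes x and group sizes n
res_xn <- prop.test(x = counts[, "TRUE"], n = rowSums(counts))

# Estimated proportions of exceedance per station
res_true_first$estimate

# The p-value does not depend on which outcome is called the success
all.equal(unname(res_false_first$p.value), unname(res_true_first$p.value))
all.equal(unname(res_true_first$p.value), unname(res_xn$p.value))
```

Selecting columns by name, or passing x and n explicitly, makes the choice of success visible in the code, which is easier to verify than a positional swap.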