10. Categorical Association: Practical 10
The Goodness of Fit Test on Repeat
The Chi-square Goodness of Fit Test requires quite a few calculation steps. Fortunately, there is a function that can do all of these in one sweep: chisq.test().
To calculate the p-value you need to specify a few input arguments.
- x: a vector with the observed frequencies for each of the categories
- p: a vector with expected probabilities for each of the categories (or a vector with expected frequencies, but then rescale.p should be set to TRUE)
- rescale.p: either TRUE or FALSE. The default is FALSE, which means that the vector p is used as given and must already sum to 1. If set to TRUE, p is rescaled so that it sums to 1.
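As a minimal sketch of how these three arguments work together, the example below uses made-up counts for three categories (the counts and category names are illustrative assumptions, not data from this practical). It specifies the same null hypothesis twice: once with expected probabilities, and once with expected frequencies and rescale.p = TRUE.

```r
# Hypothetical observed counts for three categories (illustrative only)
observed <- c(A = 30, B = 50, C = 20)

# Variant 1: p holds expected probabilities that already sum to 1
result_prob <- chisq.test(x = observed, p = c(0.25, 0.50, 0.25))

# Variant 2: p holds expected frequencies; rescale.p = TRUE rescales
# the vector so that it sums to 1 before the test is carried out
result_freq <- chisq.test(x = observed, p = c(25, 50, 25), rescale.p = TRUE)

# Both variants describe the same hypothesis, so the statistics agree
all.equal(unname(result_prob$statistic), unname(result_freq$statistic))
```

If you pass expected frequencies while leaving rescale.p at FALSE, chisq.test() stops with an error because the values in p do not sum to 1.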
You test this by performing a #X^2#-Goodness of Fit test. Is the observed frequency significantly different from the expected frequency at a significance level of #\alpha = 0.05#?
Round your answers for the #X^2#-statistic and the #p#-value to #3# decimal places.
Use chisq.test() to conduct the hypothesis test.
Step 1: Conduct test
Specify the following three input arguments to conduct the chi-squared goodness of fit test: x, p, and rescale.p.
Step 1a: calculate x
x should be a vector with the observed frequencies. You can find the observed frequencies in the frequency table of the variable wash.
observed <- table(WD$wash)
Step 1b: calculate p
p should be a vector with the expected frequencies (NOTE: in this case make sure that rescale.p = TRUE). You expect that the answers are equally distributed over the #5# classes (given the question). The expected frequency of each class is #1/5#th of the total number of observations.
expected <- rep(nrow(WD)/5, 5)
Step 1c: Use the chisq.test() function
chisq.test(x=observed, p=expected, rescale.p=TRUE)
X-squared = 130.63, df = 4, p-value < 2.2e-16
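If you do not have the WD data at hand, the three sub-steps can be tried on a stand-in. The sketch below builds a small hypothetical data frame, WD_demo, whose wash column contains made-up answers coded 1 to 5 (both the name and the counts are assumptions for illustration), and then follows Steps 1a to 1c.

```r
# Hypothetical stand-in for the WD data: 100 made-up answers coded 1 to 5
WD_demo <- data.frame(wash = rep(1:5, times = c(18, 22, 20, 25, 15)))

# Step 1a: observed frequencies from the frequency table
observed <- table(WD_demo$wash)

# Step 1b: expected frequencies, one fifth of all observations per class
expected <- rep(nrow(WD_demo) / 5, 5)

# Step 1c: expected frequencies are given, so rescale.p must be TRUE
result <- chisq.test(x = observed, p = expected, rescale.p = TRUE)
result
```

Because the stand-in data are invented, the printed statistic and p-value will differ from the values reported for WD.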
Step 2: Conclusion
The #p#-value is much smaller than the significance level of #0.05#. Hence, the null hypothesis should be rejected. The responses of people to the statement 'I wash clothes after one wear' are not equally distributed over the categories.
NOTE: R does not print p-values smaller than 2.2e-16 exactly. Any p-value below that limit is reported as p-value < 2.2e-16.
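The printed output is only a formatted summary. The object returned by chisq.test() stores the numeric p-value in its p.value element, which you can inspect directly. A minimal sketch with made-up counts (illustrative only, not the WD data):

```r
# Hypothetical counts over five categories (illustrative only)
observed <- c(10, 20, 30, 40, 100)

# Store the test result instead of only printing it
result <- chisq.test(x = observed, p = rep(1/5, 5))

# The printed summary shows the p-value in truncated form
print(result)

# The stored numeric p-value is available as a list element
result$p.value

# The printing limit corresponds to R's machine precision
.Machine$double.eps
```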
Alternatively, you can use the chisq.test() function with only two inputs. In that case the input argument p should be a vector of the expected probabilities for each of the categories. The argument rescale.p can then keep its default value of FALSE:
chisq.test(x=table(WD$wash), p=c(0.2,0.2,0.2,0.2,0.2))
X-squared = 130.63, df = 4, p-value < 2.2e-16
Clearly, using the chisq.test() function is a lot easier and less error-prone than conducting the separate calculation steps 'by hand', so in any 'real-life' investigation you would always use the function. However, by conducting these separate steps as an exercise you gain a better understanding of the idea behind the test statistic and the shape of the Chi-square distribution. That is why we go over the 'manual calculations' as well.
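For comparison, the 'manual calculations' can be sketched in a few lines and checked against chisq.test(). The counts below are made up for illustration (they are not the WD data), and the variable names are arbitrary choices.

```r
# Hypothetical observed counts over five categories (illustrative only)
observed <- c(18, 22, 20, 25, 15)

# Expected frequency per category under an equal distribution
expected <- rep(sum(observed) / 5, 5)

# Test statistic: sum over categories of (observed - expected)^2 / expected
x2 <- sum((observed - expected)^2 / expected)

# Degrees of freedom: number of categories minus 1
df <- length(observed) - 1

# p-value: upper-tail probability of the Chi-square distribution
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

# The same test in one call, for comparison
auto <- chisq.test(x = observed, p = rep(1/5, 5))
all.equal(x2, unname(auto$statistic))
all.equal(p_value, auto$p.value)
```

Both comparisons should return TRUE, which shows that the function carries out the same steps as the manual calculation.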