10. Categorical Association: Practical 10
Keypoints
Introduction to cross tables and categorical association:
- Cross tables are used as a first step to make inferences about categorical variables.
- The distribution of one categorical variable is visualised with a bar plot and the association among categorical variables is visualised by a mosaic plot.
- A mosaic plot gives visual evidence of a relationship between two categorical variables.
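The points above can be sketched in R as follows. This is a minimal illustration, not the practical's own code: the variables habitat and species and all counts are invented for the example. It uses only base R functions (table(), barplot(), mosaicplot()).

```r
# Hypothetical data, invented for illustration: 12 sampled individuals
habitat <- rep(c("forest", "meadow"), times = c(7, 5))
species <- c(rep("A", 5), rep("B", 2),   # forest: 5 of A, 2 of B
             rep("A", 1), rep("B", 4))   # meadow: 1 of A, 4 of B

# Cross table of counts for every habitat-by-species combination
cross_tab <- table(habitat, species)
print(cross_tab)

# Bar plot: distribution of a single categorical variable
barplot(table(habitat), xlab = "Habitat", ylab = "Frequency")

# Mosaic plot: tile areas are proportional to the cell counts
mosaicplot(cross_tab, main = "Species by habitat")
```

If the tiles of the mosaic plot differ clearly in width or height between groups, that suggests an association worth testing formally.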
Chi-square Goodness of Fit Test:
- An observed frequency distribution can be compared with a theoretical distribution, and the difference can be tested formally using the #X^2# statistic and the #\chi^2# probability distribution.
- The null hypothesis for this test is that the observed frequency distribution and the theoretical distribution are the same. The alternative hypothesis is that they are different.
- This test is only accurate if the sample size is sufficiently large (relative to the number of categories) so that there is an expected frequency of at least five in every category.
- The function chisq.test() conducts this test.
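As a hedged sketch of the goodness of fit test, the example below compares invented flower-colour counts with an assumed 1:2:1 theoretical ratio. The counts, category names and ratio are all hypothetical. The statistic is #X^2 = \sum (O_i - E_i)^2 / E_i#, where #O_i# and #E_i# are the observed and expected frequencies in category #i#, and it is computed here both by hand and with chisq.test().

```r
# Hypothetical observed counts (n = 100), invented for illustration
observed <- c(red = 22, pink = 46, white = 32)

# Assumed theoretical distribution: a 1:2:1 ratio
expected_p <- c(0.25, 0.50, 0.25)

# Expected frequencies under the null hypothesis: 25, 50, 25
expected <- sum(observed) * expected_p

# Check the sample-size condition: every expected frequency at least five
all(expected >= 5)

# X^2 by hand: sum of (observed - expected)^2 / expected, here 2.64
x2_manual <- sum((observed - expected)^2 / expected)

# The same test with chisq.test(); p gives the theoretical proportions
fit <- chisq.test(x = observed, p = expected_p)
fit$statistic   # X^2, matches x2_manual
fit$parameter   # degrees of freedom: number of categories minus one
fit$p.value     # probability of an X^2 at least this large under the null
```

A p-value above the chosen significance level means the null hypothesis of equal distributions is not rejected.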
Chi-square Test for Association:
- The association between two categorical variables can be tested via a cross table. The #X^2# statistic is calculated from the frequencies in the cross table and is then compared against a #\chi^2# probability distribution with #(r-1)(c-1)# degrees of freedom, where #r# and #c# are the numbers of rows and columns in the cross table.
- The null hypothesis for this test for association is that the two variables are unrelated (=independent). The alternative hypothesis is that they are related (dependent).
- This test is only accurate if the sample size is sufficiently large (relative to the size of the cross table) so that there is an expected frequency of at least five in at least 80% of the cells.
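The test for association can be sketched in the same way. The 2 x 3 cross table below, with its site and colour labels, is invented for illustration. The example checks the expected-frequency condition and confirms that the degrees of freedom equal #(r-1)(c-1)#. A table larger than 2 x 2 is used on purpose, because chisq.test() applies a continuity correction to 2 x 2 tables by default.

```r
# Hypothetical 2 x 3 cross table of counts, invented for illustration
cross_tab <- matrix(c(20, 15, 15,
                      10, 20, 20),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(site   = c("north", "south"),
                                    colour = c("dark", "medium", "light")))

# Chi-square test for association between the row and column variables
assoc <- chisq.test(cross_tab)

# Expected frequencies under independence:
# (row total * column total) / grand total
assoc$expected

# Sample-size condition: share of cells with expected frequency >= 5
share_ok <- mean(assoc$expected >= 5)
share_ok >= 0.8

# Degrees of freedom: (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2
df_manual <- (nrow(cross_tab) - 1) * (ncol(cross_tab) - 1)

assoc$statistic   # X^2
assoc$parameter   # degrees of freedom, equals df_manual
assoc$p.value     # small values are evidence against independence
```

If fewer than 80% of the cells meet the expected-frequency condition, the p-value from this test should not be trusted.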