10. Categorical Association: Chi-Square Test for Independence
Chi-Square Test for Independence: Purpose, Hypotheses, and Assumptions
Chi-Square Test for Independence
The Chi-Square Test for Independence is used to determine whether there is a dependency (relationship) between two categorical variables in the population.
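Two variables are said to be independent when the value of one variable is unrelated to the value of the other: knowing an observation's category on one variable does not change the probabilities of the categories on the other variable.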
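In probability notation, this is the standard definition of independence. Here #X# and #Y# are introduced only as labels for the two categorical variables:

```latex
P(X = x,\ Y = y) = P(X = x)\,P(Y = y) \quad \text{for every category } x \text{ of } X \text{ and every category } y \text{ of } Y,
\qquad \text{equivalently} \qquad P(Y = y \mid X = x) = P(Y = y).
```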
The hypotheses of a Chi-Square Test for Independence are:
- #H_0:# The variables are independent.
- #H_a:# The variables are dependent.
The data for two categorical variables (e.g., eye color and gender) are typically displayed in a two-way frequency table, also called a contingency table:
| | Blue | Brown | Green | Other | Total |
|---|---|---|---|---|---|
| Men | 24 | 11 | 4 | 9 | 48 |
| Women | 20 | 12 | 7 | 8 | 47 |
| Total | 44 | 23 | 11 | 17 | 95 |
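One way to see what independence would mean for the eye-color table above is to compare the distribution of eye color within each gender (the row proportions) with the overall distribution. If the variables were independent in the population, each row's distribution would match the overall one. In a sample, some differences can still arise by chance, and deciding whether they are too large to be chance is what the test does. The Python sketch below is illustrative only; the dictionary layout and the function name `row_proportions` are hypothetical choices, not part of any library.

```python
# Observed counts from the eye-color table (columns: Blue, Brown, Green, Other).
eye_colors = ["Blue", "Brown", "Green", "Other"]
observed = {
    "Men": [24, 11, 4, 9],
    "Women": [20, 12, 7, 8],
}

def row_proportions(counts):
    """Turn a list of counts into proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

# Overall (marginal) eye-color distribution, ignoring gender.
column_totals = [sum(col) for col in zip(*observed.values())]
marginal = row_proportions(column_totals)

# Conditional distribution of eye color within each gender.
for gender, counts in observed.items():
    props = row_proportions(counts)
    print(gender, {color: round(p, 3) for color, p in zip(eye_colors, props)})
print("All", {color: round(p, 3) for color, p in zip(eye_colors, marginal)})
```

For example, half of the men in this sample have blue eyes (24 of 48), compared with about 46% of all respondents (44 of 95).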
The following assumptions must hold for a Chi-Square Test for Independence to produce valid results:
- Both variables are categorical (qualitative) in nature.
- The measurement categories are mutually exclusive, which means that each observation can be classified into one and only one category.
- Random sampling is used to draw the sample.
- All cells should have an expected frequency of at least #1#.
- At least #80\%# of the cells should have an expected frequency of at least #5#.
If either of the last two assumptions (about expected frequencies) is not satisfied, you must combine some categories so that the expected frequencies become large enough.
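The two expected-frequency assumptions can be checked mechanically. Under #H_0#, the expected frequency of a cell is (row total × column total) / grand total. The Python sketch below computes these expected frequencies, checks the two thresholds (every cell at least 1, and at least 80% of cells at least 5), and shows one way to combine two categories. The function names `expected_frequencies`, `check_expected_frequencies`, and `merge_columns` are hypothetical and were chosen for illustration.

```python
# Observed counts from the eye-color table
# (rows: Men, Women; columns: Blue, Brown, Green, Other).
observed = [
    [24, 11, 4, 9],
    [20, 12, 7, 8],
]

def expected_frequencies(table):
    """Expected count per cell under H0: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def check_expected_frequencies(table):
    """Return (every cell >= 1, at least 80% of cells >= 5)."""
    cells = [e for row in expected_frequencies(table) for e in row]
    all_at_least_1 = all(e >= 1 for e in cells)
    most_at_least_5 = sum(e >= 5 for e in cells) >= 0.8 * len(cells)
    return all_at_least_1, most_at_least_5

def merge_columns(table, keep, drop):
    """Add column `drop` into column `keep`, then remove column `drop`."""
    merged = []
    for row in table:
        new_row = list(row)
        new_row[keep] += new_row[drop]
        del new_row[drop]
        merged.append(new_row)
    return merged

print(check_expected_frequencies(observed))
# Illustration only: merge "Green" (index 2) into "Other" (index 3).
print(merge_columns(observed, keep=3, drop=2))
```

For the eye-color table, the check returns `(True, True)`. The smallest expected frequency is #517/95 \approx 5.44# (women with green eyes), so no categories actually need to be combined there. The merge call only demonstrates the mechanics, and it yields `[[24, 11, 13], [20, 12, 15]]`.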