10. Categorical Association: Chi-Square Test for Independence
Chi-Square Test for Independence: Test Statistic and p-value
Data for the Chi-Square Test for Independence
The observed frequency is the number of individuals in the sample that are classified in a particular category and is denoted by #f_o#.
The expected frequency is the number of individuals that one would expect to be classified in a particular category if the null hypothesis were true and is denoted by #f_e#.
The expected frequency of a cell is calculated with the following formula:
\[f_e = \cfrac{f_r \cdot f_c}{n}\]
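where #f_r# is the frequency total for the row, #f_c# is the frequency total for the column, and #n# is the total sample size. This follows from the null hypothesis of independence: the proportion of the sample expected in a cell is the product of its row proportion and its column proportion, so #f_e = n \cdot \cfrac{f_r}{n} \cdot \cfrac{f_c}{n}#.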
Calculating Expected Frequencies
Consider the following frequency distribution table:
Observed Frequencies
| Apple | Banana | #\blue{\text{Total}}# |
Extrovert | #\purple{\text{13}}# | #\purple{\text{37}}# | #\blue{\text{50}}# |
Introvert | #\purple{\text{81}}# | #\purple{\text{97}}# | #\blue{\text{178}}# |
#\orange{\text{Total}}# | #\orange{\text{94}}# | #\orange{\text{134}}# | 228 |
To calculate the expected frequencies, apply the following formula to each #\purple{\text{cell}}# in the table:
\[f_e = \cfrac{\blue{f_r} \cdot \orange{f_c}}{n}\]
where #\blue{f_r}# is the frequency total for the row, #\orange{f_c}# is the frequency total for the column, and #n# is the total sample size.
#\begin{array}{llcl}
\,\,\,\,\scriptsize{\bullet}&\,\,\normalsize{\text{Extrovert - Apple}}&:&\cfrac{\blue{50}\cdot \orange{94}}{228}=20.61\\
\,\,\,\,\scriptsize{\bullet}&\,\,\normalsize{\text{Extrovert - Banana}}&:&\cfrac{\blue{50}\cdot \orange{134}}{228}=29.39\\
\,\,\,\,\scriptsize{\bullet}&\,\,\normalsize{\text{Introvert - Apple}}&:&\cfrac{\blue{178}\cdot \orange{94}}{228}=73.39\\
\,\,\,\,\scriptsize{\bullet}&\,\,\normalsize{\text{Introvert - Banana}}&:&\cfrac{\blue{178}\cdot \orange{134}}{228}=104.61\\
\end{array}#
Expected Frequencies
| Apple | Banana | Total |
Extrovert | 20.61 | 29.39 | 50 |
Introvert | 73.39 | 104.61 | 178 |
Total | 94 | 134 | 228 |
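The expected frequencies above can also be reproduced in R. The following is a minimal sketch, assuming the observed counts are stored in a matrix; the names `observed`, `fr`, `fc` and `expected` are chosen here for illustration and do not come from the text.

```r
# Observed counts from the table above (rows: Extrovert, Introvert)
observed <- matrix(
  c(13, 37,
    81, 97),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Extrovert", "Introvert"), c("Apple", "Banana"))
)

n  <- sum(observed)       # total sample size
fr <- rowSums(observed)   # row totals
fc <- colSums(observed)   # column totals

# f_e = f_r * f_c / n for every cell: outer() multiplies each row total
# by each column total
expected <- outer(fr, fc) / n
print(round(expected, 2))
```

The row and column totals of `expected` match those of `observed`, which is a quick way to check the calculation.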
After the expected frequencies have been calculated, the next step is to calculate the Chi-Square Test for Independence test statistic in order to determine how much the observed frequencies differ from the frequencies expected under the null hypothesis.
Chi-Square Test Statistic and Distribution
The test statistic for the Chi-Square Test for Independence is denoted by #\chi^2# and is calculated with the following formula:
\[\chi^2=\sum_{\text{all cells}}{\dfrac{(\text{Observed}-\text{Expected})^2}{\text{Expected}}}=\sum_{\text{all cells}}{\dfrac{(f_o-f_e)^2}{f_e}}\]
Since the calculation of the test statistic involves adding squared values, a #\chi^2#-statistic will always have a value of zero or larger.
Assuming the null hypothesis of the Chi-Square Test for Independence is true, the #\chi^2#-statistic will (approximately) follow a #\chi^2#-distribution with #df = (r -1)(c-1)# degrees of freedom, where #r# is the number of rows and #c# the number of columns.
Chi-square distributions are positively skewed, and the critical region always lies entirely in the right tail of the distribution.
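As an illustration of the formula, the statistic and its degrees of freedom can be computed in R for the Extrovert/Introvert table from earlier in this section. This is a sketch under the assumption that the counts are entered by hand; the variable names are illustrative.

```r
# Observed counts from the Extrovert/Introvert table earlier in this section
observed <- matrix(c(13, 37, 81, 97), nrow = 2, byrow = TRUE)

# Expected counts under independence: row total * column total / n
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Sum of (f_o - f_e)^2 / f_e over all cells
chi2 <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

print(round(chi2, 3))
print(df)
```

For this table the statistic comes out at roughly 6.13 with 1 degree of freedom.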
Calculating the p-value of a Chi-Square Test for Independence
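A Chi-Square test is by definition a right-tailed test: only large values of #\chi^2# indicate that the observed frequencies deviate from those expected under #H_0#, so the #p#-value is the area under the #\chi^2#-distribution to the right of the test statistic.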
To calculate the #p#-value of a Chi-Square Test for Independence in Excel, use the following command:
\[=1\text{ - }\text{CHISQ.DIST}(\chi^2, df, 1)\]
To calculate the #p#-value of a Chi-Square Test for Independence in R, use the following command:
\[\text{pchisq}(\chi^2, df, lower.tail=\text{FALSE})\]
In both commands, #df = (r \text{ - }1)(c\text{ - }1)#.
If #p \lt \alpha#, reject #H_0# and conclude #H_a#. Otherwise, do not reject #H_0#.
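The p-value calculation and the decision rule can be combined in a few lines of R. The helper `chisq_decision` below is hypothetical (it is not a built-in function), and the input values are purely illustrative rather than taken from the text.

```r
# Hypothetical helper: returns the p-value and the decision for a given
# chi-square statistic, degrees of freedom and significance level
chisq_decision <- function(chi2, df, alpha) {
  # Right-tail probability under the chi-square distribution
  p <- pchisq(chi2, df, lower.tail = FALSE)
  list(p_value = p, reject_h0 = p < alpha)
}

# Illustrative values only: chi2 = 5.2, df = 2, alpha = 0.05
result <- chisq_decision(chi2 = 5.2, df = 2, alpha = 0.05)
print(result)

# The same p-value written as 1 minus the cumulative probability,
# which mirrors the form of the Excel command
p_alt <- 1 - pchisq(5.2, 2)
print(p_alt)
```

Using `lower.tail = FALSE` avoids the subtraction and is numerically more stable for very small p-values.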
In an effort to assess the impact of funding cuts on pre-school programs, school administrators in a US school district selected a simple random sample of #183# students in the seventh grade and determined whether or not each student had attended pre-school and whether each student was performing below, at, or above grade level in mathematics.
The distribution was organized in the following two-way frequency table:
| Below grade level | At grade level | Above grade level | Total |
Attended pre-school | 25 | 38 | 26 | 89 |
No pre-school | 19 | 50 | 25 | 94 |
Total | 44 | 88 | 51 | 183 |
The administrators plan to use a Chi-Square Test for Independence to determine whether pre-school attendance and mathematical ability are related to one another.
Calculate the #p#-value of the test and make a decision regarding #H_0#. Round your answer to #3# decimal places. Use the #\alpha = 0.07# significance level.
#p=0.310#
On the basis of this #p#-value, #H_0# should not be rejected, because #p = 0.310 \gt \alpha = 0.07#.
There are a number of different ways to calculate the #p#-value of the test. Two solutions follow: one using Excel and one using R.
Calculate the expected frequency of all cells in the table with the following formula:
\[f_e = \cfrac{f_r \cdot f_c}{n}\]
where #f_r# is frequency total for the row, #f_c# is the frequency total for the column, and #n# is the total sample size.
| Below grade level | At grade level | Above grade level | Total |
Attended pre-school | 21.399 | 42.798 | 24.803 | 89 |
No pre-school | 22.601 | 45.202 | 26.197 | 94 |
Total | 44 | 88 | 51 | 183 |
Calculate the #\chi^2#-statistic:
\[\begin{array}{rcl}
\chi^2&=&\sum\limits_{\text{all cells}}{\dfrac{(f_o-f_e)^2}{f_e}}\\
&=& \cfrac{(25-21.399)^2}{21.399} +\cfrac{(38-42.798)^2}{42.798} +\cfrac{(26-24.803)^2}{24.803} +\cfrac{(19-22.601)^2}{22.601} +\\&&\cfrac{(50-45.202)^2}{45.202} +\cfrac{(25-26.197)^2}{26.197}\\
&=& 2.339
\end{array}\]
Determine the degrees of freedom:
\[df = (r -1)(c-1) = (2 -1 )(3 - 1)=2\]
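The three steps above (expected frequencies, test statistic, degrees of freedom) can be checked in R. This is a minimal sketch that assumes the observed counts are typed in from the two-way table; the variable names and the shortened column labels are illustrative.

```r
# Observed counts from the two-way table
observed <- matrix(
  c(25, 38, 26,
    19, 50, 25),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("Attended pre-school", "No pre-school"),
    c("Below", "At", "Above")
  )
)

# Expected counts: row total * column total / n
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic: sum of (f_o - f_e)^2 / f_e over all cells
chi2 <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

print(round(expected, 3))
print(round(chi2, 3))
print(df)
```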
To calculate the #p#-value of a #\chi^2#-test, make use of the following Excel function:
CHISQ.DIST(x, deg_freedom, cumulative)
- x: The value at which you wish to evaluate the distribution function.
- deg_freedom: An integer indicating the number of degrees of freedom.
- cumulative: A logical value that determines the form of the function.
  - TRUE: uses the cumulative distribution function, #\mathbb{P}(X \leq x)#
  - FALSE: uses the probability density function
A Chi-Square test is by definition a right-tailed test. Thus, to calculate the #p#-value of the test, run the following command:
\[=1\text{ - }\text{CHISQ.DIST}(\chi^2,(r \text{ - }1)(c\text{ - }1), 1)\\
\downarrow\\
=1\text{ - }\text{CHISQ.DIST}(2.339, 2, 1)\]
This gives:
\[p = 0.310\]
Since #p = 0.310 \gt \alpha = 0.07#, the null hypothesis of independence should not be rejected.
To calculate the #p#-value of a #\chi^2#-test, make use of the following R function:
pchisq(q, df, lower.tail)
- q: The value at which you wish to evaluate the distribution function.
- df: An integer indicating the number of degrees of freedom.
- lower.tail: If TRUE (default), probabilities are #\mathbb{P}(X \leq q)#, otherwise, #\mathbb{P}(X \gt q)#.
A Chi-Square test is by definition a right-tailed test. Thus, to calculate the #p#-value of the test, run the following command:
\[\text{pchisq}(q = \chi^2, df = (r \text{ - }1)(c\text{ - }1), lower.tail=\text{FALSE})\\
\downarrow\\
\text{pchisq}(q = 2.339, df = 2, lower.tail=\text{FALSE})\]
This gives:
\[p = 0.310\]
Since #p = 0.310 \gt \alpha = 0.07#, the null hypothesis of independence should not be rejected.
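As a cross-check on the manual calculation, R's built-in `chisq.test()` function runs the whole test on the table of observed counts in one call. The sketch below assumes the counts are entered by hand; `alpha` and `reject_h0` are illustrative names.

```r
# Observed counts (rows: attended pre-school, no pre-school;
# columns: below, at, above grade level)
observed <- matrix(c(25, 38, 26, 19, 50, 25), nrow = 2, byrow = TRUE)

# Pearson's chi-square test of independence; the continuity correction
# only applies to 2x2 tables, so correct = FALSE just makes the intent explicit
result <- chisq.test(observed, correct = FALSE)

print(round(unname(result$statistic), 3))  # chi-square statistic
print(unname(result$parameter))            # degrees of freedom
print(round(result$p.value, 3))            # p-value

# Decision at the significance level used in the example
alpha <- 0.07
reject_h0 <- result$p.value < alpha
print(reject_h0)
```

The returned object also contains the expected frequencies in `result$expected`, which can be compared with the table calculated by hand.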