Chi-Square Test for Independence: Test Statistic and p-value

Data for the Chi-Square Test for Independence

The observed frequency is the number of individuals in the sample that are classified into a particular cell of the table (a combination of one row category and one column category) and is denoted by #f_o#.

The expected frequency is the number of individuals that one would expect to be classified into that cell if the null hypothesis of independence were true and is denoted by #f_e#.

The expected frequency of a cell is calculated with the following formula:
\[f_e = \cfrac{f_r \cdot f_c}{n}\]

where #f_r# is the frequency total for the row, #f_c# is the frequency total for the column, and #n# is the total sample size.

Calculating Expected Frequencies

Consider the following frequency distribution table:

Observed Frequencies
	Apple	Banana	#\blue{\text{Total}}#
Extrovert	#\purple{\text{13}}#	#\purple{\text{37}}#	#\blue{\text{50}}#
Introvert	#\purple{\text{81}}#	#\purple{\text{97}}#	#\blue{\text{178}}#
#\orange{\text{Total}}#	#\orange{\text{94}}#	#\orange{\text{134}}#	228

To calculate the expected frequencies, apply the following formula to each #\purple{\text{cell}}# in the table:
\[f_e = \cfrac{\blue{f_r} \cdot \orange{f_c}}{n}\]

where #\blue{f_r}# is the frequency total for the row, #\orange{f_c}# is the frequency total for the column, and #n# is the total sample size.

#\begin{array}{llcl}
\,\,\,\,\scriptsize{\bullet}&\,\,\normalsize{\text{Extrovert - Apple}}&:&\cfrac{\blue{50}\cdot \orange{94}}{228}=20.61\\
\,\,\,\,\scriptsize{\bullet}&\,\,\normalsize{\text{Extrovert - Banana}}&:&\cfrac{\blue{50}\cdot \orange{134}}{228}=29.39\\
\,\,\,\,\scriptsize{\bullet}&\,\,\normalsize{\text{Introvert - Apple}}&:&\cfrac{\blue{178}\cdot \orange{94}}{228}=73.39\\
\,\,\,\,\scriptsize{\bullet}&\,\,\normalsize{\text{Introvert - Banana}}&:&\cfrac{\blue{178}\cdot \orange{134}}{228}=104.61\\
\end{array}#

Expected Frequencies
	Apple	Banana	Total
Extrovert	20.61	29.39	50
Introvert	73.39	104.61	178
Total	94	134	228
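The same expected frequencies can be reproduced in R. The following is a minimal sketch, assuming the observed counts are entered as a matrix; the object names observed and expected are illustrative and do not come from the text above.

```r
# Observed frequencies from the table above (rows: personality, columns: fruit)
observed <- matrix(c(13, 37,
                     81, 97),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Extrovert", "Introvert"),
                                   c("Apple", "Banana")))

n <- sum(observed)  # total sample size: 228

# f_e = (row total * column total) / n, computed for every cell at once
expected <- outer(rowSums(observed), colSums(observed)) / n

round(expected, 2)
```

Here outer() multiplies every row total by every column total, so a single division by n yields the whole table of expected frequencies.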

After the expected frequencies have been calculated, the next step is to calculate the test statistic of the Chi-Square Test for Independence, which measures how much the observed frequencies differ from the frequencies expected under the null hypothesis.

Chi-Square Test Statistic and Distribution

The test statistic for the Chi-Square Test for Independence is denoted by #\chi^2# and is calculated with the following formula:

\[\chi^2=\sum_{\text{all cells}}{\dfrac{(\text{Observed}-\text{Expected})^2}{\text{Expected}}}=\sum_{\text{all cells}}{\dfrac{(f_o-f_e)^2}{f_e}}\]

Since every term in the sum is a squared difference divided by a positive expected frequency, the #\chi^2#-statistic is never negative. It equals zero only when every observed frequency matches its expected frequency exactly.

Assuming the null hypothesis of the Chi-Square Test for Independence is true, the #\chi^2#-statistic will (approximately) follow a #\chi^2#-distribution with #df = (r -1)(c-1)# degrees of freedom, where #r# is the number of rows and #c# the number of columns.

Chi-square distributions are positively skewed, and the critical region of the test always lies entirely in the right tail of the distribution.
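As an illustration, the statistic and its degrees of freedom can be computed in R for the Extrovert/Introvert table of observed frequencies used earlier in this section. This is a sketch under the assumption that the counts are stored in a matrix; the names observed, expected, chi_sq, and df are chosen here for illustration only.

```r
# Observed counts: rows are Extrovert/Introvert, columns are Apple/Banana
observed <- matrix(c(13, 37,
                     81, 97),
                   nrow = 2, byrow = TRUE)

# Expected counts under independence: (row total * column total) / n
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Sum of (f_o - f_e)^2 / f_e over all cells
chi_sq <- sum((observed - expected)^2 / expected)

# df = (r - 1)(c - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

chi_sq  # about 6.129
df      # 1
```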

Calculating the p-value of a Chi-Square Test for Independence

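A Chi-Square test is by definition a right-tailed test: only large values of #\chi^2#, which indicate large differences between the observed and expected frequencies, count as evidence against the null hypothesis.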

To calculate the #p#-value of a Chi-Square Test for Independence in Excel, use the following command:
\[=1\text{ - }\text{CHISQ.DIST}(\chi^2, df, 1)\]

To calculate the #p#-value of a Chi-Square Test for Independence in R, use the following command:
\[\text{pchisq}(\chi^2, df, lower.tail=\text{FALSE})\]

In both commands, #df = (r-1)(c-1)#.

If #p \lt \alpha#, reject #H_0# in favor of #H_a#. Otherwise, do not reject #H_0#.
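The decision rule can be sketched in R as follows. The values of chi_sq, df, and alpha below are illustrative assumptions rather than results from a real study; the sketch also shows that the right-tail call agrees with the "one minus cumulative probability" form used in Excel.

```r
chi_sq <- 6.13   # illustrative test statistic (assumed value)
df     <- 1      # (r - 1)(c - 1) for a 2 x 2 table
alpha  <- 0.05   # illustrative significance level (assumed value)

# Right-tail probability P(X > chi_sq)
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)

# Same quantity written as 1 - P(X <= chi_sq), as in the Excel command
p_value_alt <- 1 - pchisq(chi_sq, df)

# Compare the p-value with the significance level
decision <- if (p_value < alpha) "reject H0" else "do not reject H0"
decision
```

For very small #p#-values, the lower.tail = FALSE form is generally the safer choice, because subtracting a number close to 1 from 1 loses precision.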

In an effort to assess the impact of funding cuts on pre-school programs, school administrators in a US school district selected a simple random sample of #164# students in the seventh grade and determined whether or not each student had attended pre-school and whether each student was performing below, at, or above grade level in mathematics.

The distribution was organized in the following two-way frequency table:

	Below grade level	At grade level	Above grade level	Total
Attended pre-school	11	36	20	67
No pre-school	38	39	20	97
Total	49	75	40	164

The administrators plan to use a Chi-Square Test for Independence to determine whether pre-school attendance and mathematical ability are related to one another.

Calculate the #p#-value of the test and make a decision regarding #H_0#. Round your answer to #3# decimal places. Use the #\alpha = 0.03# significance level.

#p=0.007#

On the basis of this #p#-value, #H_0# should be rejected, because #\,p# #\lt# #\alpha#.

There are a number of different ways we can calculate the #p#-value of the test. Two solutions follow: one in Excel and one in R.

Excel Calculation

Calculate the expected frequency of all cells in the table with the following formula:

\[f_e = \cfrac{f_r \cdot f_c}{n}\]

where #f_r# is the frequency total for the row, #f_c# is the frequency total for the column, and #n# is the total sample size.

	Below grade level	At grade level	Above grade level	Total
Attended pre-school	20.018	30.64	16.341	67
No pre-school	28.982	44.36	23.659	97
Total	49	75	40	164

Calculate the #\chi^2#-statistic:
\[\begin{array}{rcl}
\chi^2&=&\sum\limits_{\text{all cells}}{\dfrac{(f_o-f_e)^2}{f_e}}\\
&=& \cfrac{(11-20.018)^2}{20.018} +\cfrac{(36-30.64)^2}{30.64} +\cfrac{(20-16.341)^2}{16.341} +\cfrac{(38-28.982)^2}{28.982} +\\&&\cfrac{(39-44.36)^2}{44.36} +\cfrac{(20-23.659)^2}{23.659}\\
&=& 9.839
\end{array}\]
Determine the degrees of freedom:
\[df = (r -1)(c-1) = (2 -1 )(3 - 1)=2\]
To calculate the #p#-value of a #\chi^2#-test, make use of the following Excel function:

CHISQ.DIST(x, deg_freedom, cumulative)

x: The value at which you wish to evaluate the distribution function.

deg_freedom: An integer indicating the number of degrees of freedom.

cumulative: A logical value that determines the form of the function.

TRUE - uses the cumulative distribution function, #\mathbb{P}(X \leq x)#

FALSE - uses the probability density function

A Chi-Square test is by definition a right-tailed test. Thus, to calculate the #p#-value of the test, run the following command:
\[=1\text{ - }\text{CHISQ.DIST}(\chi^2,(r \text{ - }1)(c\text{ - }1), 1)\\
\downarrow\\
=1\text{ - }\text{CHISQ.DIST}(9.839, 2, 1)\]
This gives:
\[p = 0.007\]
Since #\,p# #\lt# #\alpha#, the null hypothesis of independence should be rejected.

R Calculation

Calculate the expected frequency of all cells in the table with the following formula:

\[f_e = \cfrac{f_r \cdot f_c}{n}\]

where #f_r# is the frequency total for the row, #f_c# is the frequency total for the column, and #n# is the total sample size.

	Below grade level	At grade level	Above grade level	Total
Attended pre-school	20.018	30.64	16.341	67
No pre-school	28.982	44.36	23.659	97
Total	49	75	40	164

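Calculate the #\chi^2#-statistic:
\[\chi^2=\sum\limits_{\text{all cells}}{\dfrac{(f_o-f_e)^2}{f_e}} = 9.839\]
Determine the degrees of freedom:
\[df = (r -1)(c-1) = (2 -1 )(3 - 1)=2\]
To calculate the #p#-value of a #\chi^2#-test, make use of the following R function:

pchisq(q, df, lower.tail)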

q: The value at which you wish to evaluate the distribution function.

df: An integer indicating the number of degrees of freedom.

lower.tail: If TRUE (default), probabilities are #\mathbb{P}(X \leq x)#, otherwise, #\mathbb{P}(X \gt x)#.

A Chi-Square test is by definition a right-tailed test. Thus, to calculate the #p#-value of the test, run the following command:
\[\text{pchisq}(q = \chi^2, df = (r \text{ - }1)(c\text{ - }1), lower.tail=\text{FALSE})\\
\downarrow\\
\text{pchisq}(q = 9.839, df = 2, lower.tail=\text{FALSE})\]
This gives:
\[p = 0.007\]
Since #\,p# #\lt# #\alpha#, the null hypothesis of independence should be rejected.
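As a cross-check, the whole calculation can be carried out in one step with R's built-in chisq.test() function. The following is a sketch, assuming the observed frequencies from the two-way table are entered as a matrix; the object names observed and result and the shortened column labels are illustrative.

```r
# Observed frequencies from the pre-school example
observed <- matrix(c(11, 36, 20,
                     38, 39, 20),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Attended pre-school", "No pre-school"),
                                   c("Below", "At", "Above")))

result <- chisq.test(observed)

round(result$expected, 3)           # expected frequencies
round(unname(result$statistic), 3)  # chi-square statistic: 9.839
unname(result$parameter)            # degrees of freedom: 2
round(result$p.value, 3)            # p-value: 0.007

result$p.value < 0.03               # TRUE means H0 is rejected at alpha = 0.03
```

By default, chisq.test() applies a continuity correction only to 2 x 2 tables, so for this 2 x 3 table its output should agree with the manual calculation.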
