10. Categorical Association: Chi-Square Test for Independence
Chi-Square Test for Independence: Test Statistic and p-value
Data for the Chi-Square Test for Independence
The observed frequency is the number of individuals in the sample that are classified in a particular category and is denoted by $f_o$.
The expected frequency is the number of individuals that one would expect to be classified in a particular category based on the predictions made by the null hypothesis and is denoted by $f_e$.
The expected frequency of a cell is calculated with the following formula:
$$f_e = \frac{f_r \cdot f_c}{N}$$
where $f_r$ is the frequency total for the row, $f_c$ is the frequency total for the column, and $N$ is the total sample size.
Calculating Expected Frequencies
Consider the following frequency distribution table:
Observed Frequencies | |||
Apple | Banana | Total | |
Extrovert | | | 50 |
Introvert | | | 178 |
Total | 94 | 134 | 228 |
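To calculate the expected frequencies, apply the following formula to each cell in the table:
$$f_e = \frac{f_r \cdot f_c}{N}$$
where $f_r$ is the frequency total for the row, $f_c$ is the frequency total for the column, and $N$ is the total sample size. For example, the expected frequency of the Extrovert/Apple cell is $f_e = \frac{50 \cdot 94}{228} \approx 20.61$.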
Expected Frequencies | |||
Apple | Banana | Total | |
Extrovert | 20.61 | 29.39 | 50 |
Introvert | 73.39 | 104.61 | 178 |
Total | 94 | 134 | 228 |
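The expected frequencies in the table above can be reproduced with a short Python sketch. The function name `expected_frequencies` is a hypothetical helper chosen for illustration; it assumes only the row and column totals are known.

```python
def expected_frequencies(row_totals, col_totals):
    """Return the expected frequency f_r * f_c / N for every cell."""
    n = sum(row_totals)  # total sample size N
    if n != sum(col_totals):
        raise ValueError("row and column totals must sum to the same N")
    return [[fr * fc / n for fc in col_totals] for fr in row_totals]


# Marginal totals taken from the table above
row_totals = [50, 178]  # Extrovert, Introvert
col_totals = [94, 134]  # Apple, Banana

expected = expected_frequencies(row_totals, col_totals)
for label, row in zip(["Extrovert", "Introvert"], expected):
    print(label, [round(fe, 2) for fe in row])
# Extrovert [20.61, 29.39]
# Introvert [73.39, 104.61]
```

Note that each row and column of expected frequencies adds up to the same totals as the observed table, which is a quick way to check the arithmetic.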
After the expected frequencies have been calculated, the next step is to calculate the Chi-Square Test for Independence test statistic in order to determine how much the observed frequencies differ from the frequencies expected under the null hypothesis.
Chi-Square Test Statistic and Distribution
The test statistic for the Chi-Square Test for Independence is denoted by $\chi^2$ and is calculated with the following formula:
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
where $f_o$ is the observed frequency and $f_e$ the expected frequency of a cell, and the sum runs over all cells in the table.
Since the calculation of the test statistic involves adding squared values, a $\chi^2$-statistic will always have a value of zero or larger.
Assuming the null hypothesis of the Chi-Square Test for Independence is true, the $\chi^2$-statistic will (approximately) follow a $\chi^2$-distribution with $df = (r-1)(c-1)$ degrees of freedom, where $r$ is the number of rows and $c$ the number of columns.
Chi-square distributions are positively skewed, and the critical region always lies entirely in the right tail of the distribution.
Calculating the p-value of a Chi-Square Test for Independence
A Chi-Square test is by definition a right-tailed test.
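To calculate the $p$-value of a Chi-Square Test for Independence in Excel, use the following command:
=1 - CHISQ.DIST(chi_square, df, TRUE)
To calculate the $p$-value of a Chi-Square Test for Independence in R, use the following command:
pchisq(q = chi_square, df = df, lower.tail = FALSE)
where chi_square is the value of the $\chi^2$-statistic and $df = (r-1)(c-1)$.
If $p \leq \alpha$, reject $H_0$ and conclude $H_a$. Otherwise, do not reject $H_0$.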
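For readers without Excel or R at hand, the right-tail probability can be sketched in Python using only the standard library. This is a minimal illustration, not a replacement for a statistics package: the function name `chi2_right_tail` is hypothetical, it relies on the series expansion of the regularized lower incomplete gamma function, and it is only intended for moderate values of the statistic. The significance level in the usage lines is an assumed value for illustration.

```python
import math


def chi2_right_tail(x, df):
    """Right-tail probability of a chi-square variable with df degrees of freedom."""
    if df <= 0:
        raise ValueError("df must be positive")
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series: sum over n of z^n / (a * (a + 1) * ... * (a + n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    # Lower-tail probability P(a, z) = z^a * e^(-z) / Gamma(a) * series
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


# Usage: 3.841459 is the well-known 5% critical value for df = 1,
# so the p-value should come out very close to 0.05.
p = chi2_right_tail(3.841459, 1)
alpha = 0.05  # assumed significance level, for illustration only
decision = "reject H0" if p <= alpha else "do not reject H0"
print(round(p, 4), decision)
```

The subtraction `1.0 - lower` loses precision when the p-value is extremely small, which is acceptable for checking textbook exercises but is one reason dedicated library routines are preferred in practice.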
In an effort to assess the impact of funding cuts on pre-school programs, school administrators in a US school district selected a simple random sample of students in the seventh grade and determined whether or not each student had attended pre-school and whether each student was performing below, at, or above grade level in mathematics.
The distribution was organized in the following two-way frequency table:
Below grade level | At grade level | Above grade level | Total | |
Attended pre-school | 13 | 45 | 26 | 84 |
No pre-school | 20 | 31 | 17 | 68 |
Total | 33 | 76 | 43 | 152 |
The researcher plans on using a Chi-Square Test for Independence to determine whether pre-school attendance and mathematical ability are related to one another.
Calculate the $p$-value of the test and make a decision regarding $H_0$. Round your answer to decimal places. Use the significance level.
On the basis of this $p$-value, $H_0$ should not be rejected, because $p > \alpha$.
There are a number of different ways to calculate the $p$-value of the test. Two solutions follow: the first uses Excel, the second uses R.
Calculate the expected frequency of all cells in the table with the following formula:
$$f_e = \frac{f_r \cdot f_c}{N}$$
where $f_r$ is the frequency total for the row, $f_c$ is the frequency total for the column, and $N$ is the total sample size.
Below grade level | At grade level | Above grade level | Total | |
Attended pre-school | 18.237 | 42.0 | 23.763 | 84 |
No pre-school | 14.763 | 34.0 | 19.237 | 68 |
Total | 33 | 76 | 43 | 152 |
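Calculate the $\chi^2$-statistic:
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{(13 - 18.237)^2}{18.237} + \frac{(45 - 42)^2}{42} + \frac{(26 - 23.763)^2}{23.763} + \frac{(20 - 14.763)^2}{14.763} + \frac{(31 - 34)^2}{34} + \frac{(17 - 19.237)^2}{19.237} \approx 4.311$$
Determine the degrees of freedom:
$$df = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2$$
To calculate the $p$-value of a $\chi^2$-test, make use of the following Excel function: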
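The statistic and degrees of freedom for the pre-school table can be double-checked with a short Python sketch. The function name `chi_square_statistic` is a hypothetical helper; it derives the expected frequencies from the observed counts and then sums the squared, scaled differences.

```python
observed = [
    [13, 45, 26],  # Attended pre-school
    [20, 31, 17],  # No pre-school
]


def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)  # total sample size N
    stat = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected frequency
            stat += (fo - fe) ** 2 / fe
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


stat, df = chi_square_statistic(observed)
print(round(stat, 3), df)  # 4.311 2
```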
CHISQ.DIST(x, deg_freedom, cumulative)
- x: The value at which you wish to evaluate the distribution function.
- deg_freedom: An integer indicating the number of degrees of freedom.
- cumulative: A logical value that determines the form of the function.
  - TRUE: uses the cumulative distribution function
  - FALSE: uses the probability density function
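A Chi-Square test is by definition a right-tailed test. Thus, to calculate the $p$-value of the test with $\chi^2 \approx 4.311$ and $df = 2$, run the following command:
=1 - CHISQ.DIST(4.311, 2, TRUE)
This gives:
$$p \approx 0.1158$$
Since $p > \alpha$, the null hypothesis of independence should not be rejected.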
Calculate the expected frequency of all cells in the table with the following formula:
$$f_e = \frac{f_r \cdot f_c}{N}$$
where $f_r$ is the frequency total for the row, $f_c$ is the frequency total for the column, and $N$ is the total sample size.
Below grade level | At grade level | Above grade level | Total | |
Attended pre-school | 18.237 | 42.0 | 23.763 | 84 |
No pre-school | 14.763 | 34.0 | 19.237 | 68 |
Total | 33 | 76 | 43 | 152 |
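Calculate the $\chi^2$-statistic:
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{(13 - 18.237)^2}{18.237} + \frac{(45 - 42)^2}{42} + \frac{(26 - 23.763)^2}{23.763} + \frac{(20 - 14.763)^2}{14.763} + \frac{(31 - 34)^2}{34} + \frac{(17 - 19.237)^2}{19.237} \approx 4.311$$
Determine the degrees of freedom:
$$df = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2$$
To calculate the $p$-value of a $\chi^2$-test, make use of the following R function: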
pchisq(q, df, lower.tail)
- q: The value at which you wish to evaluate the distribution function.
- df: An integer indicating the number of degrees of freedom.
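- lower.tail: If TRUE (default), probabilities are $P(X \leq x)$, otherwise, $P(X > x)$.
A Chi-Square test is by definition a right-tailed test. Thus, to calculate the $p$-value of the test with $\chi^2 \approx 4.311$ and $df = 2$, run the following command:
pchisq(q = 4.311, df = 2, lower.tail = FALSE)
This gives:
$$p \approx 0.1158$$
Since $p > \alpha$, the null hypothesis of independence should not be rejected.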
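As a hand check that does not depend on software: a $\chi^2$-distribution with $df = 2$ is an exponential distribution with mean 2, so its right-tail probability has a closed form. The value $\chi^2 \approx 4.311$ used below is the statistic computed from the pre-school table; the shortcut applies only when $df = 2$.

```latex
\begin{aligned}
P(\chi^2_{2} \geq x) &= e^{-x/2} \\
p &= e^{-4.311/2} = e^{-2.1555} \approx 0.1158
\end{aligned}
```

This agrees with the right-tail probability returned by both the Excel and the R approach.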