Chi-Square Goodness of Fit Test: Purpose, Hypotheses, and Assumptions

Chi-Square Test for Goodness of Fit: Purpose and Hypotheses

The Chi-Square Goodness of Fit Test uses the sample data of a categorical variable to test hypotheses about the proportions of a population distribution.

Specifically, the test determines how well the observed sample proportions fit the population proportions predicted by the null hypothesis.

The null hypothesis of a Goodness of Fit Test makes a prediction about the proportion (or percentage) of the population in each of the measurement categories. Although it is possible to choose any hypothesized proportions, the null hypothesis generally takes one of two forms:

No preference

A null hypothesis of no preference is used to determine whether there are any preferences among the measurement categories, or whether the proportions differ from one category to the next.

In these cases, $H_0$ predicts that the population is divided equally across all categories.

For example, a null hypothesis stating that there is no preference among $4$ of the most popular ice cream flavors would predict the following population distribution:

	Chocolate	Vanilla	Strawberry	Banana
$H_0:$	$1/4$	$1/4$	$1/4$	$1/4$

No difference from a known population

Alternatively, we might want to determine whether the unknown proportions for one population differ significantly from the proportions for another population whose distribution is already known.

A null hypothesis of no difference predicts that the proportions for the unknown population are identical to the proportions for the known population.

For example, suppose it is known that $22\%$ of students at the University of Amsterdam prefer morning classes, $65\%$ of students prefer to have their classes in the afternoon, and the remaining $13\%$ prefer evening classes. A professor at the Erasmus University in Rotterdam wonders whether these same proportions hold for her own students.

Here, the null hypothesis would state that the distribution of students at the Erasmus University is identical to the distribution of students at the University of Amsterdam:

	Preference for morning classes	Preference for afternoon classes	Preference for evening classes
$H_0:$	$.22$	$.65$	$.13$
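
Both kinds of null hypothesis can be written down as a set of proportions that sum to $1$. Under $H_0$, the expected frequency of a category is the sample size multiplied by its hypothesized proportion. The sketch below illustrates this for the two examples above; the helper names and the sample size `n = 200` are assumptions made for illustration and do not come from the examples themselves.

```python
import math

# Null-hypothesis proportions from the class-time example (known population).
null_known = {"morning": 0.22, "afternoon": 0.65, "evening": 0.13}


def no_preference(categories):
    """Equal proportions across all categories (the 'no preference' null)."""
    k = len(categories)
    return {c: 1 / k for c in categories}


def expected_frequencies(null_proportions, n):
    """Expected count per category under H0: n times the hypothesized proportion."""
    total = sum(null_proportions.values())
    if not math.isclose(total, 1.0):
        raise ValueError(f"proportions must sum to 1, got {total}")
    return {c: n * p for c, p in null_proportions.items()}


n = 200  # hypothetical sample size, chosen only for illustration
null_flavors = no_preference(["chocolate", "vanilla", "strawberry", "banana"])

print(expected_frequencies(null_flavors, n))  # equal counts per flavor
print(expected_frequencies(null_known, n))    # counts follow the known proportions
```

The check that the proportions sum to $1$ guards against a mistyped null hypothesis before any expected frequencies are computed.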

Since the null hypothesis $H_0$ of a Goodness of Fit test makes an exact prediction about the distribution for the population, the alternative hypothesis $H_a$ simply predicts that the population has a different distribution from the one predicted by the null hypothesis.
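
In symbols, the two hypotheses can be sketched as follows. The notation is introduced here for illustration only: $k$ is the number of categories, $p_i$ is the true population proportion in category $i$, and $p_{i,0}$ is the proportion that $H_0$ predicts for that category.

$$
\begin{aligned}
H_0 &: p_i = p_{i,0} \quad \text{for all } i = 1, \ldots, k \\
H_a &: p_i \neq p_{i,0} \quad \text{for at least one } i
\end{aligned}
$$

For the class-time example, $k = 3$ and $(p_{1,0}, p_{2,0}, p_{3,0}) = (.22, .65, .13)$.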

Assumptions of the Chi-Square Goodness of Fit Test

The following assumptions must hold for a Chi-Square Goodness of Fit Test to produce valid results:

1. The variable being studied is categorical (qualitative) in nature.
2. The measurement categories are mutually exclusive, which means that each observation can be classified into one and only one category.
3. Random sampling is used to draw the sample.
4. All categories should have an expected frequency of at least $1$.
5. The majority of categories $(\geq 80\%)$ should have an expected frequency of at least $5$.

If either assumption $4$ or $5$ is not satisfied, you must combine some categories.
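
The two expected-frequency assumptions can be checked before running the test. The following is a minimal sketch, assuming the expected frequency of a category is the sample size times its hypothesized proportion. The function name, the sample sizes, and the choice of which categories to merge are hypothetical and serve only to illustrate the rule.

```python
def check_expected_frequencies(null_proportions, n):
    """Check the two expected-frequency assumptions for sample size n."""
    expected = [n * p for p in null_proportions]
    # Every category needs an expected frequency of at least 1.
    all_at_least_1 = all(e >= 1 for e in expected)
    # At least 80% of the categories need an expected frequency of at least 5.
    share_at_least_5 = sum(e >= 5 for e in expected) / len(expected)
    return {
        "expected": expected,
        "all_at_least_1": all_at_least_1,
        "share_at_least_5": share_at_least_5,
        "ok": all_at_least_1 and share_at_least_5 >= 0.8,
    }


class_times = [0.22, 0.65, 0.13]  # morning, afternoon, evening

small = check_expected_frequencies(class_times, n=30)  # hypothetical sample size
large = check_expected_frequencies(class_times, n=40)  # hypothetical sample size

# Merging morning and evening into one category (an illustrative choice).
merged = check_expected_frequencies([0.22 + 0.13, 0.65], n=30)

print(small["ok"], large["ok"], merged["ok"])
```

With $30$ students, the evening category has an expected frequency below $5$, so only two of the three categories reach $5$ and the $80\%$ requirement fails. A sample of $40$ students, or merging the evening category with another one, satisfies both requirements.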