Chi-square Goodness of Fit Test

10. Categorical Association: Practical 10

Chi-square Goodness of Fit Test

Objectives

Learn how to do the following in R

conduct and interpret a chi-square goodness of fit test by using the basic formula's
conduct and interpret a chi-square goodness of fit test by using chisq.test()

Instruction

Read through the text below
Execute code-examples and compare your results with what is explained in the text
Make the exercises and compare your answers with those in the examples
Time: 60 minutes

Basics of the Chi-square Test

A Chi-square Goodness of Fit Test can be used to investigate how well the sample data of a categorical variable fits the population proportions expected under the null hypothesis. The idea is to measure somehow the mismatch between the observed and expected frequency distributions and evaluate the size of the observed mismatch with mismatches you'd expect if the null hypothesis were true.

To give a concrete example, you can investigate whether the amount of children born is the same for each day of the week. Here, each day of the week is a different category. If you have gathered a sample of data you can see how many children are born each day of the week. These are your observed values. Under the null hypothesis, you would expect that birth is the same for each day of the week (so 1/7th). These are your expected values. In this case you thus investigate whether the population is equally divided among categories.

The Goodness of Fit Test

The Goodness of Fit Test is used for comparing a frequency distribution for a categorical variable with a theoretical distribution. Be aware: this is not the same as comparing the distribution of two observed categorical variables (this will follow in the next part of the practical) and also not the same as comparing numerical distributions.

The quantity that is calculated to measure the correspondence between the observed and theoretical frequency distributions is called the $X^2$ -statistic. It is calculated by the following formula:

$X^2=\sum_{\text{all categories}}{\dfrac{(\text{Observed}-\text{Expected})^2}{\text{Expected}}}$

The bigger the $X^2$ -statistic, the more dissimilar the observed frequency distribution is from the expected (=theoretical) distribution. Since the calculation of the test statistic involves adding squared values, a $X^2$ -statistic will always have a value of zero or larger.

If the sample is sufficiently large (at least an expected frequency of 5 in each category), the $X^2$ -statistic follows a so-called $\chi^2$ probability distribution with $df= k - 1$ degrees of freedom, where $k$ is the number of categories that are used in the frequency distribution. This distribution starts at 0 and is positively skewed. Note that having an expected frequency of at least 5 in each category is an assumption which should be checked.

Because we know the theoretical distribution of this $X^2$ -statistic, we can conduct a hypothesis test in the same way as other hypothesis tests that we know (e.g. for a mean, proportion or regression slope). The null and alternative hypotheses in this case are:

$H_0$ : The observed frequency distribution is not different from the theoretical distribution
$H_A$ : The observed and theoretical distributions are different.

And as mentioned above, the null hypothesis is rejected for high values of the $X^2$ -statistic. For that reason this hypothesis test is always a right-sided test.

To calculate the p-value for a given $X^2$ value, we can use the command pchisq(). Let's take a look at the documentation for pchisq().

?pchisq()

To calculate the p-value you need to specify a few input arguments.

x: the $X^2$ statistic that you calculated
df: the number of degrees of freedom (calculated by k-1)
lower.tail: this can be TRUE or FALSE, the default is true which means that you calculate the probability on the left side of the distribution of your statistic (from 0 to your statistic). If you specify FALSE you calculate the probability on the right side of the distribution of your statistic, to see what the probability is to find this statistic or more extreme, which is what we're interested in.

Imagine you expect that the answers to the statement 'Using less water makes me unhappy' (i.e. variable unhappy from the WD dataframe) are equally distributed over the $5$ categories ranging from 1= Strongly disagree to 5 = Strongly agree.

You test this by performing a $X^2$ -Goodness of Fit test. Is the observed frequency significantly different from the expected frequency at a significance level of $\alpha = 0.1$ ?

Round your answers for the $X^2$ -statistic and the $p$ -value to $2$ decimal points.

To answer this question you have to calculate the $X^2$ statistic and compare the $p$ -value for this statistic with the significance level.

Step 1: calulate $X^2$ -statistic
The $X^2$ -statistic is calculated using the following equation:

$X^2=\sum_{\text{all categories}}{\dfrac{(\text{Observed}-\text{Expected})^2}{\text{Expected}}}$

So: you need the observed and expected frequencies.

Step 1a: Calculate observed frequencies
The observed frequencies can be found in the frequency table of the variable unhappy.

observed <- table(WD$ unhappy)

Step 1b: Calculate expected frequencies
You expect that the answers are equally distributed over the $5$ classes (given the question). The expected frequency of each class is $1/5$ th of the total number of observations.

expected <- c(nrow(WD)/5, nrow(WD)/5, nrow(WD)/5, nrow(WD)/5, nrow(WD)/5)

Step 1c: Use the equation

X2 <- sum((observed-expected)^2/expected)

$X^2 = 80.53$

Now you know the $X^2$ -statistic, but is it large enough to say that the observed frequency is significantly different from the expected frequency?

Step 2: Calculate $p$ -value
Use the pchisq() function to calculate the $p$ -value. The degrees of freedom are calculated with the equation: $df = k - 1$ in which $k$ is the number of categories.

df <- 5 - 1
p <- pchisq(X2, df, lower.tail = FALSE)

$p$ -value = "1.34457431283586e-16"

Step 3: Conclusion
The $p$ -value is much smaller than the significance level of $0.1$ . Hence, the null hypothesis should be rejected. The responses of people to the statement 'Using less water makes me unhappy' are not equally distributed over the categories.

New example