Chi-Square Tests infographic - Goodness-of-Fit and Independence Tests

Chi-square tests are statistical tools used to compare observed data with what we would expect under a hypothesis. They are especially useful for categorical data, such as counts in groups or categories. These tests help scientists decide whether differences in counts are likely due to chance or reflect a real pattern. You will see chi-square methods in biology, psychology, public health, and social science.

The basic idea is to measure how far observed counts are from expected counts using the chi-square statistic. Larger differences produce a larger test statistic, which can indicate that the null hypothesis does not fit the data well. Common versions include the goodness-of-fit test, the test of independence, and the test of homogeneity. To use them correctly, students must understand expected counts, degrees of freedom, and the assumptions behind the test.
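As a minimal sketch of this idea, the Python snippet below sums the squared gaps between observed and expected counts, each scaled by its expected count. The function name `chi_square_statistic` and the three-category counts are hypothetical, chosen only for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 observations spread over 3 categories,
# with a null hypothesis that says each category is equally likely.
observed = [18, 22, 20]
expected = [20.0, 20.0, 20.0]

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 3))
```

Counts close to their expected values keep each term small, so the statistic stays near zero. Moving any observed count further from its expected count makes the total grow.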

Key Facts

  • Chi-square test statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$
  • Goodness-of-fit expected count: $E = n \times p$ for each category
  • Independence expected count in a contingency table: $E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$
  • Degrees of freedom for goodness-of-fit: $df = k - 1$, or $df = k - 1 - m$ if $m$ parameters are estimated from data
  • Degrees of freedom for an r × c table: df = (r - 1)(c - 1)
  • A large chi-square value corresponds to a small p-value; a p-value below the chosen significance level suggests rejecting the null hypothesis
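The expected-count and degrees-of-freedom formulas above can be sketched in plain Python. The helper names (`gof_expected`, `independence_expected`, `gof_df`, `table_df`) and the sample numbers are assumptions made for this illustration, not part of any library.

```python
def gof_expected(n, probs):
    """Goodness-of-fit expected counts: E = n * p for each category."""
    return [n * p for p in probs]


def independence_expected(table):
    """Expected counts for a contingency table given as a list of rows:
    E = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def gof_df(k, m=0):
    """Goodness-of-fit degrees of freedom: k - 1 - m,
    where m is the number of parameters estimated from the data."""
    return k - 1 - m


def table_df(table):
    """Degrees of freedom for an r x c table: (r - 1)(c - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical 2 x 2 table of counts (rows and columns are two categorical variables).
table = [[20, 30],
         [30, 20]]

print(independence_expected(table))  # [[25.0, 25.0], [25.0, 25.0]]
print(table_df(table))               # 1
print(gof_df(6))                     # 5
```

Every row and column total in this made-up table is 50 out of 100, so each expected count is 50 × 50 / 100 = 25. For real analyses, `scipy.stats.chisquare` and `scipy.stats.chi2_contingency` also return p-values. Note that `chi2_contingency` applies a continuity correction to 2 × 2 tables by default, so its statistic can differ from a hand calculation.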

Vocabulary

Observed count
The observed count is the actual number recorded in a category from the sample.
Expected count
The expected count is the number predicted in a category if the null hypothesis is true.
Degrees of freedom
Degrees of freedom tell how many category counts can vary independently once totals or constraints are fixed.
Null hypothesis
The null hypothesis is the claim that there is no difference, no association, or that the data follow a stated distribution.
Contingency table
A contingency table is a grid of counts showing how two categorical variables are distributed together.

Common Mistakes to Avoid

  • Using chi-square with percentages or means instead of counts, because the test is built for frequency data in categories rather than numerical averages.
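  • Ignoring the expected count condition (a common rule of thumb is that every expected count should be at least 5), because very small expected counts can make the chi-square approximation unreliable and lead to misleading p-values.
  • Confusing independence and homogeneity tests, because both use the same statistic but answer different questions: independence asks whether two variables are associated within one population, while homogeneity asks whether several populations share the same distribution of one variable.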
  • Claiming causation from a significant chi-square result, because the test can show association or mismatch with a model but does not prove one variable causes another.
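One way to guard against the expected-count mistake is to scan the expected counts before running the test. The sketch below assumes the common rule-of-thumb threshold of 5; the function name `small_expected_cells` and the sample table are hypothetical.

```python
def small_expected_cells(expected, minimum=5.0):
    """Return (row, column, value) for each expected count below `minimum`.

    `expected` is a table of expected counts given as a list of rows.
    The default threshold of 5 is a rule of thumb, not a strict law.
    """
    return [
        (i, j, value)
        for i, row in enumerate(expected)
        for j, value in enumerate(row)
        if value < minimum
    ]


# Hypothetical expected counts where the second row is too sparse.
expected = [[12.0, 8.0],
            [3.0, 2.0]]

print(small_expected_cells(expected))  # [(1, 0, 3.0), (1, 1, 2.0)]
```

When cells are flagged, typical responses are to collect more data or combine related categories before applying the chi-square approximation.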

Practice Questions

  1. A six-sided die is rolled 60 times. The observed counts are 8, 12, 9, 11, 10, and 10. Check whether the die appears fair by finding the expected count for each face and computing $\chi^2$.
  2. A survey records the favorite study method of 100 students. Among visual learners, 30 prefer flashcards and 20 prefer notes; among auditory learners, 10 prefer flashcards and 40 prefer notes. Compute the expected count for each cell and then calculate the $\chi^2$ statistic for a test of independence.
  3. A chi-square test of independence gives p = 0.03 for the relationship between exercise level and sleep quality. Explain what this result means and what it does not mean.