
Categorical Data & Chi-Square Lab

Test whether observed categorical data matches an expected distribution (goodness of fit) or whether two categorical variables are independent. Build contingency tables, compute expected counts, and interpret p-values from the chi-square statistic.

Guided Experiment: Testing a Fair Die

If a die is fair, each face should appear with equal probability (1/6). How can we test whether observed rolls deviate significantly from this expectation? What p-value would lead us to question the die's fairness?


Results

Hypotheses
H₀: The observed frequencies match the expected distribution.
Hₐ: The observed frequencies do not match the expected distribution.
Chi-Square Statistic
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 13.28$$
df = k - 1 = 5
p-value = 0.0209
n = 100
Reject H₀ at α = 0.05 (p = 0.0209 < 0.05)

There is significant evidence that the observed distribution differs from the expected distribution.

Observed vs Expected Counts
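| Category | Observed | Expected | Contribution |
|---|---|---|---|
| 1 | 18 | 16.67 | 0.107 |
| 2 | 15 | 16.67 | 0.167 |
| 3 | 23 | 16.67 | 2.407 |
| 4 | 25 | 16.67 | 4.167 |
| 5 | 8 | 16.67 | 4.507 |
| 6 | 11 | 16.67 | 1.927 |

Largest contributor: category 5 (4.5067)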

Visualization

Figure: Observed vs Expected Counts for categories 1 to 6 (series: Observed, Expected).

Figure: Contribution to χ² by category (heatmap): 1: 0.11, 2: 0.17, 3: 2.41, 4: 4.17, 5: 4.51, 6: 1.93.


Reference Guide

The Chi-Square Statistic

The chi-square statistic measures how far observed counts deviate from expected counts across all categories.

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$

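Here O_i and E_i are the observed and expected counts for category i, and the sum runs over all categories. Each term is a squared deviation divided by its expected count, so the statistic is always non-negative and equals 0 only when every observed count equals its expected count. Larger values indicate greater disagreement between observed and expected data.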
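A minimal Python sketch of the formula, using the observed die counts from the Results above and assuming a fair die (every E_i = 100/6). The helper name `chi_square_contributions` is hypothetical, not part of the lab; the sketch reproduces the per-category contributions, the total of 13.28, and the largest contributor shown in the Observed vs Expected Counts table.

```python
# Sketch: chi-square statistic for the fair-die data (observed counts from the lab).
# chi_square_contributions is a hypothetical helper name used for illustration.


def chi_square_contributions(observed, expected):
    """Return the per-category terms (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


observed = [18, 15, 23, 25, 8, 11]      # counts for faces 1..6
n = sum(observed)                       # n = 100
expected = [n / 6] * 6                  # fair die: E_i = n * (1/6), about 16.67

contributions = chi_square_contributions(observed, expected)
chi2 = sum(contributions)
# Category (face) with the largest contribution, numbered from 1
largest = max(range(len(contributions)), key=lambda i: contributions[i]) + 1

for face, (o, e, c) in enumerate(zip(observed, expected, contributions), start=1):
    print(f"{face}: observed={o}, expected={e:.2f}, contribution={c:.3f}")
print(f"chi2 = {chi2:.2f}, largest contributor = {largest}")
# chi2 = 13.28, largest contributor = 5
```

Face 5 (observed 8 against an expected 16.67) contributes the most, which matches the lab's Largest Contributor readout.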

Goodness of Fit

Tests whether a single categorical variable follows a hypothesized distribution. Expected counts come from the hypothesized proportions.

$$E_i = n \cdot p_i, \quad df = k - 1$$

Here n is the total sample size, p_i is the hypothesized proportion for category i, and k is the number of categories. For the fair die, n = 100 and p_i = 1/6, so every E_i = 100/6 ≈ 16.67 and df = 6 - 1 = 5.
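The full goodness-of-fit test, including the p-value, can be sketched with only the Python standard library. This is an illustrative assumption about how such a test could be computed, not the lab's own implementation: the hypothetical `chi2_sf` uses the closed-form upper-tail series that exists for integer degrees of freedom (with `math.erfc` for odd df) in place of a statistics library, and `goodness_of_fit` is likewise an illustrative name.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: e^{-x/2} * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * e^{-x/2} * (1 + x/3 + x^2/(3*5) + ...)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def goodness_of_fit(observed, proportions):
    """Chi-square goodness-of-fit test; returns (chi2, df, p_value)."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in proportions]                      # E_i = n * p_i
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                                       # df = k - 1
    return chi2, df, chi2_sf(chi2, df)


chi2, df, p = goodness_of_fit([18, 15, 23, 25, 8, 11], [1 / 6] * 6)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.4f}")  # chi2 = 13.28, df = 5, p = 0.0209
```

In practice a statistics library would normally supply the tail probability; the series is used here only to keep the sketch dependency-free.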

Test of Independence

Tests whether two categorical variables are associated. Expected counts are computed from row and column totals.

$$E_{ij} = \frac{(\text{row total}_i)(\text{col total}_j)}{n}, \quad df = (r-1)(c-1)$$

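Here r and c are the numbers of rows and columns in the contingency table, and the chi-square statistic sums (O_ij - E_ij)^2 / E_ij over all r × c cells. If the variables are truly independent, observed counts should be close to these expected counts.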
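A sketch of the expected-count, statistic, and degrees-of-freedom calculation for a test of independence. The 2 × 3 table is hypothetical, invented only to make the arithmetic easy to follow, and `independence_test` is an illustrative helper name. Because this table has df = 2, the chi-square upper tail reduces to e^{-χ²/2}, which the sketch uses as a shortcut; other df values need a general tail function.

```python
# Sketch of a chi-square test of independence on an r x c contingency table.
# The counts below are HYPOTHETICAL and exist only to illustrate the arithmetic.
import math


def independence_test(table):
    """Return (expected, chi2, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # E_ij = (row total_i)(col total_j) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)   # df = (r - 1)(c - 1)
    return expected, chi2, df


# Hypothetical 2 x 3 table: rows are two groups, columns are three responses.
table = [[20, 30, 50],
         [30, 20, 50]]
expected, chi2, df = independence_test(table)
print(expected)   # [[25.0, 25.0, 50.0], [25.0, 25.0, 50.0]]
print(chi2, df)   # 4.0 2

# Shortcut valid only because df == 2: the upper tail is e^{-x/2}.
p_value = math.exp(-chi2 / 2)
print(round(p_value, 4))  # 0.1353
```

With p ≈ 0.135 > 0.05, this made-up table would not give significant evidence of an association.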

Conditions & Interpretation

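The chi-square distribution only approximates the sampling distribution of the statistic. A common rule of thumb is that the approximation is reliable when every expected count (not observed count) is at least 5; in the die experiment every expected count is 16.67, so the condition holds. The p-value is the probability, assuming the null hypothesis is true, of obtaining a chi-square statistic at least as large as the one observed.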
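The p-value definition can be checked by simulation: roll a fair die 100 times, compute χ², repeat many times, and count how often the simulated statistic is at least as large as the observed 13.28. This Monte Carlo approach is a swapped-in technique for illustration, not the method the lab uses; the function names, the 10,000 repetitions, and the seed are arbitrary assumptions. The estimate should land near the chi-square approximation of 0.0209, though not exactly on it, because of simulation noise and because the chi-square curve is itself an approximation.

```python
import random


def chi2_stat(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulated_p_value(observed, n_sims=10_000, seed=0):
    """Estimate P(chi2 >= observed chi2) under H0 (a fair k-sided die) by simulation."""
    k = len(observed)
    n = sum(observed)
    expected = [n / k] * k
    observed_chi2 = chi2_stat(observed, expected)
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(n_sims):
        counts = [0] * k
        for face in rng.choices(range(k), k=n):  # n rolls of a fair die
            counts[face] += 1
        # Small tolerance so exact ties count as "at least as large"
        if chi2_stat(counts, expected) >= observed_chi2 - 1e-9:
            hits += 1
    return hits / n_sims


p_sim = simulated_p_value([18, 15, 23, 25, 8, 11])
print(f"simulated p-value = {p_sim:.4f}")
```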

$$p\text{-value} < \alpha \implies \text{Reject } H_0$$

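A small p-value (below the significance level α) provides evidence against the null hypothesis. A p-value at or above α means the data are consistent with H₀; it does not prove that H₀ is true.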
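A minimal sketch of the decision step, combining the expected-count rule of thumb with the comparison against α. The function name `chi_square_decision` and its message strings are hypothetical. The second call shows that the same p-value of 0.0209 would not lead to rejection at the stricter level α = 0.01.

```python
def chi_square_decision(p_value, expected, alpha=0.05, min_expected=5.0):
    """Return a conclusion for a chi-square test (hypothetical helper)."""
    # Rule of thumb: the chi-square approximation needs every expected count >= 5
    if min(expected) < min_expected:
        return f"Condition not met: an expected count is below {min_expected:g}"
    # Decision rule: reject H0 when p-value < alpha
    if p_value < alpha:
        return f"Reject H0 at alpha = {alpha}"
    return f"Fail to reject H0 at alpha = {alpha}"


fair_die_expected = [100 / 6] * 6  # every E_i is about 16.67
print(chi_square_decision(0.0209, fair_die_expected))              # Reject H0 at alpha = 0.05
print(chi_square_decision(0.0209, fair_die_expected, alpha=0.01))  # Fail to reject H0 at alpha = 0.01
print(chi_square_decision(0.0209, [2.5, 2.5, 5.0]))  # Condition not met: an expected count is below 5
```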