Categorical Data & Chi-Square Lab

Test whether observed categorical data matches an expected distribution (goodness of fit) or whether two categorical variables are independent. Build contingency tables, compute expected counts, and interpret p-values from the chi-square statistic.

Guided Experiment: Testing a Fair Die

Steps: Hypothesis, Setup, Run Experiment, Analyze, Conclude

If a die is fair, each face should appear with equal probability (1/6). How can we test whether observed rolls deviate significantly from this expectation? What p-value would lead us to question the die's fairness?



Results

Hypotheses

H_0: The observed frequencies match the expected distribution.

H_a: The observed frequencies do not match the expected distribution.

Chi-Square Statistic

\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 13.28

df = k - 1 = 5

p-value = 0.0209

n = 100

Reject H₀ at α = 0.05

There is significant evidence that the observed distribution differs from the expected distribution.

Observed vs Expected Counts

Category	Observed	Expected	Contribution
1	18	16.67	0.107
2	15	16.67	0.167
3	23	16.67	2.407
4	25	16.67	4.167
5	8	16.67	4.507
6	11	16.67	1.927

Largest Contributor: category 5 (4.5067)

Visualization

[Figure: bar chart of observed vs. expected counts for categories 1 through 6]

[Figure: heatmap of each category's contribution to χ²]


Reference Guide

The Chi-Square Statistic

The chi-square statistic measures how far observed counts deviate from expected counts across all categories.

\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}

Larger values indicate greater disagreement between observed and expected data. The statistic is always non-negative.
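As a minimal sketch of the formula, the snippet below recomputes the statistic for the die-roll counts shown in the Results table. The function name `chi_square_contributions` is an assumption made for illustration and is not part of the lab.

```python
def chi_square_contributions(observed, expected):
    """Return each category's contribution (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


# Die-roll counts from the Results table (n = 100 rolls).
observed = [18, 15, 23, 25, 8, 11]
n = sum(observed)
expected = [n / 6] * 6  # a fair die: each face is expected n/6 times

contributions = chi_square_contributions(observed, expected)
chi_sq = sum(contributions)  # the statistic is the sum of all contributions

# 1-based index of the category with the largest contribution
largest = max(range(len(contributions)), key=contributions.__getitem__) + 1

print(f"chi-square = {chi_sq:.2f}")
print(f"largest contributor: category {largest}")
```

Keeping the per-category contributions, rather than only their sum, makes it easy to see which categories drive the disagreement.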

Goodness of Fit

Tests whether a single categorical variable follows a hypothesized distribution. Expected counts come from the hypothesized proportions.

E_i = n \cdot p_i, \quad df = k - 1

Here n is the total sample size, p_i is the expected proportion for category i, and k is the number of categories.
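The goodness-of-fit recipe can be sketched with the standard library alone. The helper names `chi_square_sf` and `goodness_of_fit` are assumptions for illustration. The p-value uses the closed-form upper-tail expression that holds for integer degrees of freedom, so this is a sketch under that assumption rather than a general-purpose implementation.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Assumes df is a positive integer. Uses the recurrence
    Q(a + 1, z) = Q(a, z) + z^a e^{-z} / Gamma(a + 1) with z = x / 2.
    """
    if df < 1 or x < 0:
        raise ValueError("df must be >= 1 and x must be >= 0")
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)             # Q(1, z) = e^{-z}
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # Q(1/2, z) = erfc(sqrt(z))
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def goodness_of_fit(observed, proportions):
    """Return (statistic, df, p_value) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in proportions]  # E_i = n * p_i
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                   # df = k - 1
    return stat, df, chi_square_sf(stat, df)


# Die-roll counts from the Results table, tested against a fair die.
stat, df, p_value = goodness_of_fit([18, 15, 23, 25, 8, 11], [1 / 6] * 6)
print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.4f}")
```

For very large df the call to `math.gamma` can overflow, so a statistics library is the safer choice outside small classroom examples.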

Test of Independence

Tests whether two categorical variables are associated. Expected counts are computed from row and column totals.

E_{ij} = \frac{(\text{row total}_i)(\text{col total}_j)}{n}, \quad df = (r-1)(c-1)

If the variables are truly independent, observed counts should be close to these expected counts.
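The expected-count formula for a contingency table can be sketched as follows. The 2×2 table here is hypothetical and made up purely for illustration, and the function names are assumptions as well.

```python
def expected_counts(table):
    """E_ij = (row total_i * column total_j) / n for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def independence_statistic(table):
    """Return (chi-square statistic, df) for a test of independence."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)  # df = (r - 1)(c - 1)
    return stat, df


# Hypothetical counts (not from the lab): rows and columns are two made-up
# categorical variables with two levels each.
table = [[20, 30],
         [30, 20]]

expected = expected_counts(table)
stat, df = independence_statistic(table)
print(expected)
print(f"chi-square = {stat:.2f}, df = {df}")
```

Every row and column total in this table is 50 out of n = 100, so each expected count is 25 and each cell deviates from it by 5.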

Conditions & Interpretation

For the chi-square approximation to be valid, all expected counts should be at least 5. The p-value is the probability of observing a chi-square statistic at least as large as the one computed, assuming the null hypothesis is true.

p\text{-value} < \alpha \implies \text{Reject } H_0

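A small p-value (below the significance level) provides evidence against the null hypothesis. A p-value at or above the significance level means we fail to reject H_0, which is not the same as showing that H_0 is true.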
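The condition check and the decision rule can be sketched in a few lines. The function names are assumptions for illustration. The die values come from the Results section, and α = 0.01 is a hypothetical stricter level added for contrast.

```python
def expected_counts_ok(expected, minimum=5):
    """True when every expected count meets the usual 'at least 5' guideline."""
    return all(e >= minimum for e in expected)


def decide(p_value, alpha=0.05):
    """Apply the rule: p-value < alpha implies rejecting H0."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Fair-die example from the Results section: n = 100, six categories.
expected = [100 / 6] * 6
p_value = 0.0209

print(expected_counts_ok(expected))   # each expected count is about 16.67
print(decide(p_value, alpha=0.05))
print(decide(p_value, alpha=0.01))    # hypothetical stricter level
```

The same p-value leads to different conclusions at different significance levels, which is why α should be fixed before looking at the data.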