This cheat sheet covers one-way ANOVA and chi-square tests, two common methods for comparing data across groups or categories. Students use ANOVA when they need to compare several population means in a single test rather than just two. They use chi-square tests when the data are counts in categories and the question is about fit or association. A compact reference helps students choose the correct test, organize the formulas, and avoid mixing up conditions. The main ANOVA formula is $F = \frac{MS_{\text{between}}}{MS_{\text{within}}}$, which compares variation between groups to variation within groups. The main chi-square formula is $\chi^2 = \sum \frac{(O-E)^2}{E}$, which measures how far observed counts are from expected counts. Both tests use degrees of freedom and a $p$-value to decide whether results are statistically significant. The key skill is matching the research question, data type, assumptions, and formula to the correct test.

Key Facts

  • One-way ANOVA tests whether several population means are equal using the hypotheses $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ and $H_a$: at least one mean is different.
  • The ANOVA test statistic is $F = \frac{MS_{\text{between}}}{MS_{\text{within}}}$, where large values of $F$ give stronger evidence against $H_0$.
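  • The ANOVA sums of squares are $SS_{\text{between}} = \sum_i n_i(\bar{x}_i-\bar{x})^2$ and $SS_{\text{within}} = \sum_i \sum_j (x_{ij}-\bar{x}_i)^2$, where $n_i$ and $\bar{x}_i$ are the size and mean of group $i$, $x_{ij}$ is observation $j$ in group $i$, and $\bar{x}$ is the grand mean of all observations.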
  • For one-way ANOVA with $k$ groups and $N$ total observations, $df_{\text{between}} = k-1$, $df_{\text{within}} = N-k$, and $df_{\text{total}} = N-1$.
  • Mean squares are found by $MS_{\text{between}} = \frac{SS_{\text{between}}}{df_{\text{between}}}$ and $MS_{\text{within}} = \frac{SS_{\text{within}}}{df_{\text{within}}}$.
  • A chi-square goodness-of-fit test uses $\chi^2 = \sum \frac{(O-E)^2}{E}$ to compare observed category counts to expected category counts.
  • A chi-square test of independence uses $E_{ij} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$ for each table cell.
  • For a chi-square independence test with $r$ rows and $c$ columns, the degrees of freedom are $df = (r-1)(c-1)$.
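The formulas above can be checked on small data sets with a short script. This is a minimal sketch in plain Python, assuming each ANOVA group is a list of numbers and a contingency table is a list of rows of counts. The names `one_way_anova`, `chi_square_gof`, and `chi_square_independence` are hypothetical helpers, not part of any library. They return test statistics and degrees of freedom only; the $p$-value still comes from a table or statistics software. The goodness-of-fit df uses the standard rule (number of categories minus 1), which assumes no parameters were estimated from the data.

```python
# Hypothetical helpers (not from any library) that apply the Key Facts formulas.
# They return test statistics and degrees of freedom only, not p-values.

def one_way_anova(groups):
    """One-way ANOVA table values for a list of groups, each a list of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # SS_between: group size times squared distance of each group mean from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # SS_within: squared distance of each observation from its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }

def chi_square_gof(observed, expected):
    """Goodness of fit: chi-square = sum of (O - E)^2 / E; df = categories - 1 (assumed rule)."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

def chi_square_independence(table):
    """Test of independence for an r x c table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # E_ij = (row total)(column total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected
```

For example, `one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` has group means 2, 5, and 8 and a grand mean of 5, so $SS_{\text{between}} = 54$, $SS_{\text{within}} = 6$, $MS_{\text{between}} = 27$, $MS_{\text{within}} = 1$, and $F = 27$.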

Vocabulary

One-way ANOVA
A statistical test that compares the means of $k$ independent groups to see whether at least one population mean differs.
F statistic
The ANOVA test statistic $F = \frac{MS_{\text{between}}}{MS_{\text{within}}}$, which compares between-group variation to within-group variation.
Chi-square statistic
The statistic $\chi^2 = \sum \frac{(O-E)^2}{E}$, which measures how far observed counts are from expected counts.
Expected count
The count predicted for a category or table cell if the null hypothesis is true.
Degrees of freedom
The number of independent pieces of information used to find the reference distribution for a test statistic.
p-value
The probability, assuming $H_0$ is true, of getting a test statistic as extreme as or more extreme than the observed result.
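For a chi-square test, the $p$-value is the upper-tail area of the chi-square distribution with the test's degrees of freedom, to the right of the observed $\chi^2$. The sketch below computes that area in plain Python, assuming no statistics library is available; `chi2_p_value` is a hypothetical helper name, and the method (series plus continued fraction for the regularized incomplete gamma function, as in Numerical Recipes) is a standard numerical technique rather than anything specified in this cheat sheet. With SciPy installed, `scipy.stats.chi2.sf(stat, df)` gives the same tail area, and `scipy.stats.f.sf(F, df_between, df_within)` gives the ANOVA $p$-value.

```python
import math

def chi2_p_value(stat, df, eps=1e-13, max_iter=500):
    """Upper-tail area P(X >= stat) for a chi-square distribution with df degrees of freedom.

    Hypothetical helper: evaluates Q(a, x), the regularized upper incomplete gamma
    function, with a = df / 2 and x = stat / 2.
    """
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    # Common factor x^a * e^(-x) / Gamma(a), computed in log space to avoid overflow
    log_prefactor = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1.0:
        # Series for the lower function P(a, x); the p-value is 1 - P(a, x)
        term = 1.0 / a
        total = term
        for n in range(1, max_iter):
            term *= x / (a + n)
            total += term
            if abs(term) < abs(total) * eps:
                break
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Continued fraction for Q(a, x) (modified Lentz's method); keeps small p-values accurate
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_prefactor) * h
```

As a quick check, with 2 degrees of freedom the tail area has the closed form $e^{-\chi^2/2}$, so `chi2_p_value(2 * math.log(20), 2)` should return 0.05 up to floating-point rounding.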

Common Mistakes to Avoid

  • Using several two-sample tests instead of ANOVA: this inflates the overall chance of making at least one Type I error, because each extra test adds another opportunity for a false positive.
  • Treating a significant ANOVA as proof that every group mean is different: ANOVA only shows that at least one mean differs, so follow-up comparisons are needed to identify which ones.
  • Using a chi-square test with percentages instead of counts: chi-square formulas require observed counts $O$ and expected counts $E$, not proportions alone.
  • Forgetting to check expected counts: chi-square results can be unreliable when expected counts are too small, especially when many cells have $E<5$.
  • Interpreting a large $p$-value as proof that $H_0$ is true: a large $p$-value means there is not enough evidence to reject $H_0$, not that the null hypothesis has been proven.
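Two of these mistakes come down to quick arithmetic. In this minimal sketch the helper names are hypothetical. The familywise rate $1-(1-\alpha)^m$ assumes the $m$ tests are independent, which pairwise comparisons among the same groups are not, so read it as a rough illustration of how the false-positive risk grows rather than an exact rate. The expected-count check applies the $E<5$ rule of thumb from the list above.

```python
from math import comb

def familywise_error(alpha, n_tests):
    """Chance of at least one false positive among n_tests independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests

def small_expected_counts(expected, threshold=5):
    """List the expected counts below the rule-of-thumb threshold (5 by default)."""
    return [e for e in expected if e < threshold]

# Comparing 4 groups pairwise needs C(4, 2) = 6 two-sample tests.
n_pairs = comb(4, 2)
print(n_pairs, round(familywise_error(0.05, n_pairs), 3))  # 6 0.265
print(small_expected_counts([12.0, 4.5, 3.0, 20.0]))       # [4.5, 3.0]
```

So six separate tests at $\alpha = 0.05$ carry roughly a 26% chance of at least one false positive, compared with 5% for a single ANOVA.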

Practice Questions

  1. For a one-way ANOVA with $k=4$, $N=36$, $SS_{\text{between}}=90$, and $SS_{\text{within}}=210$, find $df_{\text{between}}$, $df_{\text{within}}$, $MS_{\text{between}}$, $MS_{\text{within}}$, and $F$.
  2. A die is rolled 60 times with observed counts $(8, 12, 10, 9, 11, 10)$. Assuming the die is fair, so that $E=10$ for each face, compute $\chi^2 = \sum \frac{(O-E)^2}{E}$.
  3. In a $2 \times 3$ table, the row totals are 40 and 60, the column totals are 30, 50, and 20, and the grand total is 100. Find the expected count for row 1, column 2, and find the degrees of freedom.
  4. A student compares test scores from four teaching methods and gets a significant ANOVA result. Explain why this does not automatically show which teaching method is best.
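After working questions 1 to 3 by hand, a short script can check the arithmetic. This sketch simply plugs the given values into the Key Facts formulas; the variable names are illustrative only.

```python
# Q1: ANOVA table values from the given summary statistics
k, N = 4, 36
ss_between, ss_within = 90, 210
df_between, df_within = k - 1, N - k    # 3 and 32
ms_between = ss_between / df_between    # 30.0
ms_within = ss_within / df_within       # 6.5625
F = ms_between / ms_within              # about 4.571

# Q2: goodness of fit for the die, with E = 10 for each face
observed = [8, 12, 10, 9, 11, 10]
expected = 10
chi_sq = sum((o - expected) ** 2 / expected for o in observed)  # 1.0 (up to float rounding)

# Q3: expected count for row 1, column 2, and df for a 2 x 3 table
row_totals, col_totals, grand_total = [40, 60], [30, 50, 20], 100
e_12 = row_totals[0] * col_totals[1] / grand_total          # 20.0
df_table = (len(row_totals) - 1) * (len(col_totals) - 1)    # 2
```

Question 4 has no formula to check: a significant $F$ only says at least one mean differs, so identifying the best method still requires follow-up comparisons.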