Choosing the Right Statistical Test Cheat Sheet

Choosing the right statistical test helps students match a research question to the correct method of analysis. This cheat sheet focuses on deciding whether data involve means, proportions, counts, categories, or relationships. It is useful when planning an investigation, checking assumptions, or interpreting results.

Students can use it as a quick reference before performing calculations or using technology.

The main ideas are to identify the response variable, the explanatory variable, the number of groups, and whether samples are independent or paired. Mean-based tests often use test statistics such as $t=\frac{\bar{x}-\mu_0}{s/\sqrt{n}}$ or compare several means with ANOVA. Category-based tests often use chi-square statistics such as $\chi^2=\sum \frac{(O-E)^2}{E}$.

Relationship questions may use correlation, regression, or tests of association depending on the data type.
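The selection logic described above can be sketched as a small lookup function. This is only an illustration: the function name `choose_test`, its parameters, and the labels "mean", "proportion", and "category" are hypothetical choices, and the function covers only the tests named in this cheat sheet.

```python
def choose_test(data_type, count, paired=False):
    """Suggest a test from this cheat sheet (illustrative helper, not a standard API).

    data_type: "mean", "proportion", or "category" (hypothetical labels).
    count: number of groups for means and proportions,
           or number of categorical variables for "category".
    paired: True when the two sets of measurements are linked.
    """
    if data_type == "mean":
        if count == 1:
            return "one-sample t test"
        if count == 2:
            return "paired t test" if paired else "two-sample t test"
        if count >= 3:
            return "ANOVA"
    elif data_type == "proportion":
        if count == 1:
            return "one-proportion z test"
        if count == 2:
            return "two-proportion z test"
    elif data_type == "category":
        if count == 1:
            return "chi-square goodness-of-fit test"
        if count == 2:
            return "chi-square test of independence"
    # Anything else is outside the scope of this sketch.
    raise ValueError("situation not covered by this cheat sheet")


print(choose_test("mean", 2, paired=True))  # paired t test
print(choose_test("category", 1))           # chi-square goodness-of-fit test
```

The type of variable is checked first, then the number of groups, then pairing, which mirrors the order of questions a student would ask.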

Key Facts

Use a one-sample t test for one sample mean when the population standard deviation is unknown, with $t=\frac{\bar{x}-\mu_0}{s/\sqrt{n}}$.
Use a two-sample t test to compare two independent means, with $t=\frac{\bar{x}_1-\bar{x}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}$.
Use a paired t test when the same subjects are measured twice or matched pairs are used, and test the mean difference with $t=\frac{\bar{d}}{s_d/\sqrt{n}}$, where $n$ is the number of pairs.
Use a one-proportion z test when testing one population proportion, with $z=\frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$.
Use a two-proportion z test when comparing two independent proportions, with $z=\frac{\hat{p}_1-\hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}$, where $\hat{p}=\frac{x_1+x_2}{n_1+n_2}$ is the pooled sample proportion.
Use a chi-square goodness-of-fit test for one categorical variable, with $\chi^2=\sum \frac{(O-E)^2}{E}$, where $O$ is an observed count and $E$ is the matching expected count.
Use a chi-square test of independence when testing whether two categorical variables are related in a two-way table, with the same statistic and expected counts $E=\frac{\text{row total}\times\text{column total}}{\text{grand total}}$.
Use ANOVA to compare $3$ or more group means, where the test statistic is $F=\frac{\text{variation between groups}}{\text{variation within groups}}$.
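The formulas in this list translate directly into code. The sketch below uses only the Python standard library. The function names and the sample numbers are invented for illustration, and the functions return only the test statistic, not a p-value.

```python
import math


def one_sample_t(xbar, mu0, s, n):
    """t = (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))


def two_sample_t(xbar1, xbar2, s1, s2, n1, n2):
    """t for two independent means, using the unpooled standard error."""
    return (xbar1 - xbar2) / math.sqrt(s1**2 / n1 + s2**2 / n2)


def one_proportion_z(p_hat, p0, n):
    """z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


def two_proportion_z(x1, n1, x2, n2):
    """z for two independent proportions, with the pooled proportion
    (x1 + x2) / (n1 + n2) in the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    return (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Invented example values, not data from a real study.
print(one_sample_t(xbar=52, mu0=50, s=8, n=64))   # 2.0
print(chi_square([18, 22, 20], [20, 20, 20]))     # about 0.4
```

In practice a statistics package would also report the degrees of freedom and the p-value; this sketch only shows how each statistic is assembled from its summary values.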

Vocabulary

Null hypothesis: The null hypothesis, written $H_0$, is the claim that there is no effect, no difference, or no relationship in the population.
Alternative hypothesis: The alternative hypothesis, written $H_a$, is the claim that an effect, difference, or relationship exists.
P-value: The p-value is the probability of getting results at least as extreme as the sample results if $H_0$ is true.
Significance level: The significance level, written $\alpha$, is the cutoff probability for rejecting $H_0$, often $\alpha=0.05$.
Independent samples: Independent samples are groups where the data values in one group are not naturally paired with values in another group.
Paired data: Paired data occur when two measurements are linked, such as before-and-after measurements on the same person.
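To connect the p-value and the significance level, here is a minimal sketch that converts a z statistic into a two-sided p-value and compares it with $\alpha$. The function names are hypothetical, the two-sided alternative is an assumption, and the rule "reject when the p-value is below $\alpha$" is one common convention.

```python
import math


def two_sided_p_from_z(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


def decide(p_value, alpha=0.05):
    """Compare a p-value with the significance level alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


p = two_sided_p_from_z(1.96)   # close to 0.05
print(round(p, 3))
print(decide(0.03))            # reject H0
print(decide(0.40))            # fail to reject H0
```

Note that the second outcome is worded "fail to reject" rather than "accept": a large p-value means only that the data do not give strong evidence against $H_0$.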

Common Mistakes to Avoid

Using a two-sample t test for before-and-after data is wrong because the observations are paired, so the test should analyze the differences $d$ .
Using a z test for a mean when $\sigma$ is unknown is wrong because the sample standard deviation $s$ requires a t distribution.
Using a chi-square test when expected counts are too small is wrong because the approximation may be unreliable, especially when any expected count is below $5$ .
Choosing a test only from the sample size is wrong because the type of variable, number of groups, and independence matter first.
Rejecting $H_0$ because the p-value is large is wrong because a large p-value means the data do not give strong evidence against $H_0$ .
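The first mistake in this list can be seen numerically. The sketch below applies both a paired t statistic and a two-sample t statistic to the same before-and-after scores. The data are invented for illustration, and the function names are hypothetical.

```python
import math
from statistics import mean, stdev

# Invented before-and-after scores for the same five students.
before = [72, 85, 60, 90, 78]
after = [75, 88, 62, 94, 81]


def paired_t(x, y):
    """t computed from the within-pair differences d = y - x."""
    d = [b - a for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


def two_sample_t(x, y):
    """t that (wrongly, for this design) treats the two lists as independent."""
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (mean(y) - mean(x)) / se


print(round(paired_t(before, after), 2))      # about 9.49
print(round(two_sample_t(before, after), 2))  # about 0.39
```

Every student improved by 2 to 4 points, so the differences vary very little and the paired statistic is large. The two-sample version buries that consistent gain under the large spread between students.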

Practice Questions

1. A class wants to test whether the mean score on a standardized test differs from $75$. A sample of $n=36$ students has $\bar{x}=78$ and $s=9$. Which test should be used, and what is the test statistic?
2. A survey finds that $64$ out of $100$ juniors and $72$ out of $120$ seniors support a schedule change. Which test should be used to compare the two proportions?
3. A restaurant records customer ratings as poor, fair, good, or excellent for dine-in and takeout orders. Which test should be used to determine whether rating category is related to order type?
4. A researcher compares plant growth under $4$ different fertilizers. Explain why ANOVA is more appropriate than running many separate two-sample t tests.

Understanding Choosing the Right Statistical Test

The most important detail is how the data were produced. A test cannot repair a weak study design. Random sampling helps a sample represent a wider population. Random assignment helps show that a treatment caused a difference. These are different ideas. For example, a survey of students who choose to answer a phone poll may not represent all students. In an experiment, each student could be randomly assigned to use one study method or another.

Measurements from different people are usually independent. Measurements taken from the same person, such as pulse rate before and after exercise, are connected. Treating connected measurements as independent loses useful information and can give a misleading result.

Every test has conditions that should be checked before trusting its output. Tests involving means work best when observations are reasonably representative, independent, and free from extreme outliers. A graph such as a histogram or box plot can reveal strong skewness or unusual values. With paired data, inspect the differences within each pair rather than the original two lists separately. Proportion tests need enough expected successes and failures for their usual approximation to work well. Chi-square tests need expected counts that are not too small in the table cells. When conditions fail, students may need more data, a different method, or a careful statement that the conclusion is uncertain.
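The expected-count condition for a two-way table can be checked with a few lines of code. This sketch assumes the usual formula for a test of independence, expected count = row total × column total / grand total, and uses the threshold of 5 mentioned under Common Mistakes to Avoid. The function names and both tables are invented for illustration.

```python
def expected_counts(table):
    """Expected counts under independence for a two-way table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def counts_large_enough(table, minimum=5):
    """True when every expected count is at least `minimum`."""
    return all(e >= minimum for row in expected_counts(table) for e in row)


# Invented tables: rows are groups, columns are response categories.
large_table = [[10, 20], [30, 40]]
small_table = [[1, 9], [2, 8]]

print(expected_counts(large_table))      # [[12.0, 18.0], [28.0, 42.0]]
print(counts_large_enough(large_table))  # True
print(counts_large_enough(small_table))  # False
```

The check uses expected counts, not observed counts, because the chi-square approximation depends on what the null model predicts for each cell.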

A p-value measures how surprising the observed result would be if a stated null model were true. A small p-value is evidence against that model, but it does not prove a claim. It is also not the probability that the null model is true.

Statistical significance can occur for a tiny difference when the sample is very large. Practical importance depends on the size of the difference and its real consequences. Report an effect size or a confidence interval when possible.
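A short calculation shows how a very large sample can make a tiny difference statistically significant. The numbers below are invented, and the effect size used, the difference divided by the standard deviation (often called Cohen's d), is one common choice rather than the only one.

```python
import math

# Invented summary values: a 0.2-point difference on a test with s = 10.
difference = 0.2     # sample mean minus hypothesized mean
s = 10               # sample standard deviation
n = 1_000_000        # very large sample

t_statistic = difference / (s / math.sqrt(n))
effect_size = difference / s   # standardized difference (Cohen's d)

print(round(t_statistic, 1))   # about 20: overwhelming statistical evidence
print(effect_size)             # 0.02: a very small effect
```

The test statistic grows with $\sqrt{n}$, while the effect size does not depend on $n$ at all, which is why the two can tell very different stories.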

A confidence interval gives a range of plausible values for the population parameter, given the data.

When comparing three or more groups, a significant overall ANOVA result only says that at least one group mean differs from the others. Follow-up comparisons are needed to find where the differences lie. Running many separate tests raises the chance of a false positive.
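The rise in false positives can be estimated with a simple calculation. The sketch below assumes every pairwise test uses the same $\alpha$ and that the tests are independent, which is a simplification because pairwise comparisons share data. The function name is hypothetical.

```python
from math import comb


def familywise_error(groups, alpha=0.05):
    """Approximate chance of at least one false positive when every pair
    of groups is compared with its own test (independence assumed)."""
    comparisons = comb(groups, 2)          # number of distinct pairs
    return 1 - (1 - alpha) ** comparisons


print(comb(5, 2))                     # 10 pairwise comparisons for 5 groups
print(round(familywise_error(5), 3))  # about 0.401
```

Under these assumptions, five groups compared pairwise at $\alpha=0.05$ give roughly a 40% chance of at least one false positive, far above the 5% that a single test allows.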

Correlation and regression need especially careful interpretation. A scatter plot should be examined before calculating a correlation. Correlation describes the direction and strength of a linear pattern.

A curved pattern can have a weak correlation even when the variables are clearly related. Regression uses the pattern to predict one variable from another. Its slope has units, so it can be interpreted in context, such as a predicted change in test score for each extra hour of sleep.
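The point about curved patterns can be checked with a small example. The sketch below computes the correlation coefficient by hand for a perfectly curved relationship and for a perfectly linear one. The data are invented, and `pearson_r` is a hypothetical helper written for this illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient from paired lists x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]
curved = [v ** 2 for v in x]      # y is completely determined by x, but not linearly
linear = [2 * v + 1 for v in x]   # y is an exact linear function of x

print(pearson_r(x, curved))  # 0.0
print(pearson_r(x, linear))  # 1.0
```

The curved data have a correlation of zero even though one variable determines the other exactly, which is why a scatter plot should come before the calculation.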

Predictions are less reliable far outside the observed data range. Most importantly, correlation does not establish cause. Ice cream sales and sunburn cases may rise and fall together because warm weather affects both. Hidden variables, biased samples, and reverse cause can all create relationships that look convincing at first.
