P-Values Explained Infographic

A p-value is a way to measure how surprising your data would be if a null hypothesis were true. It is used in hypothesis testing to help decide whether an observed result is unusual enough to challenge a default assumption. P-values matter because they connect data, probability, and decision making in science, medicine, engineering, and social research.

A small p-value suggests that the observed result would be rare under the null hypothesis, but it does not prove that the alternative hypothesis is true.

In many tests, the p-value is shown as a shaded tail area under a probability distribution, such as a bell curve. The test statistic tells you where your result falls on that distribution, and the p-value is the probability of getting a result at least that extreme if the null hypothesis is correct. Researchers often compare the p-value to a significance level, such as alpha = 0.05, to decide whether to reject the null hypothesis.

This decision rule should be read alongside good study design, the effect size, and context, because statistical significance is not the same as practical importance.

Understanding P-Values Explained

A hypothesis test begins by turning a research claim into a measurable comparison. A school might compare average test scores after trying a new study routine. A medical trial might compare recovery times for two treatments.

The observed difference alone is not enough, because samples naturally vary. The test asks how large that difference is compared with the amount of random variation expected from sampling. This is why sample size matters.

Larger samples usually give a smaller standard error, meaning that an estimate becomes more precise. In a z test, the z score is found by taking the sample mean minus the proposed population mean, then dividing by the standard error.

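A z score far from zero indicates a difference that is large relative to expected sampling noise. Before collecting data, researchers must decide whether only differences in one direction count as evidence against the null hypothesis (a one-tailed test) or whether differences in either direction count (a two-tailed test).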
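The z score calculation described above can be sketched in a few lines of Python. The numbers here (a sample mean of 72, a proposed mean of 70, a standard deviation of 10) are made up for illustration, and the function name z_score is an assumption, not something from a particular library.

```python
import math

def z_score(sample_mean, null_mean, sigma, n):
    """Return (sample mean - proposed mean) / standard error."""
    standard_error = sigma / math.sqrt(n)  # shrinks as n grows
    return (sample_mean - null_mean) / standard_error

# Hypothetical data: the same 2-point difference at two sample sizes.
z_small = z_score(sample_mean=72.0, null_mean=70.0, sigma=10.0, n=25)
z_large = z_score(sample_mean=72.0, null_mean=70.0, sigma=10.0, n=400)

print(z_small)  # 1.0
print(z_large)  # 4.0
```

The observed difference is identical in both cases. Only the standard error changes, which is why the larger sample produces a z score much further from zero.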

The significance level is a rule for controlling false alarms over many similar studies. If researchers use a level of five percent, they accept that a false positive can occur about five times in every one hundred studies when the null model is actually correct. It does not mean there is a five percent chance that one particular conclusion is wrong.
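One way to see the "false alarms over many similar studies" idea is a small simulation. The sketch below assumes a two-tailed z-test, a known standard deviation of 1, and normally distributed data. The function name, the number of simulated studies, and the seed are all arbitrary choices made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

def false_positive_rate(n_studies=2000, n=30, alpha=0.05, seed=0):
    """Simulate studies where the null hypothesis is true (mean 0, sd 1)
    and return the fraction that reject it anyway."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(n_studies):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) / (1.0 / math.sqrt(n))
        p = 2 * (1 - std_normal.cdf(abs(z)))  # two-tailed p-value
        if p <= alpha:
            rejections += 1
    return rejections / n_studies

rate = false_positive_rate()
print(rate)  # should land close to alpha
```

The printed fraction should sit near 0.05. It describes the long-run behavior of the decision rule, not the chance that any single study's conclusion is wrong.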

Another possible error is a false negative. This happens when a real effect exists but the study does not detect it. Statistical power is the chance of detecting an effect of a chosen size when it is real.

Power improves with larger samples, less noisy measurements, and stronger effects. Choosing the sample size before a study helps balance the risks of both error types.
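For a rough feel of how power responds to sample size, noise, and effect size, here is a minimal sketch for a one-tailed (upper) z-test with a known standard deviation. That test choice, the function name, and the example numbers are assumptions made for illustration. Real studies often need a power calculation matched to their own design.

```python
import math
from statistics import NormalDist

def power_one_sided_z(effect, sigma, n, alpha=0.05):
    """Chance of rejecting H0 in an upper-tailed z-test when the true
    mean exceeds the null mean by `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection cutoff for z
    shift = effect / (sigma / math.sqrt(n))  # true effect in standard-error units
    return 1 - std_normal.cdf(z_crit - shift)

# Hypothetical comparisons: change one ingredient at a time.
baseline = power_one_sided_z(effect=0.5, sigma=1.0, n=16)
more_data = power_one_sided_z(effect=0.5, sigma=1.0, n=64)
less_noise = power_one_sided_z(effect=0.5, sigma=0.5, n=16)
bigger_effect = power_one_sided_z(effect=1.0, sigma=1.0, n=16)

print(baseline, more_data, less_noise, bigger_effect)
```

Each of the three changes raises power above the baseline. When the true effect is zero, the function returns alpha, which matches the false positive rate.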

Statistical significance does not tell students whether a result is important in everyday life. With thousands of observations, a tiny difference in average screen time might produce a small p-value even if it has no useful effect on learning or health. With a small sample, a meaningful difference can miss the significance cutoff because the data are too uncertain.

Good reports include the estimated difference, an effect size, and a confidence interval. An effect size describes the size of the change in practical terms. A confidence interval shows a range of values that fit reasonably well with the data under the method used.
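The sketch below puts the pieces of such a report side by side for a one-sample z-test. The screen-time style numbers are hypothetical, the function name summarize is an assumption, and the "standardized effect" here is simply the difference divided by the standard deviation, which is one common way to express effect size rather than the only one.

```python
import math
from statistics import NormalDist

def summarize(sample_mean, null_mean, sigma, n, confidence=0.95):
    """Return the difference, a standardized effect, a confidence
    interval for the difference, and a two-tailed p-value."""
    std_normal = NormalDist()
    se = sigma / math.sqrt(n)
    diff = sample_mean - null_mean
    z = diff / se
    z_star = std_normal.inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return {
        "difference": diff,
        "standardized_effect": diff / sigma,
        "ci": (diff - z_star * se, diff + z_star * se),
        "p_value": 2 * (1 - std_normal.cdf(abs(z))),
    }

# Hypothetical: 40,000 observations, a 0.02-hour difference, sd of 1 hour.
report = summarize(sample_mean=3.02, null_mean=3.00, sigma=1.0, n=40000)
for name, value in report.items():
    print(name, value)
```

With this many observations the p-value is far below 0.05, yet the difference is about 0.02 hours and the whole confidence interval covers only tiny values. The p-value alone would hide how small the effect is.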

For a new medicine, a one day reduction in recovery time may matter greatly. For a classroom activity, a change of the same size may need more context before anyone can judge whether it matters.

P-values can be misleading when researchers make many choices after seeing results. Testing twenty unrelated outcomes creates many chances for one small p-value just by luck. Trying several groups, removing inconvenient data points, or stopping data collection when a cutoff is reached can have the same effect.

This is often called p-hacking. Planning the analysis in advance and reporting all tests reduces this problem. Students should check the assumptions behind any test, including random sampling, independent observations, reliable measurement, and a suitable model for the data.
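The "twenty unrelated outcomes" example can be checked with a short calculation. This sketch assumes the tests are independent and that every null hypothesis is true. The function name is an illustrative choice.

```python
def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests gives
    p <= alpha when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests

print(round(chance_of_any_false_positive(1), 3))   # 0.05
print(round(chance_of_any_false_positive(20), 3))  # 0.642
```

Under these assumptions, running twenty tests gives roughly a 64 percent chance of at least one "significant" result from luck alone, which is why reporting every test matters.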

A result above the chosen cutoff means the study did not provide strong enough evidence against the null model. It does not prove that there is no effect.

Key Facts

A p-value is P(data at least as extreme as observed | null hypothesis is true).
If p <= alpha, reject the null hypothesis; if p > alpha, fail to reject the null hypothesis.
A common significance level is alpha = 0.05, but alpha should be chosen before looking at the data.
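For a z-test, z = (x̄ - μ0) / (σ / sqrt(n)), where x̄ is the sample mean, μ0 is the mean proposed by the null hypothesis, σ is the population standard deviation, and n is the sample size.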
In a two-tailed z-test, p-value = 2P(Z >= |z|).
A smaller p-value means the data are less compatible with the null hypothesis, not that the null hypothesis has a small probability of being true.
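The formulas and decision rule above can be turned into a short Python sketch using the standard normal distribution from the standard library. The function names and the example value z = 2.1 are assumptions chosen for illustration.

```python
from statistics import NormalDist

def p_value_from_z(z, tails="two"):
    """Convert a z score into a p-value for a z-test."""
    std_normal = NormalDist()
    if tails == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))  # 2 * P(Z >= |z|)
    if tails == "upper":
        return 1 - std_normal.cdf(z)             # P(Z >= z)
    if tails == "lower":
        return std_normal.cdf(z)                 # P(Z <= z)
    raise ValueError("tails must be 'two', 'upper', or 'lower'")

def decide(p, alpha=0.05):
    """Apply the decision rule: reject H0 only when p <= alpha."""
    return "reject H0" if p <= alpha else "fail to reject H0"

p = p_value_from_z(2.1, tails="two")
print(round(p, 4), decide(p, alpha=0.05))
print(decide(p, alpha=0.01))
```

The same z score leads to different decisions at alpha = 0.05 and alpha = 0.01, which is one reason alpha should be fixed before looking at the data.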

Vocabulary

P-value: The probability of observing data at least as extreme as the sample result, assuming the null hypothesis is true.
Null hypothesis: The default claim being tested, often stating that there is no effect, no difference, or no relationship.
Alternative hypothesis: The claim that competes with the null hypothesis, often stating that an effect, difference, or relationship exists.
Significance level: The cutoff probability, called alpha, used to decide whether a p-value is small enough to reject the null hypothesis.
Test statistic: A standardized number, such as z or t, that shows how far the sample result is from the null hypothesis value.

Common Mistakes to Avoid

Saying the p-value is the probability that the null hypothesis is true. This is wrong because the p-value assumes the null hypothesis is true and then measures how unusual the data are under that assumption.
Treating p = 0.049 and p = 0.051 as completely different results. This is wrong because the cutoff alpha is a decision rule, while the evidence changes gradually.
Thinking a small p-value means a large or important effect. This is wrong because very large samples can make tiny effects statistically significant.
Choosing alpha after seeing the p-value. This is wrong because changing the cutoff after observing the data increases the chance of misleading conclusions.

Practice Questions

1. A one-tailed z-test gives z = 1.96. Using P(Z >= 1.96) = 0.025, what is the p-value, and would you reject H0 at alpha = 0.05?
2. A two-tailed z-test gives z = -2.40. Using P(Z >= 2.40) = 0.0082, find the p-value and decide whether the result is significant at alpha = 0.01.
3. A study reports p = 0.03 for a new teaching method, but the average test score improved by only 0.5 points. Explain why this result may be statistically significant but not practically important.
