Distinguishing correlation from causation helps students decide whether two variables are simply related or whether one variable truly produces a change in another. This cheat sheet is useful because graphs, headlines, and studies often show patterns that can be misread. Students need clear rules for interpreting scatter plots, correlation values, and evidence claims. The goal is to make statistical conclusions careful, accurate, and supported by data.

The core idea is that correlation measures association, while causation requires stronger evidence. A scatter plot shows direction, form, and strength, and the correlation coefficient $r$ summarizes the strength and direction of a linear relationship. A strong value of $r$ does not prove that one variable causes the other. To argue causation, students should look for controlled experiments, random assignment, and plausible mechanisms, and should rule out possible lurking variables.

Key Facts

  • Correlation means two variables are associated, so as one variable changes, the other tends to change in a pattern.
  • A positive correlation means both variables tend to increase together, while a negative correlation means one tends to decrease as the other increases.
  • The correlation coefficient $r$ measures the direction and strength of a linear relationship, with $-1 \le r \le 1$.
  • Values of $r$ near $1$ or $-1$ show a strong linear relationship, while values near $0$ show little or no linear relationship.
  • The sample correlation coefficient can be calculated with $r = \frac{1}{n - 1}\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$.
  • The coefficient of determination $r^2$ gives the fraction of variation in the response variable explained by a linear model.
  • Correlation does not prove causation because a lurking variable may affect both variables or the direction of cause may be reversed.
  • Strong evidence for causation usually comes from a controlled experiment with random assignment, comparison groups, and careful control of other variables.
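
The sample correlation formula in the Key Facts can be checked by hand or with a few lines of code. The sketch below is one way to do it in plain Python; the function name `sample_correlation` and the five data points are made up for illustration and do not come from any real study.

```python
from math import sqrt

def sample_correlation(xs, ys):
    """Sample correlation r: average product of z-scores, dividing by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (also divide by n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable has no spread")
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data: hours studied and test scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r = sample_correlation(hours, scores)
r_squared = r ** 2  # coefficient of determination
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

For this made-up data, $r$ comes out close to $0.98$, a strong positive linear relationship. That value describes the pattern only; it says nothing about whether studying caused the scores.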

Vocabulary

Correlation
A statistical relationship showing how two variables tend to change together.
Causation
A cause-and-effect relationship in which a change in one variable directly produces a change in another variable.
Scatter Plot
A graph of paired data values that helps show the direction, form, and strength of a relationship.
Correlation Coefficient
A number $r$ between $-1$ and $1$ that describes the strength and direction of a linear relationship.
Lurking Variable
An unmeasured variable that may explain or influence the relationship between two studied variables.
Controlled Experiment
A study design that compares groups while controlling conditions so researchers can test for cause and effect.
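
To see how a lurking variable can create a correlation, the sketch below builds a small constructed data set in which temperature drives both ice cream sales and pool visits, and neither affects the other. All numbers and names (`temps`, `ice_cream`, `pool_visits`, `residuals`) are hypothetical. The second step, correlating what is left after removing the straight-line effect of temperature, is a residual-based check added here for illustration; it is not part of the cheat sheet itself.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r (the n - 1 factors cancel in this form)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def residuals(xs, ys):
    """What remains of ys after subtracting the least-squares line on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Constructed data: temperature (the lurking variable) drives both outcomes.
temps = [18, 20, 22, 24, 26, 28, 30, 32]
ice_offsets = [4, -4, -4, 4, 4, -4, -4, 4]    # day-to-day variation, unrelated
pool_offsets = [3, -3, 3, -3, -3, 3, -3, 3]   # to temperature and to each other
ice_cream = [10 + 3 * t + e for t, e in zip(temps, ice_offsets)]
pool_visits = [5 + 2 * t + e for t, e in zip(temps, pool_offsets)]

# Step 1: the two outcomes look strongly related.
r_raw = pearson_r(ice_cream, pool_visits)

# Step 2: remove the effect of temperature from each, then correlate again.
r_adjusted = pearson_r(residuals(temps, ice_cream),
                       residuals(temps, pool_visits))

print(f"ice cream vs pool visits:      r = {r_raw:.2f}")
print(f"after accounting for temperature: r = {r_adjusted:.2f}")
```

In this constructed example the raw correlation is above $0.9$, yet it disappears once temperature is accounted for. Real data is rarely this clean, and a check like this only works for lurking variables that were actually measured.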

Common Mistakes to Avoid

  • Saying a strong correlation proves causation is wrong because a high value such as $r = 0.92$ can still happen when another variable affects both quantities.
  • Ignoring lurking variables is wrong because a hidden factor can create the pattern, such as temperature affecting both ice cream sales and swimming pool visits.
  • Using $r$ to describe a curved relationship is wrong because $r$ measures linear association, not all possible patterns.
  • Assuming $r = 0$ means no relationship is wrong because the data may have a strong nonlinear pattern even when the linear correlation is near $0$.
  • Confusing direction with strength is wrong because the sign of $r$ shows direction, while the distance of $r$ from $0$ shows strength.
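
The mistakes about curved relationships can be checked with a tiny example. The sketch below uses made-up points that lie exactly on the curve $y = x^2$; the helper `pearson_r` is an illustrative name, not a library function.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Every point lies exactly on y = x^2, so y is fully determined by x.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

# Over the full, symmetric range the linear correlation cancels out.
r_full = pearson_r(xs, ys)

# Over the right half only (x = 0, 1, 2, 3) the same curve gives a high r,
# even though the relationship is still curved, not linear.
r_right = pearson_r(xs[3:], ys[3:])

print(f"full range: r = {r_full:.2f}")
print(f"right half: r = {r_right:.2f}")
```

Both results can mislead if read alone: an $r$ near $0$ hides a perfect curved pattern, and a high $r$ on the right half does not make the pattern a line. Looking at the scatter plot before trusting $r$ catches both cases.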

Practice Questions

  1. A study finds that hours studied and test score have $r = 0.78$. Describe the direction and strength of the relationship.
  2. A data set has $r = -0.65$. What is $r^2$, and what does it mean in context for a linear model?
  3. A city finds that daily temperature and lemonade sales have $r = 0.89$. Does this prove that buying lemonade raises the temperature? Explain briefly.
  4. A school reports that students who join a math club have higher math scores than students who do not. Explain why this observation alone does not prove the club caused the higher scores.