Experimental design and causal inference help students understand when data can support a cause-and-effect claim. This cheat sheet covers randomized experiments, observational studies, treatment effects, confounding, blocking, and basic causal diagrams. Students need these tools to separate association from causation and to judge whether a study design answers the question being asked. The goal is to connect statistical comparisons with the assumptions that make causal conclusions valid.

The most important ideas are potential outcomes, random assignment, control groups, and adjustment for confounders. A treatment effect is often written as a difference in outcomes, such as $\tau = Y(1) - Y(0)$, even though only one potential outcome is observed for each unit. Randomization makes treatment assignment independent of potential outcomes, written $T \perp \{Y(0), Y(1)\}$. In observational studies, causal claims require careful assumptions, such as no unmeasured confounding and proper adjustment for variables that affect both treatment and outcome.
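
The reason random assignment lets a simple group comparison be read causally can be written as a short derivation. This is a sketch that assumes consistency, meaning the observed outcome equals the potential outcome under the treatment actually received, $Y = T\,Y(1) + (1 - T)\,Y(0)$. That assumption is added here for illustration and is not stated in the text above.

```latex
\begin{align*}
E[Y \mid T = 1] - E[Y \mid T = 0]
  &= E[Y(1) \mid T = 1] - E[Y(0) \mid T = 0] && \text{(consistency)} \\
  &= E[Y(1)] - E[Y(0)] && \text{(since } T \perp \{Y(0), Y(1)\}\text{)} \\
  &= E[Y(1) - Y(0)].
\end{align*}
```

The last line is the average causal effect in the population. Without random assignment the second step can fail, which is where confounding enters.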

Key Facts

  • The individual causal effect is $\tau_i = Y_i(1) - Y_i(0)$, but only one of $Y_i(1)$ or $Y_i(0)$ is observed for each unit.
  • The average treatment effect is $ATE = E[Y(1) - Y(0)]$, which summarizes the mean causal effect in a target population.
  • In a randomized experiment, treatment assignment satisfies $T \perp \{Y(0), Y(1)\}$, so treated and control groups are comparable in expectation.
  • A simple difference in means estimator is $\hat{\tau} = \bar{Y}_T - \bar{Y}_C$, where $\bar{Y}_T$ is the treated mean and $\bar{Y}_C$ is the control mean.
  • Confounding occurs when a variable $Z$ affects both treatment $T$ and outcome $Y$, creating a backdoor path such as $T \leftarrow Z \rightarrow Y$.
  • Blocking or stratification improves precision by comparing treatment groups within levels of an important variable, then combining stratum estimates.
  • A standard error for a difference in independent means is $SE(\bar{Y}_T - \bar{Y}_C) = \sqrt{\frac{s_T^2}{n_T} + \frac{s_C^2}{n_C}}$.
  • A common approximate confidence interval for a treatment effect is $\hat{\tau} \pm z^{*} SE(\hat{\tau})$, where $z^{*}$ depends on the confidence level.
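
The estimator, standard error, and interval in the facts above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The function name `difference_in_means` and the score lists are hypothetical, made up for this example. It uses the normal quantile $z^{*}$, so for samples this small a t-based interval would usually be more appropriate.

```python
import math
from statistics import NormalDist, mean, variance


def difference_in_means(treated, control, confidence=0.95):
    """Return (tau_hat, se, (lower, upper)) for two independent samples."""
    tau_hat = mean(treated) - mean(control)
    # statistics.variance is the sample variance (divides by n - 1).
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    # z* is the standard normal quantile for a two-sided interval.
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    return tau_hat, se, (tau_hat - z_star * se, tau_hat + z_star * se)


# Hypothetical outcome data, invented for illustration only.
treated_scores = [82, 85, 79, 88, 84, 81]
control_scores = [76, 74, 80, 77, 73, 78]

tau_hat, se, (lower, upper) = difference_in_means(treated_scores, control_scores)
print(f"tau_hat = {tau_hat:.2f}, SE = {se:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Passing a different `confidence` value changes only $z^{*}$, which is why the interval widens as the confidence level rises.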

Vocabulary

Random assignment
A design method where units are assigned to treatment conditions by chance so that groups are comparable before treatment.
Potential outcome
The outcome a unit would have under a specific treatment condition, such as $Y_i(1)$ under treatment or $Y_i(0)$ under control.
Average treatment effect
The expected difference between potential outcomes in a population, written $ATE = E[Y(1) - Y(0)]$.
Confounder
A variable that influences both the treatment and the outcome, which can make an association look causal when it is not.
Blocking
A design strategy that groups similar units before random assignment to reduce variation and improve precision.
Causal diagram
A graph using arrows to represent assumed causal relationships among variables.
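
Blocking and random assignment can be combined as in the following sketch, which shuffles units within each block and treats half of them, then combines stratum-level estimates with weights. The helper names `blocked_assignment` and `blocked_estimate`, the student records, and the example effects and weights are all hypothetical. Treating exactly half of each block is one possible design choice, not the only one.

```python
import random
from collections import defaultdict


def blocked_assignment(units, block_of, seed=0):
    """Within each block, randomly assign half the units to treatment (1), the rest to control (0)."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)

    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        n_treated = len(shuffled) // 2
        for i, unit in enumerate(shuffled):
            assignment[unit] = 1 if i < n_treated else 0
    return assignment


def blocked_estimate(stratum_effects, weights):
    """Combine stratum-level effect estimates using weights that sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * t for w, t in zip(weights, stratum_effects))


# Hypothetical units: (id, grade level). Grade level is the blocking variable.
students = [("s1", 9), ("s2", 9), ("s3", 9), ("s4", 9), ("s5", 10), ("s6", 10)]
assignment = blocked_assignment(students, block_of=lambda s: s[1])
for student, arm in assignment.items():
    print(student, "treated" if arm == 1 else "control")

# Hypothetical stratum effects and weights, for illustration only.
print(blocked_estimate([2.0, 6.0], [0.25, 0.75]))
```

Because assignment happens inside each block, the treated and control groups are balanced on the blocking variable by construction rather than only in expectation.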

Common Mistakes to Avoid

  • Treating correlation as causation is wrong because an association between $T$ and $Y$ may be explained by a confounder $Z$ rather than a causal effect.
  • Adjusting for a collider is wrong because conditioning on a variable caused by both $T$ and $Y$ can create a false association between them.
  • Ignoring random assignment failures is wrong because noncompliance, attrition, or missing data can break the comparability that randomization was meant to create.
  • Comparing raw group means in an observational study is wrong when treatment groups differ on variables that also affect the outcome.
  • Using a tiny sample without considering power is wrong because a study may have a large $SE(\hat{\tau})$ and fail to detect meaningful effects.
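
The first and fourth mistakes can be illustrated with a small simulation. The sketch below invents a data-generating process in which a binary confounder Z raises both the chance of treatment and the outcome. Every number in it (the true effect of 2.0, the treatment probabilities, the effect of Z) is an assumption chosen for illustration. It compares the raw difference in means with an estimate that compares within levels of Z and then weights by stratum size.

```python
import random
from statistics import mean


def simulate(n=20000, true_effect=2.0, seed=0):
    """Generate (z, t, y) rows where Z affects both treatment and outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0           # binary confounder
        p_treat = 0.8 if z == 1 else 0.2             # Z raises the chance of treatment
        t = 1 if rng.random() < p_treat else 0
        y = true_effect * t + 5.0 * z + rng.gauss(0, 1)  # Z also raises the outcome
        rows.append((z, t, y))
    return rows


def diff_in_means(rows):
    """Raw treated-minus-control difference in mean outcomes."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def stratified_estimate(rows):
    """Compare within levels of Z, then weight each stratum by its share of units."""
    total = 0.0
    for level in (0, 1):
        stratum = [r for r in rows if r[0] == level]
        total += len(stratum) / len(rows) * diff_in_means(stratum)
    return total


rows = simulate()
naive = diff_in_means(rows)
adjusted = stratified_estimate(rows)
print(f"naive = {naive:.2f}, adjusted = {adjusted:.2f}, true effect = 2.00")
```

In this setup the raw comparison overstates the effect because treated units are more likely to have Z = 1, while the stratified estimate lands close to the true effect. The adjustment works here only because Z is measured and is the sole confounder in the simulated process.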

Practice Questions

  1. In an experiment, the treated group has $n_T = 50$, $\bar{Y}_T = 82$, and the control group has $n_C = 50$, $\bar{Y}_C = 76$. Compute $\hat{\tau} = \bar{Y}_T - \bar{Y}_C$.
  2. Given $s_T = 12$, $s_C = 10$, $n_T = 40$, and $n_C = 40$, compute $SE(\bar{Y}_T - \bar{Y}_C) = \sqrt{\frac{s_T^2}{n_T} + \frac{s_C^2}{n_C}}$.
  3. A blocked experiment has stratum effects $\hat{\tau}_1 = 4$ and $\hat{\tau}_2 = 10$ with stratum weights $w_1 = 0.60$ and $w_2 = 0.40$. Compute $\hat{\tau}_{blocked} = w_1\hat{\tau}_1 + w_2\hat{\tau}_2$.
  4. A study finds that students who attend extra tutoring have higher test scores, but tutoring was voluntary. Explain why this evidence alone may not justify the causal claim that tutoring caused higher scores.