Linear regression inference helps students decide whether a linear relationship in sample data gives evidence of a real relationship in the population. This cheat sheet covers inference for the slope, including confidence intervals, hypothesis tests, conditions, and interpretation. It is useful when students already know how to fit a least-squares regression line and now need to draw conclusions beyond the sample data. The main parameter is the population slope $\beta_1$, which describes the average change in the response variable for each 1-unit increase in the explanatory variable. Inference uses the sample slope $b_1$, its standard error $SE_{b_1}$, and a $t$ distribution with $df = n - 2$. Students must check linearity, independence, normality of residuals, and equal variability before trusting a confidence interval or significance test.

Key Facts

  • The population regression model is $\mu_y = \beta_0 + \beta_1 x$, where $\beta_0$ is the intercept and $\beta_1$ is the slope.
  • The least-squares regression line is $\hat{y} = b_0 + b_1 x$, where $b_0$ and $b_1$ estimate $\beta_0$ and $\beta_1$.
  • The residual standard error is $s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$, which estimates the typical vertical prediction error.
  • The standard error of the slope is $SE_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$.
  • A confidence interval for the population slope is $b_1 \pm t^* SE_{b_1}$ with $df = n - 2$.
  • The test statistic for testing $H_0: \beta_1 = 0$ is $t = \frac{b_1 - 0}{SE_{b_1}}$ with $df = n - 2$.
  • The four main conditions are linear pattern, independent observations, approximately normal residuals, and roughly constant residual spread.
  • A small $P$-value gives evidence against $H_0: \beta_1 = 0$, meaning the data suggest a linear relationship in the population.
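The formulas above can be chained together in a short Python sketch. It uses only the standard library, which has no $t$ distribution, so the critical value $t^*$ (with $df = n - 2$) is assumed to be looked up from a $t$ table and passed in by hand. The function name `slope_inference`, its `t_star` parameter, and the five-point dataset are hypothetical and chosen for illustration only.

```python
import math


def slope_inference(x, y, t_star):
    """Hypothetical helper: slope inference for the least-squares line.

    t_star is the t critical value for the chosen confidence level with
    df = n - 2, looked up by hand (the standard library has no t distribution).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired x and y values with n >= 3")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)  # sum of (x_i - x_bar)^2
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                  # sample slope
    b0 = y_bar - b1 * x_bar         # sample intercept
    # Residuals e_i = y_i - y_hat_i
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    df = n - 2                      # intercept and slope were both estimated
    s = math.sqrt(sum(e ** 2 for e in residuals) / df)  # residual standard error
    se_b1 = s / math.sqrt(sxx)      # standard error of the slope
    t = (b1 - 0) / se_b1            # test statistic for H0: beta_1 = 0
    ci = (b1 - t_star * se_b1, b1 + t_star * se_b1)  # b1 +/- t* SE_b1
    return {"b0": b0, "b1": b1, "df": df, "s": s, "se_b1": se_b1, "t": t, "ci": ci}


# Hypothetical data; t* = 3.182 is the 95% critical value for df = 3
result = slope_inference([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_star=3.182)
print(result["b1"], round(result["t"], 2), tuple(round(v, 2) for v in result["ci"]))
```

For this made-up dataset the sketch gives $b_1 = 0.6$, $t \approx 2.12$, and a 95% interval of roughly $(-0.30, 1.50)$. The interval contains 0, so these five points alone would not give convincing evidence of a linear relationship. The conditions still have to be checked before any of these numbers are trusted.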

Vocabulary

Population slope
The population slope $\beta_1$ is the true average change in the mean response for each 1-unit increase in the explanatory variable.
Sample slope
The sample slope $b_1$ is the slope of the least-squares regression line computed from sample data.
Residual
A residual is the difference between an observed value and its predicted value, written as $e_i = y_i - \hat{y}_i$.
Standard error of the slope
The standard error $SE_{b_1}$ measures the typical sampling variability of the sample slope $b_1$.
Degrees of freedom
For linear regression inference, the degrees of freedom are $df = n - 2$ because two parameters, $\beta_0$ and $\beta_1$, are estimated (by $b_0$ and $b_1$) before the residual variation is calculated.
Prediction interval
A prediction interval estimates a likely range for an individual future response value at a given explanatory value.

Common Mistakes to Avoid

  • Interpreting $b_1$ as proof of causation is wrong because regression inference can show association, but causation requires a well-designed experiment or strong causal evidence.
  • Using regression inference without checking residual plots is wrong because curved patterns, outliers, or changing spread can make the $t$ procedures unreliable.
  • Using the wrong degrees of freedom is a mistake: regression inference for the slope uses $df = n - 2$ because both an intercept and a slope are estimated before the residual variation is calculated.
  • Interpreting a confidence interval for $\beta_1$ as a range for $y$ is wrong because it estimates the population slope, not individual response values.
  • Extrapolating far beyond the observed $x$-values is wrong because the linear pattern may not continue outside the range of the data.
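The following small Python sketch shows why skipping the residual plot is risky. It uses hypothetical data that follow $y = x^2$ exactly, so $y$ depends perfectly on $x$, but not linearly. The helper name `least_squares` is illustrative. The fitted slope comes out to 0, and the residuals run positive, then negative, then positive across the $x$-values. That U shape is the curved pattern a residual plot would reveal.

```python
def least_squares(x, y):
    """Hypothetical helper: return (b0, b1) for the line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical curved data: y = x^2, a perfect but nonlinear relationship
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]

b0, b1 = least_squares(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print("b1 =", b1)
print("residuals:", residuals)
# Residual signs in order of x: positive at both ends, negative in the middle
print("signs:", ["+" if e > 0 else "-" if e < 0 else "0" for e in residuals])
```

Here $b_1 = 0$, so $t = 0$ and a test of $H_0: \beta_1 = 0$ would find no linear relationship, even though $y$ is completely determined by $x$. The linear-pattern condition fails, and the slope test answers the wrong question.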

Practice Questions

  1. A regression analysis gives $b_1 = 2.4$, $SE_{b_1} = 0.6$, and $n = 18$. Compute the test statistic for $H_0: \beta_1 = 0$ and state the degrees of freedom.
  2. A sample gives $b_1 = -1.8$, $SE_{b_1} = 0.5$, and $df = 12$. If $t^* = 2.179$, find a 95% confidence interval for $\beta_1$.
  3. For a regression with $n = 25$, the residual sum of squares is 184. Compute the residual standard error $s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$.
  4. A residual plot shows a clear curved pattern. Explain why a linear regression inference procedure for $\beta_1$ may not be appropriate.
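After attempting Questions 1 to 3 by hand, you can check the arithmetic with this short Python sketch. It simply plugs the given summary values into the formulas from the Key Facts. The variable names are illustrative only. Question 4 is conceptual and is not covered here.

```python
import math

# Question 1: test statistic for H0: beta_1 = 0 and its degrees of freedom
b1_q1, se_q1, n_q1 = 2.4, 0.6, 18
t_q1 = (b1_q1 - 0) / se_q1   # 4.0
df_q1 = n_q1 - 2             # 16

# Question 2: 95% interval b1 +/- t* SE_b1, with t* = 2.179 given for df = 12
b1_q2, se_q2, t_star_q2 = -1.8, 0.5, 2.179
margin_q2 = t_star_q2 * se_q2                   # 1.0895
ci_q2 = (b1_q2 - margin_q2, b1_q2 + margin_q2)  # about (-2.8895, -0.7105)

# Question 3: residual standard error from the residual sum of squares
rss_q3, n_q3 = 184, 25
s_q3 = math.sqrt(rss_q3 / (n_q3 - 2))           # sqrt(8), about 2.828

print(t_q1, df_q1)
print(tuple(round(v, 4) for v in ci_q2))
print(round(s_q3, 3))
```

In Question 2 the whole interval lies below 0, so it is consistent with a negative population slope.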