Residuals measure how far each observed data value is from the value predicted by a regression model. This cheat sheet helps students calculate residuals, build residual plots, and decide whether a linear model is reasonable. Residual plots are important because they reveal patterns that a scatterplot or correlation value may hide. Students use them to check model fit, spot outliers, and compare predictions to real data. The main formula is $e = y - \hat{y}$, where $e$ is the residual, $y$ is the observed value, and $\hat{y}$ is the predicted value. A residual plot places the explanatory variable on the horizontal axis and the residuals on the vertical axis. A good linear model usually has residuals randomly scattered around $0$. Curved patterns, changing spread, or extreme points suggest the model may not be appropriate.
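As a small illustration of the formula, the sketch below computes residuals for three made-up data points. The function name `residual` and the numbers are hypothetical, chosen only to show one positive, one negative, and one zero residual.

```python
def residual(observed, predicted):
    """Return e = y - y_hat for a single data point."""
    return observed - predicted

# Hypothetical observed values y and model predictions y_hat
observed = [12.0, 15.0, 9.0]
predicted = [10.0, 16.5, 9.0]

residuals = [residual(y, y_hat) for y, y_hat in zip(observed, predicted)]
print(residuals)  # [2.0, -1.5, 0.0]
```

The first point lies above the line (positive residual), the second lies below it (negative residual), and the third lies exactly on it.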

Key Facts

  • A residual is calculated using $e = y - \hat{y}$, where $y$ is the observed value and $\hat{y}$ is the predicted value.
  • A positive residual means the observed value is above the regression line because $y > \hat{y}$.
  • A negative residual means the observed value is below the regression line because $y < \hat{y}$.
  • For a least-squares regression line, the residuals always have a sum of $\sum e = 0$, up to rounding error.
  • A residual plot graphs each point as $(x, e)$, using the original explanatory variable $x$ and the residual $e$.
  • A residual plot with random scatter around $e = 0$ supports using a linear model.
  • A curved pattern in a residual plot suggests that a nonlinear model may fit the data better than a line.
  • A fan-shaped residual plot shows nonconstant spread, meaning prediction errors change size as $x$ changes.
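Several of these facts can be checked numerically. The following is a minimal sketch, assuming a small invented data set and a hand-written helper `fit_line` (a hypothetical name) that uses the standard least-squares slope and intercept formulas. It computes the residuals, confirms that they sum to zero up to rounding error, and builds the $(x, e)$ pairs a residual plot would display.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y_hat = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data that is roughly linear
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(xs, ys)

# Residual for each point: e = y - y_hat
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Points for a residual plot: x on the horizontal axis, e on the vertical axis
plot_points = list(zip(xs, residuals))

print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
print("sum of residuals:", round(sum(residuals), 10))
for x, e in plot_points:
    print(x, round(e, 2))
```

For this data the residuals alternate in sign with no obvious curve or fan shape, which is the kind of random scatter around $e = 0$ that supports a linear model. Passing `plot_points` to any plotting tool would produce the residual plot itself.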

Vocabulary

Residual
A residual is the difference between an observed response value and the value predicted by a model, calculated as $e = y - \hat{y}$.
Predicted Value
A predicted value, written $\hat{y}$, is the response value estimated by a regression equation for a given $x$.
Residual Plot
A residual plot is a graph of residuals against the explanatory variable, usually shown as points $(x, e)$.
Least-Squares Regression Line
A least-squares regression line is the line that minimizes the sum of squared residuals, $\sum e^2$.
Outlier
An outlier is a data point with an unusually large residual or an unusual position compared with the rest of the data.
Nonlinear Pattern
A nonlinear pattern occurs when residuals show a curve or systematic shape instead of random scatter around $0$.

Common Mistakes to Avoid

  • Reversing the residual formula is wrong because $\hat{y} - y$ gives the opposite sign; use $e = y - \hat{y}$.
  • Thinking a high correlation always means a good linear model is wrong because a residual plot can reveal curvature or changing spread.
  • Ignoring the sign of a residual is wrong because positive residuals mean the point is above the line and negative residuals mean it is below the line.
  • Using $y$ instead of $e$ on the vertical axis of a residual plot is wrong because the plot must show prediction errors, not original response values.
  • Calling any large residual an error in the data is wrong because an outlier may be real and should be investigated before being removed.
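The mistake about high correlation can be illustrated with a short sketch. It assumes an invented data set that follows $y = x^2$ exactly, and it uses hypothetical helper names (`fit_line`, `correlation`) written out by hand. The correlation comes out at about 0.97, yet the residuals from the least-squares line form a clear U shape.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares slope and intercept for y_hat = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical curved data: y = x^2 for x = 1, ..., 10
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print("r =", round(r, 2))
print("residuals:", [round(e, 1) for e in residuals])
```

The residuals are positive at both ends of the $x$ range and negative in the middle, a systematic pattern that the correlation value alone does not reveal.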

Practice Questions

  1. A regression model predicts $\hat{y} = 42$ for a data point with observed value $y = 50$. Find the residual $e$.
  2. For the regression equation $\hat{y} = 3x + 7$, find the residual when $x = 4$ and the observed value is $y = 21$.
  3. A point has residual $e = -6$ and predicted value $\hat{y} = 18$. Find the observed value $y$.
  4. A residual plot shows a clear U-shaped pattern around $e = 0$. Explain what this suggests about using a linear model for the data.