Multiple regression is used to predict one response variable from two or more explanatory variables. This cheat sheet helps students organize the model equation, interpret coefficients, and check how well a model fits data. It is especially useful when several factors may affect the same outcome, such as predicting test scores from study time, attendance, and sleep.

Key Facts

  • A multiple regression model with two predictors is written as $\hat{y} = b_0 + b_1x_1 + b_2x_2$, where $\hat{y}$ is the predicted response.
  • The intercept $b_0$ is the predicted value of $y$ when all explanatory variables equal $0$. This interpretation is meaningful only if that situation makes sense in context.
  • A slope coefficient $b_i$ estimates the change in $\hat{y}$ for a 1-unit increase in $x_i$ while all other predictors are held constant.
  • A residual is the prediction error for one data point, calculated as $e = y - \hat{y}$.
  • The coefficient of determination $R^2$ is the proportion of variation in $y$ explained by the regression model, with $0 \le R^2 \le 1$.
  • Adjusted $R^2$ penalizes predictors that do not improve the fit enough to justify including them, so it is often better than $R^2$ for comparing models with different numbers of explanatory variables.
  • Multicollinearity occurs when predictors are strongly related to each other, which can make coefficient estimates unstable and hard to interpret.
  • A prediction should usually be made only within the range of the original data because extrapolation can be unreliable.
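The facts above can be tied together in one short computation. The sketch below is a minimal illustration, assuming Python with NumPy and a small made-up dataset (the numbers are hypothetical, not from a real study). It fits $\hat{y} = b_0 + b_1x_1 + b_2x_2$ by ordinary least squares, then computes the residuals $e = y - \hat{y}$, $R^2$, and adjusted $R^2$ using the standard formula $1 - (1 - R^2)\frac{n - 1}{n - k - 1}$ for $n$ observations and $k$ explanatory variables. The helper function names are illustrative only.

```python
import numpy as np

def fit_two_predictor_model(x1, x2, y):
    """Least-squares fit of y-hat = b0 + b1*x1 + b2*x2; returns [b0, b1, b2]."""
    # A column of 1s lets the fit estimate the intercept b0.
    X = np.column_stack([np.ones(len(y)), x1, x2])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

def r_squared(y, y_hat):
    """Proportion of the variation in y explained by the model."""
    ss_res = np.sum((y - y_hat) ** 2)       # sum of squared residuals
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # total variation of y around its mean
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical data: study hours (x1), hours of sleep (x2), exam score (y).
x1 = np.array([2.0, 4.0, 5.0, 6.0, 8.0, 9.0, 3.0, 7.0])
x2 = np.array([6.0, 7.0, 5.0, 8.0, 7.0, 6.0, 8.0, 9.0])
y = np.array([60.0, 72.0, 70.0, 85.0, 88.0, 90.0, 68.0, 93.0])

b0, b1, b2 = fit_two_predictor_model(x1, x2, y)
y_hat = b0 + b1 * x1 + b2 * x2  # predicted response for each data point
residuals = y - y_hat           # e = y - y-hat for each data point
r2 = r_squared(y, y_hat)
adj_r2 = adjusted_r_squared(r2, n=len(y), k=2)

print(f"y-hat = {b0:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

Because the model includes an intercept, the least-squares residuals sum to (numerically) zero, and adjusted $R^2$ can never exceed $R^2$.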

Vocabulary

Multiple Regression
A statistical method that predicts one response variable using two or more explanatory variables.
Response Variable
The variable being predicted or explained, usually represented by $y$.
Explanatory Variable
A variable used to predict the response variable, often represented by $x_1$, $x_2$, and so on.
Coefficient
A number in the regression equation that shows how a predictor is associated with the predicted response when other predictors are held constant.
Residual
The difference between an observed value and its predicted value, calculated as $e = y - \hat{y}$.
Multicollinearity
A problem that occurs when explanatory variables are highly correlated with each other.
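A quick first check for multicollinearity is the correlation between each pair of explanatory variables. This sketch assumes Python with NumPy and hypothetical data; the 0.8 cutoff in the illustrative `looks_collinear` helper is an assumed rule of thumb, not a fixed standard. Pairwise correlations can also miss multicollinearity that involves three or more predictors at once.

```python
import numpy as np

def predictor_correlation(xa, xb):
    """Pearson correlation r between two explanatory variables."""
    return float(np.corrcoef(xa, xb)[0, 1])

def looks_collinear(xa, xb, threshold=0.8):
    """Flag a pair of predictors whose |r| exceeds an assumed cutoff."""
    return abs(predictor_correlation(xa, xb)) > threshold

# Hypothetical predictors for the same students.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])             # hours studied
problems = np.array([10.0, 21.0, 29.0, 41.0, 50.0, 61.0])    # practice problems, about 10 per hour
sleep = np.array([8.0, 6.0, 7.0, 9.0, 6.0, 7.0])             # hours of sleep

print(predictor_correlation(hours, problems))  # close to 1: strongly related predictors
print(predictor_correlation(hours, sleep))     # much weaker relationship
print(looks_collinear(hours, problems), looks_collinear(hours, sleep))
```

Putting both `hours` and `problems` in one model would make it hard to separate their individual effects, since each one nearly determines the other.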

Common Mistakes to Avoid

  • Interpreting a coefficient without stating that the other variables are held constant is wrong because each slope in multiple regression adjusts for the other predictors in the model.
  • Assuming a larger $R^2$ always means a better model is wrong because adding a predictor never lowers $R^2$ and usually raises it, even when that predictor is not useful.
  • Using the model far outside the data range is wrong because extrapolated predictions may not follow the same pattern seen in the sample.
  • Treating correlation between predictors as harmless is wrong because strong multicollinearity can make slopes change dramatically when a predictor is added to or removed from the model.
  • Confusing residuals with predicted values is wrong because a residual measures prediction error, while $\hat{y}$ is the model's predicted response.
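The $R^2$ mistake can be checked directly by simulation. In this sketch (Python with NumPy, simulated data; all names are hypothetical), the score is generated from study time alone, and then a pure-noise predictor is added. Because least squares could always set the new slope to $0$, the larger model's $R^2$ is never lower, while adjusted $R^2$ charges for the extra predictor and often drops.

```python
import numpy as np

def r_squared_for(predictor_columns, y):
    """Fit y on the given predictors (plus an intercept) by least squares; return R^2."""
    X = np.column_stack([np.ones(len(y))] + list(predictor_columns))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ coefs
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

def adjusted(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

rng = np.random.default_rng(0)
n = 20
study = rng.uniform(0, 10, n)
score = 50 + 4 * study + rng.normal(0, 5, n)  # simulated: score depends only on study
noise = rng.normal(0, 1, n)                   # an irrelevant predictor

r2_small = r_squared_for([study], score)
r2_big = r_squared_for([study, noise], score)
print(f"R^2:          {r2_small:.4f} -> {r2_big:.4f}")  # never goes down
print(f"adjusted R^2: {adjusted(r2_small, n, 1):.4f} -> {adjusted(r2_big, n, 2):.4f}")
```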

Practice Questions

  1. A model predicts final exam score with $\hat{y} = 42 + 5x_1 + 3x_2$, where $x_1$ is study hours and $x_2$ is hours of sleep. Find $\hat{y}$ when $x_1 = 6$ and $x_2 = 8$.
  2. For one student, the observed score is $y = 91$ and the predicted score is $\hat{y} = 86$. Find the residual $e = y - \hat{y}$.
  3. In the model $\hat{y} = 1200 + 850x_1 - 40x_2$, where $x_1$ is years of experience and $x_2$ is commute distance in miles, interpret the coefficient $-40$ in context.
  4. A model has a high $R^2$, but two predictors are strongly correlated with each other. Explain why the model may still be difficult to interpret.
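To check the arithmetic in Questions 1 and 2, a small helper can evaluate any fitted equation of the form $\hat{y} = b_0 + b_1x_1 + b_2x_2 + \dots$ and compute a residual. This is a plain-Python sketch; the function names are hypothetical.

```python
def predict(b0, slopes, xs):
    """Evaluate y-hat = b0 + b1*x1 + b2*x2 + ... for one observation."""
    if len(slopes) != len(xs):
        raise ValueError("need exactly one slope per explanatory variable")
    return b0 + sum(b * x for b, x in zip(slopes, xs))

def residual(y, y_hat):
    """Prediction error e = y - y-hat for one data point."""
    return y - y_hat

q1 = predict(42, [5, 3], [6, 8])  # 42 + 5*6 + 3*8
print(q1)                         # 96
print(residual(91, 86))           # 5
```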