Regression & Residual Diagnostics Lab
Fit regression models, examine residual plots, identify outliers and influential points, and discover why R² alone is never enough. Includes Anscombe's Quartet, which shows how four datasets with identical summary statistics can hide completely different patterns.
Guided Experiment: Why R² Isn't Enough
If four datasets have the same R² and the same regression line equation, do you think the regression model is equally valid for all four?
Write your hypothesis in the Lab Report panel, then click Next.
Scatter Plot with Regression Line
Residual Plot
Controls
| # | x | y |
|---|---|---|
| 1 | 1 | 2.1 |
| 2 | 2 | 4.3 |
| 3 | 3 | 5.8 |
| 4 | 4 | 8.2 |
| 5 | 5 | 9.9 |
| 6 | 6 | 12.1 |
| 7 | 7 | 14.3 |
| 8 | 8 | 15.8 |
| 9 | 9 | 18.0 |
| 10 | 10 | 20.2 |
| 11 | 11 | 22.1 |
| 12 | 12 | 23.9 |
Diagnostics
Click Run to fit the model and see diagnostics.
Data Table
(0 rows)

| # | Dataset | Model | R² | Adj R² | Residual Pattern | Outliers | Influential Points | Recommendation |
|---|---------|-------|----|--------|------------------|----------|--------------------|----------------|
Reference Guide
Residual Plots
A residual is the difference between the observed value and the predicted value.
Random scatter around zero means the model fits well. Curved patterns suggest the model is missing a nonlinear term. Funnel shapes indicate heteroscedasticity (non-constant variance).
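As a concrete illustration, the residuals for the data table above can be computed by hand. The sketch below uses only the standard library; the function names are illustrative, not part of the lab's code:

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (intercept, slope)."""
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b

def residuals(xs, ys):
    """Observed minus predicted, one value per data point."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Data from the lab's data table
xs = list(range(1, 13))
ys = [2.1, 4.3, 5.8, 8.2, 9.9, 12.1, 14.3, 15.8, 18.0, 20.2, 22.1, 23.9]

# For a well-specified linear model these scatter randomly around zero.
print([round(r, 2) for r in residuals(xs, ys)])
```

Note that ordinary least squares guarantees the residuals sum to zero whenever an intercept is fitted, so a residual plot is judged by its *pattern*, not its average.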
Leverage & Influential Points
Leverage measures how far a point's x-value lies from the mean of the x-values; high-leverage points can pull the fitted line toward themselves.
Cook's distance combines leverage and residual size to find points that strongly influence the regression line.
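For simple linear regression both quantities have closed forms: leverage is hᵢ = 1/n + (xᵢ − x̄)²/Sxx, and Cook's distance is Dᵢ = eᵢ²/(p·MSE) · hᵢ/(1 − hᵢ)². A standard-library sketch, with invented data containing one deliberately extreme point:

```python
from statistics import mean

def diagnostics(xs, ys):
    """Leverage (hat values) and Cook's distance for simple regression."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    intercept = ybar - slope * xbar
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    p = 2                                        # fitted parameters: intercept + slope
    mse = sum(e * e for e in resid) / (n - p)
    lev = [1 / n + (x - xbar) ** 2 / sxx for x in xs]
    cooks = [e * e / (p * mse) * h / (1 - h) ** 2
             for e, h in zip(resid, lev)]
    return lev, cooks

# Illustrative data: one far-out x with an off-trend y dominates the fit
xs = [1, 2, 3, 4, 5, 20]
ys = [1.0, 2.1, 2.9, 4.2, 5.1, 30.0]
lev, cooks = diagnostics(xs, ys)
print(max(cooks) == cooks[-1])  # the x = 20 point has by far the largest Cook's distance
```

Note how the extreme point's raw residual is *small* (the line bends toward it), which is exactly why Cook's distance must combine residual size with leverage rather than rely on residuals alone.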
Transformations for Linearity
When residuals show a curved pattern, applying a transformation can linearize the relationship.
- log(y) works for exponential growth data
- log(x) works for power-law relationships
- √y and √x moderate right-skewed distributions
- 1/x handles reciprocal relationships
After transforming, check whether the residual plot improves to random scatter.
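For example, data generated by exponential growth fits a straight line poorly on the raw scale but almost perfectly after a log(y) transform. A sketch with synthetic data (values invented to follow y ≈ 2·e^(0.5x)):

```python
import math
from statistics import mean

def r_squared(xs, ys):
    """R² of the least-squares line through (xs, ys)."""
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - ybar) ** 2 for y in ys)
    return 1 - sse / sst

# Synthetic exponential-growth data, y ≈ 2·e^(0.5x)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.3, 5.5, 8.9, 14.9, 24.2, 40.5, 66.0, 109.3]

r2_raw = r_squared(xs, ys)                          # curved residual pattern remains
r2_log = r_squared(xs, [math.log(y) for y in ys])   # residuals collapse to noise
print(r2_raw, r2_log)
```

The raw fit already has a respectable R², which is the trap: only the residual plot (or the jump to a near-perfect R² after transforming) reveals the misspecification.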
Model Comparison (R² vs Adjusted R²)
R² always increases when you add more terms to a model, even if they add no real predictive power.
Adjusted R² penalizes the fit for each extra parameter (p), so it increases only when a new term genuinely improves the model. AIC and BIC provide further model-comparison criteria, with BIC applying a stronger penalty for model complexity (k·ln n per parameter versus AIC's 2k, which is stronger whenever n > e² ≈ 7.4).
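These criteria are simple to compute directly. A sketch using the standard formulas (adjusted R² with p predictors, and the Gaussian-likelihood forms of AIC/BIC from the residual sum of squares; the numbers are made up for illustration):

```python
import math

def adjusted_r2(r2, n, p):
    """Adjusted R²; p = number of predictors, excluding the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def aic_bic(sse, n, k):
    """Gaussian-likelihood AIC and BIC; k = number of fitted parameters."""
    aic = n * math.log(sse / n) + 2 * k
    bic = n * math.log(sse / n) + k * math.log(n)  # ln(n) > 2 once n > e^2
    return aic, bic

# A near-useless extra predictor nudges R² up but drops adjusted R²:
print(adjusted_r2(0.90, n=20, p=1))   # one real predictor
print(adjusted_r2(0.905, n=20, p=2))  # tiny R² gain, one extra parameter
```

This is the whole point of the adjustment: raw R² can only rise as terms are added, while the adjusted value falls when the gain is smaller than the penalty.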
Anscombe's Quartet demonstrates why you should never trust R² alone. All four datasets have nearly identical R² values yet completely different structures.
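You can verify this directly with the published quartet values: to two decimal places every dataset yields slope 0.5, intercept 3.0, and R² ≈ 0.67, despite a linear trend, a curve, an outlier, and a single high-leverage point:

```python
from statistics import mean

def fit_stats(xs, ys):
    """(slope, intercept, R²) of the least-squares line, rounded to 2 dp."""
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - ybar) ** 2 for y in ys)
    return round(b, 2), round(a, 2), round(1 - sse / sst, 2)

# Anscombe (1973): datasets I-III share the same x-values
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8] * 7 + [19] + [8] * 3,
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}
for name, (xs, ys) in quartet.items():
    print(name, fit_stats(xs, ys))  # identical summary statistics for all four
```

Identical numbers, four different stories: only plotting the data and the residuals tells them apart.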