Scatter plots show the relationship between two numerical variables by graphing paired data as points. This cheat sheet helps students identify patterns, describe association, and decide whether a line of best fit is reasonable. These skills are useful for making predictions from data in science, business, sports, and everyday comparisons.

Key Facts

  • A scatter plot graphs ordered pairs $(x, y)$ to show how two numerical variables may be related.
  • A positive association means that as $x$ increases, $y$ tends to increase.
  • A negative association means that as $x$ increases, $y$ tends to decrease.
  • No association means the points do not show a clear upward or downward pattern.
  • A line of best fit is often written as $y = mx + b$, where $m$ is the slope and $b$ is the $y$-intercept.
  • The slope of a line is $m = \frac{y_2 - y_1}{x_2 - x_1}$, which represents the rate of change between two points.
  • The $y$-intercept $b$ is the predicted value of $y$ when $x = 0$.
  • Interpolation predicts a value inside the data range, while extrapolation predicts a value outside the data range.
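
The slope, intercept, and prediction facts above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names and the sample values are made up for the example, and the line is built from two chosen points rather than fitted to a full data set.

```python
def slope(p1, p2):
    """Rate of change between two points (x1, y1) and (x2, y2)."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("slope is undefined when x1 equals x2")
    return (y2 - y1) / (x2 - x1)


def line_through(p1, p2):
    """Return (m, b) for the line y = m*x + b through two points."""
    m = slope(p1, p2)
    b = p1[1] - m * p1[0]  # solve y1 = m*x1 + b for b
    return m, b


def predict(m, b, x):
    """Predicted y-value on the line y = m*x + b."""
    return m * x + b


def prediction_type(x, xs):
    """Label a prediction by whether x lies inside the observed x-range."""
    return "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"


# Hypothetical x-values of a data set and two points chosen on its trend line.
observed_x = [1, 2, 3, 4, 5]
m, b = line_through((1, 3), (5, 11))

inside = predict(m, b, 3)     # x = 3 is inside the data range
outside = predict(m, b, 10)   # x = 10 is outside the data range
print(m, b, inside, prediction_type(3, observed_x))
print(outside, prediction_type(10, observed_x))
```

Here the intercept `b` is the value `predict` returns for `x = 0`, which matches the definition of the $y$-intercept above.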

Vocabulary

Scatter plot
A graph that displays paired numerical data as points $(x, y)$ on a coordinate plane.
Association
The overall relationship or pattern between the two variables in a scatter plot.
Correlation
A description of the direction and strength of the relationship between two numerical variables.
Line of best fit
A straight line that closely follows the overall trend of the data points in a scatter plot.
Slope
The rate of change of a line, calculated by $m = \frac{y_2 - y_1}{x_2 - x_1}$.
Outlier
A data point that is far away from the general pattern of the rest of the data.
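
To make "line of best fit" and "correlation" concrete, here is a small sketch that computes both from paired data. It assumes the least-squares method and the Pearson correlation coefficient, which are common choices but are not named in this cheat sheet; the function names and the data are hypothetical.

```python
from math import sqrt


def least_squares_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x-values are equal, so no slope can be computed")
    m = sxy / sxx
    b = y_bar - m * x_bar  # the fitted line passes through (x_bar, y_bar)
    return m, b


def correlation(xs, ys):
    """Pearson correlation coefficient r, a value between -1 and 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired data, invented for this example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, b = least_squares_line(xs, ys)
r = correlation(xs, ys)
print(f"slope = {m:.2f}, intercept = {b:.2f}, r = {r:.2f}")
```

The sign of `r` matches the direction of the association (positive or negative), and values of `r` close to 0 suggest no clear linear pattern.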

Common Mistakes to Avoid

  • Confusing positive and negative association is wrong because the direction depends on whether $y$ tends to increase or decrease as $x$ increases.
  • Drawing a line of best fit through the most points is wrong because the line should balance the data with about the same number of points above and below it.
  • Using only one data point to make a prediction is wrong because predictions should be based on the overall trend, not a single value.
  • Ignoring outliers is wrong because an outlier can strongly affect the position and slope of a line of best fit.
  • Extrapolating too far beyond the data is risky because the trend may not continue outside the observed range.
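
The point about outliers can be checked numerically. The sketch below fits a line with and without one far-off point and compares the slopes. It assumes a least-squares fit, which this cheat sheet does not specify, and the data and the helper name `fit_slope` are invented for illustration.

```python
def fit_slope(points):
    """Least-squares slope for a list of (x, y) points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    return sxy / sxx


# Hypothetical data: five points that follow a clear upward trend.
trend_points = [(1, 3), (2, 5), (3, 7), (4, 9), (5, 11)]

# The same data with one point far below the pattern.
with_outlier = trend_points + [(6, 0)]

slope_without = fit_slope(trend_points)
slope_with = fit_slope(with_outlier)
print(f"without outlier: {slope_without:.2f}")
print(f"with outlier:    {slope_with:.2f}")
```

In this made-up data set, a single outlier flattens the fitted line considerably, which is why outliers should be noticed and examined rather than ignored.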

Practice Questions

  1. A line of best fit passes through $(2, 5)$ and $(6, 13)$. Find the slope $m$.
  2. A line of best fit is $y = 3x + 4$. Predict $y$ when $x = 7$.
  3. A scatter plot comparing hours studied and test score shows points rising from left to right. Describe the association and explain what it means.
  4. Why is a prediction using interpolation usually more reliable than a prediction using extrapolation?