Statistics Grade 9-12 Answer Key

Statistics: Linear Regression and Line of Best Fit

Modeling relationships with scatterplots, slope, residuals, and predictions

Answer Key
Name:
Date:
Score: / 12

Instructions: Read each problem carefully. Show your work in the space provided. Round decimal answers to the nearest tenth unless another direction is given.
  1.

    A student records the number of hours studied and the score earned on a test. The data are: (1, 68), (2, 72), (3, 78), (4, 83), (5, 87). Identify the explanatory variable, the response variable, and the general direction of the association.

    The explanatory variable is usually the input or x-variable.

    The explanatory variable is hours studied, and the response variable is test score. The data show a positive association because the test score tends to increase as the number of hours studied increases.
  2.

    A regression model for predicting a quiz score from hours of studying is y = 4.5x + 62, where x is hours studied and y is the predicted quiz score. Find the predicted quiz score for a student who studies for 6 hours.

    Substitute x = 6 into the equation: y = 4.5(6) + 62 = 27 + 62 = 89. The predicted quiz score is 89.
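The substitution in Problem 2 can be checked with a short Python sketch. The function name `predict_quiz_score` is only an illustrative choice; the slope and intercept come from the model given in the problem.

```python
def predict_quiz_score(hours_studied: float) -> float:
    """Predicted quiz score from the Problem 2 model y = 4.5x + 62."""
    slope = 4.5       # predicted points gained per additional hour studied
    intercept = 62.0  # predicted score at 0 hours studied
    return slope * hours_studied + intercept


print(predict_quiz_score(6))  # 89.0
```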
  3.

    A line of best fit for predicting monthly savings from monthly income is y = 0.18x + 45, where x is monthly income in dollars and y is monthly savings in dollars. Interpret the slope in context.

    The slope describes the predicted change in y for a 1-unit increase in x.

    The slope is 0.18, which means that for each additional 1 dollar of monthly income, the model predicts monthly savings will increase by 0.18 dollars, or 18 cents.
  4.

    A regression equation is y = 2.3x + 15. A data point has x = 10 and an actual y-value of 41. Find the predicted value and the residual.

    The predicted value is y = 2.3(10) + 15 = 38. The residual is actual minus predicted, so the residual is 41 - 38 = 3.
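As a quick check of Problem 4, the sketch below computes the predicted value and the residual (actual minus predicted). The helper name `predicted_and_residual` is hypothetical; the numbers are the ones given in the problem.

```python
def predicted_and_residual(x, actual_y, slope, intercept):
    """Return (predicted y, residual), where residual = actual - predicted."""
    predicted_y = slope * x + intercept
    return predicted_y, actual_y - predicted_y


predicted, resid = predicted_and_residual(x=10, actual_y=41, slope=2.3, intercept=15)
print(round(predicted, 1), round(resid, 1))  # 38.0 3.0
```

A positive residual means the actual point lies above the regression line.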
  5.

    A regression equation for predicting plant height from days after planting is y = 1.7x + 4.2. Explain what the y-intercept means in context, and state whether it is reasonable.

    The y-intercept is the predicted y-value when x equals 0.

    The y-intercept is 4.2, which means the model predicts a plant height of 4.2 units (for example, centimeters; the problem does not state the unit) at 0 days after planting. This may be reasonable if the plant was already a seedling when it was planted, but it may not be reasonable if the plant started from a seed with no visible height.
  6.

    Use the data points (1, 2), (2, 4), (3, 5), (4, 7), and (5, 8). The mean of x is 3 and the mean of y is 5.2. The sum of the products of deviations is 15, and the sum of squared x-deviations is 10. Find the least-squares regression line.

    The slope is 15 divided by 10, which is 1.5. The intercept is 5.2 - 1.5(3) = 0.7. The least-squares regression line is y = 1.5x + 0.7.
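The sums quoted in Problem 6 can be reproduced directly from the five data points. This is a minimal sketch using the usual least-squares formulas (slope = sum of products of deviations divided by sum of squared x-deviations, intercept = mean of y minus slope times mean of x); the function name `least_squares_line` is an illustrative choice.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (15 for this data).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared x-deviations (10 for this data).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 7, 8]
slope, intercept = least_squares_line(xs, ys)
print(round(slope, 1), round(intercept, 1))  # 1.5 0.7
```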
  7.

    A data set includes x-values from 2 to 12. A regression model from this data is used to predict y when x = 9 and when x = 20. Classify each prediction as interpolation or extrapolation.

    Interpolation stays inside the observed x-values, while extrapolation goes outside them.

    Predicting y when x = 9 is interpolation because 9 is within the range of the data. Predicting y when x = 20 is extrapolation because 20 is outside the range of the data.
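The rule used in Problem 7 can be written as a small check against the observed range of x-values. The function name `classify_prediction` is hypothetical, and treating the endpoints 2 and 12 as inside the range is an assumption of this sketch.

```python
def classify_prediction(x, x_min, x_max):
    """Classify a prediction at x relative to the observed x-range."""
    if x_min <= x <= x_max:
        return "interpolation"  # x lies within the observed data
    return "extrapolation"      # x lies outside the observed data


for x in (9, 20):
    print(x, classify_prediction(x, x_min=2, x_max=12))
# 9 interpolation
# 20 extrapolation
```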
  8.

    A scatterplot has correlation coefficient r = -0.86. Describe the direction and strength of the linear relationship.

    The relationship is negative and strong. The negative sign shows that y tends to decrease as x increases, and 0.86 is close to 1 in absolute value, which indicates a strong linear pattern.
  9.

    For a regression model, the correlation coefficient is r = 0.75. Find r squared and explain its meaning in context.

    Square the correlation coefficient to find r squared.

    The value of r squared is 0.75 squared, which is 0.5625. This means 56.25 percent (about 56 percent) of the variation in the response variable is explained by the linear model using the explanatory variable.
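The arithmetic in Problem 9 is a single squaring step, shown here as a short sketch. The variable names are illustrative only.

```python
r = 0.75
r_squared = r ** 2                 # coefficient of determination
percent_explained = r_squared * 100

print(r_squared, percent_explained)  # 0.5625 56.25
```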
  10.

    Two possible lines are used to model the same data. Line A has residuals 2, -1, 3, and -2. Line B has residuals 1, -1, 1, and -1. Compare the sums of squared residuals and identify which line fits better by the least-squares criterion.

    Line A has a sum of squared residuals of 2 squared + (-1) squared + 3 squared + (-2) squared = 4 + 1 + 9 + 4 = 18. Line B has a sum of squared residuals of 1 squared + (-1) squared + 1 squared + (-1) squared = 1 + 1 + 1 + 1 = 4. Line B fits better because it has the smaller sum of squared residuals.
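The comparison in Problem 10 can be sketched as follows. The helper name `sum_squared_residuals` is an illustrative choice; the residual lists are the ones given in the problem.

```python
def sum_squared_residuals(residuals):
    """Sum of squared residuals, the quantity least squares minimizes."""
    return sum(r ** 2 for r in residuals)


line_a = [2, -1, 3, -2]
line_b = [1, -1, 1, -1]

ssr_a = sum_squared_residuals(line_a)
ssr_b = sum_squared_residuals(line_b)
better = "Line A" if ssr_a < ssr_b else "Line B"
print(ssr_a, ssr_b, better)  # 18 4 Line B
```

Squaring keeps positive and negative residuals from canceling, so each line is judged by how far its points miss, not by the direction of the miss.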
  11.

    A scatterplot shows a strong positive linear trend, but one point is far from the rest of the data with a very large x-value. Explain how this point could affect the regression line.

    Points far away in the x-direction can have a large effect on the fitted line.

    A point with a very large x-value that is far from the rest of the data may have high leverage. It can pull the regression line toward itself and strongly change the slope and intercept, especially if it does not follow the overall trend.
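To illustrate the effect described in Problem 11, the sketch below fits a least-squares slope with and without one far-away point. The base data are borrowed from Problem 6; the added point (20, 5) is hypothetical and was chosen only because it has a large x-value and does not follow the trend.

```python
def least_squares_slope(points):
    """Slope of the least-squares line through a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


base_points = [(1, 2), (2, 4), (3, 5), (4, 7), (5, 8)]  # data from Problem 6
high_leverage_point = (20, 5)  # hypothetical point, not from the worksheet

slope_without = least_squares_slope(base_points)
slope_with = least_squares_slope(base_points + [high_leverage_point])
print(round(slope_without, 2), round(slope_with, 2))  # 1.5 0.05
```

In this example a single point flattens the slope from 1.5 to about 0.05, which is why such points deserve a closer look before the line is used for predictions.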
  12.

    A regression model shows that students who spend more time on a homework app tend to have higher course grades. Explain why this does not prove that the app causes higher grades.

    Correlation alone does not show cause and effect.

    This result shows an association, not proof of causation. Other factors, such as motivation, prior knowledge, attendance, or study habits, may explain why students who use the app more also tend to earn higher grades.
LivePhysics™.com Statistics - Grade 9-12 - Answer Key