Statistics: Advanced Grade Answer Key

Statistics: Regression

Modeling relationships, interpreting coefficients, and checking assumptions

Answer Key

Name:

Date:

Score: / 16

Instructions: Read each problem carefully. Show your work, define variables when needed, and explain your reasoning using statistical language.

1

A study models exam score using hours studied. The fitted regression equation is score = 62.4 + 4.8(hours). Interpret the slope in context.

The slope describes the predicted change in the response variable for a 1-unit increase in the explanatory variable.

The slope means that for each additional hour studied, the model predicts an average increase of 4.8 points in exam score.
2

Using the model score = 62.4 + 4.8(hours), predict the exam score for a student who studied 6.5 hours.

The predicted score is 62.4 + 4.8(6.5) = 62.4 + 31.2 = 93.6. The model predicts an exam score of 93.6 points.
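A minimal Python sketch of Problems 1 and 2, using only the fitted equation from the problem. The function name predict_score is hypothetical. It evaluates the prediction at 6.5 hours and checks that the slope equals the change in prediction for a one-hour increase.

```python
# Fitted model from Problems 1 and 2: score = 62.4 + 4.8(hours)
INTERCEPT = 62.4
SLOPE = 4.8


def predict_score(hours: float) -> float:
    """Return the model's predicted exam score for a given number of hours studied."""
    return INTERCEPT + SLOPE * hours


prediction = predict_score(6.5)
print(round(prediction, 1))  # 93.6

# The slope is the change in the prediction for one additional hour (Problem 1).
one_hour_change = predict_score(7.5) - predict_score(6.5)
print(round(one_hour_change, 1))  # 4.8
```

Rounding is applied only for display, because floating-point arithmetic can leave tiny errors in the last digits.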
3

A regression model predicting monthly electricity cost from outdoor temperature is cost = 158.2 - 1.35(temperature), where temperature is measured in degrees Fahrenheit. Interpret the intercept. Is it likely meaningful in this context?

An intercept is meaningful only when x = 0 is a realistic value within the range of the observed data.

The intercept means the model predicts an electricity cost of $158.20 when the outdoor temperature is 0 degrees Fahrenheit. It is likely not meaningful if 0 degrees Fahrenheit is outside the range of observed temperatures, because interpreting it would require extrapolation, and the linear pattern may not hold at such low temperatures.
4

A data set has correlation r = 0.82 between engine size and vehicle price. What is the coefficient of determination, and what does it mean in context?

The coefficient of determination is r squared, so 0.82 squared = 0.6724. About 67.24% of the variability in vehicle price is explained by the linear relationship with engine size.
5

A scatterplot of house size and selling price shows a strong positive linear trend, but one point represents a 10,000 square foot mansion priced far above the others. Explain how this point could affect the regression line.

Think about whether the point is unusual in the x-direction, the y-direction, or both.

The mansion is likely a high-leverage point because its house size is far from the other x-values. If its price does not follow the general trend of the other houses, it is also an influential point: it can pull the regression line toward itself and substantially change the slope and intercept.
6

A residual is defined as observed value minus predicted value. A student observed a delivery time of 42 minutes, and the regression model predicted 37 minutes. Find and interpret the residual.

The residual is 42 - 37 = 5 minutes. The positive residual means the actual delivery took 5 minutes longer than the model predicted.
7

A residual plot shows residuals that form a clear U-shaped pattern. What does this suggest about the appropriateness of a linear regression model?

For a good linear model, residuals should look randomly scattered around 0.

A U-shaped residual pattern suggests that the relationship is not linear. A linear regression model is likely inappropriate, and a curved model such as a quadratic regression may fit better.
8

A researcher fits a regression model predicting systolic blood pressure from age and body mass index. The fitted model is blood pressure = 74.1 + 0.62(age) + 1.85(BMI). Interpret the coefficient of BMI.

In multiple regression, each coefficient is interpreted while holding the other predictors constant.

Holding age constant, each 1-unit increase in BMI is associated with an increase of 1.85 mmHg in predicted (average) systolic blood pressure.
9

For the model blood pressure = 74.1 + 0.62(age) + 1.85(BMI), predict the systolic blood pressure for a 50-year-old person with a BMI of 28.

The predicted blood pressure is 74.1 + 0.62(50) + 1.85(28) = 74.1 + 31.0 + 51.8 = 156.9. The model predicts a systolic blood pressure of 156.9 mmHg.
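A minimal Python sketch of Problems 8 and 9 using the fitted equation from the problem; the function name predict_bp is hypothetical. Changing BMI by 1 while age stays fixed shows the "holding other predictors constant" interpretation of the BMI coefficient.

```python
def predict_bp(age: float, bmi: float) -> float:
    """Predicted systolic blood pressure (mmHg) from the fitted model in Problems 8 and 9."""
    return 74.1 + 0.62 * age + 1.85 * bmi


print(round(predict_bp(50, 28), 1))  # 156.9

# Holding age fixed at 50, a 1-unit increase in BMI changes the prediction by 1.85 mmHg.
print(round(predict_bp(50, 29) - predict_bp(50, 28), 2))  # 1.85
```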
10

A regression output gives a slope estimate of 2.40 with standard error 0.60 for predicting crop yield from fertilizer amount. Test the null hypothesis that the true slope is 0 by computing the t-statistic.

Use t = coefficient estimate divided by standard error.

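The t-statistic is the estimate divided by its standard error, so t = 2.40 / 0.60 = 4.00; the estimated slope is 4 standard errors away from 0. Unless the residual degrees of freedom are very small, |t| = 4.00 exceeds the two-sided critical value at the 0.05 level (about 2.0 for moderate to large samples), so the data provide strong evidence that the true slope is different from 0.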
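A sketch of the calculation in Problem 10. The Python standard library has no t distribution, so this uses a large-sample normal approximation for the p-value (an assumption; with few degrees of freedom the exact t-based p-value is larger because the t distribution has heavier tails).

```python
import math

estimate = 2.40
std_error = 0.60

t_stat = estimate / std_error
print(f"t = {t_stat:.2f}")  # t = 4.00

# Large-sample approximation: treat t as standard normal.
# Two-sided p-value = P(|Z| >= |t|) = erfc(|t| / sqrt(2)).
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))
print(f"approximate two-sided p-value = {p_approx:.5f}")
```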
11

A 95% confidence interval for a regression slope is (-0.15, 1.92). Explain what this interval suggests about whether the predictor has a statistically significant linear relationship with the response at the 0.05 level.

Because the interval includes 0, a true slope of 0 is plausible, so the data do not provide statistically significant evidence at the 0.05 level that the true slope is different from 0.
12

A model predicting annual income from years of education has R squared = 0.38. Another model adds work experience and has R squared = 0.47, but adjusted R squared only increases from 0.36 to 0.37. Explain why adjusted R squared is useful here.

Regular R squared never decreases when a predictor is added, even if the predictor adds little value.

Adjusted R squared is useful because it accounts for the number of predictors in the model. The small increase from 0.36 to 0.37 suggests that adding work experience improves the model only slightly after accounting for the extra predictor.
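For reference, adjusted R squared is commonly defined as follows, where n is the number of observations and k is the number of predictors. The problem does not give n, so this is the general formula rather than a recomputation of 0.36 and 0.37.

```latex
R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1}
```

Adding a predictor raises k, which increases the factor (n - 1)/(n - k - 1); adjusted R squared rises only if R squared increases enough to offset that penalty.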
13

A scatterplot of advertising spending and sales shows increasing spread in sales as advertising spending increases. Which regression assumption may be violated, and what is the issue called?

The constant variance assumption may be violated. This issue is called heteroscedasticity, meaning the residuals have unequal spread across different levels of the predictor.
14

A company uses a regression model trained on data from stores with floor areas between 1,000 and 8,000 square feet. The model is used to predict sales for a new store with 25,000 square feet. Explain the statistical concern.

Regression models are usually safest for predictions within the range of observed x-values.

The prediction is an extrapolation because 25,000 square feet is far outside the range of the training data. The linear relationship may not hold there, so the prediction may be unreliable.
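A simple guard against the extrapolation in Problem 14 is to check whether a new input falls inside the training range before trusting a prediction. This is a hypothetical helper; the bounds are the 1,000 to 8,000 square foot range stated in the problem.

```python
# Range of store floor areas in the training data (from Problem 14).
TRAIN_MIN_SQFT = 1_000
TRAIN_MAX_SQFT = 8_000


def is_extrapolation(sqft: float, lo: float = TRAIN_MIN_SQFT, hi: float = TRAIN_MAX_SQFT) -> bool:
    """Return True when the floor area lies outside the range the model was fit on."""
    return not (lo <= sqft <= hi)


print(is_extrapolation(25_000))  # True
print(is_extrapolation(5_000))   # False
```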
15

In a multiple regression model, two predictors have a correlation of 0.96. Explain the problem this may cause and how it can affect coefficient interpretation.

A correlation of 0.96 suggests severe multicollinearity. This can make coefficient estimates unstable, increase standard errors, and make it difficult to interpret the separate effect of each predictor.
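The variance inflation factor (VIF) puts a number on the instability described in Problem 15. For a predictor j, VIF = 1 / (1 - R_j²), where R_j² comes from regressing that predictor on the others; with only two predictors, R_j² is simply r². The square root of the VIF is the factor by which that coefficient's standard error is inflated compared with uncorrelated predictors. A VIF above 10 is one common rule of thumb for serious multicollinearity (some analysts use 5).

```python
import math

r = 0.96  # correlation between the two predictors (Problem 15)

vif = 1 / (1 - r ** 2)
se_inflation = math.sqrt(vif)

print(f"VIF = {vif:.2f}")  # VIF = 12.76
print(f"standard errors inflated by a factor of about {se_inflation:.2f}")
```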
16

A regression output includes the following summary: residual standard error = 3.2 and response variable = plant height in centimeters. Interpret the residual standard error in context.

Residual standard error is a typical prediction error measured in the same units as the response variable.

The residual standard error of 3.2 means that observed plant heights typically differ from the model's predicted heights by about 3.2 centimeters.
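The residual standard error is usually computed as the square root of SSE / (n - k - 1), where SSE is the sum of squared residuals and k is the number of predictors. This sketch uses made-up residuals from a hypothetical one-predictor plant-height model; the function name is also hypothetical.

```python
import math


def residual_standard_error(residuals, num_predictors):
    """sqrt(SSE / (n - k - 1)) for a model with k predictors plus an intercept."""
    n = len(residuals)
    df = n - num_predictors - 1
    if df <= 0:
        raise ValueError("need more observations than estimated coefficients")
    sse = sum(e * e for e in residuals)
    return math.sqrt(sse / df)


# Hypothetical residuals (cm), for illustration only; they sum to 0 as OLS residuals do.
example_residuals = [2.5, -3.1, 4.0, -1.8, 3.6, -2.9, 0.7, -3.0]
print(f"RSE = {residual_standard_error(example_residuals, 1):.2f} cm")
```

Because the residuals are in centimeters, the result is also in centimeters, which is why the residual standard error can be read as a typical prediction error in the units of the response.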
