Statistics Grade 9-12 Answer Key

Statistics: Logistic Regression and Categorical Outcomes

Modeling probabilities for yes-or-no outcomes

Answer Key

Name:

Date:

Score: / 12

Statistics: Logistic Regression and Categorical Outcomes

Modeling probabilities for yes-or-no outcomes

Statistics - Grade 9-12

Instructions: Read each problem carefully. Show your work in the space provided. Round probabilities to three decimal places unless another rounding rule is given.

1

A school wants to predict whether a student passes a certification exam. The outcome is pass or not pass. Explain why logistic regression is more appropriate than linear regression for this situation.

Think about what values a probability is allowed to have.

Logistic regression is more appropriate because the response variable is categorical with two outcomes. It models the probability of passing and keeps predicted probabilities between 0 and 1, while linear regression could predict impossible probabilities below 0 or above 1.
2

A logistic regression model predicts the probability that a customer buys a product using the equation log(p/(1 - p)) = -2.0 + 0.5x, where x is the number of website visits. What is the log-odds when x = 4?

When x = 4, the log-odds are -2.0 + 0.5(4) = 0. The log-odds of buying the product are 0.
3

Using the model log(p/(1 - p)) = -2.0 + 0.5x, find the predicted probability that a customer buys a product when x = 4 website visits.

If the log-odds equal 0, the odds equal 1.

From problem 2, the log-odds are 0. The odds are e^0 = 1, so p = 1/(1 + 1) = 0.500. The predicted probability of buying the product is 0.500.
4

A logistic regression model is log(p/(1 - p)) = -1.2 + 0.8x, where x = 1 if a person saw an ad and x = 0 if the person did not see the ad. What is the log-odds for a person who did not see the ad?

For a person who did not see the ad, x = 0. The log-odds are -1.2 + 0.8(0) = -1.2.
5

For the model log(p/(1 - p)) = -1.2 + 0.8x, where x = 1 if a person saw an ad and x = 0 if the person did not see the ad, what is the odds ratio for seeing the ad compared with not seeing the ad? Round to two decimal places.

In logistic regression, exponentiating a coefficient gives the odds ratio.

The odds ratio is e^0.8, which is about 2.23. This means the odds of buying are about 2.23 times as high for a person who saw the ad compared with a person who did not see it.
6

A model predicts whether a student joins an after-school club. The coefficient for hours spent at school events is 0.4. Interpret this coefficient in terms of odds.

For each additional hour spent at school events, the log-odds of joining the club increase by 0.4. The odds are multiplied by e^0.4, which is about 1.49, so the odds increase by about 49% for each additional hour.
7

The table shows predicted probabilities from a logistic regression model for passing a driving test based on practice hours. Describe the overall pattern shown by the predictions: 0 hours: 0.18, 2 hours: 0.31, 4 hours: 0.50, 6 hours: 0.69, 8 hours: 0.82.

Look at how the probabilities change as the number of hours increases.

The predicted probability of passing increases as practice hours increase. The increase is not perfectly linear because logistic regression produces an S-shaped pattern in probabilities.
8

A logistic regression model gives a predicted probability of 0.75 that a student will submit homework on time. What are the predicted odds of submitting homework on time?

Odds compare the probability of the event to the probability of the event not happening.

The predicted odds are p/(1 - p) = 0.75/0.25 = 3. The odds of submitting homework on time are 3 to 1.
9

A logistic curve is shown for the probability that a plant survives based on days of watering. The curve starts near 0 for very few watering days, rises quickly in the middle, and levels off near 1 after many watering days. Explain why this shape makes sense for a probability model.

This shape makes sense because probabilities cannot go below 0 or above 1. The model shows that survival probability increases with more watering days, but the increase slows as the probability gets close to 1.
10

A researcher codes the outcome as 1 for owns a bicycle and 0 for does not own a bicycle. The model includes distance from school as a predictor, and the coefficient is -0.3. What does the negative coefficient suggest?

A negative coefficient means the predictor and the log-odds move in opposite directions.

The negative coefficient suggests that as distance from school increases, the log-odds of owning a bicycle decrease. In other words, students who live farther from school are predicted to have lower odds of owning a bicycle, according to this model.
11

A confusion matrix for a model predicting whether emails are spam is shown: actual spam predicted spam = 42, actual spam predicted not spam = 8, actual not spam predicted spam = 10, actual not spam predicted not spam = 40. How many emails were classified correctly, and what was the accuracy?

The correctly classified emails are 42 + 40 = 82. The total number of emails is 42 + 8 + 10 + 40 = 100, so the accuracy is 82/100 = 0.82, or 82%.
12

A model predicts whether a patient has a condition. Using a cutoff of 0.50, predicted probabilities of 0.50 or higher are classified as yes. Classify these four patients: Patient A = 0.82, Patient B = 0.47, Patient C = 0.50, Patient D = 0.12.

Compare each predicted probability with the cutoff value.

Patient A is classified as yes because 0.82 is at least 0.50. Patient B is classified as no because 0.47 is below 0.50. Patient C is classified as yes because 0.50 meets the cutoff. Patient D is classified as no because 0.12 is below 0.50.

LivePhysics™.com Statistics - Grade 9-12 - Answer Key