Statistics Grade advanced Answer Key

Statistics: Hypothesis Testing

Testing claims with p-values, critical values, and error analysis

Answer Key
Name:
Date:
Score: / 15

Instructions: Read each problem carefully. State the hypotheses, choose the correct test, show calculations, and write a conclusion in context.
  1.

    A company claims its light bulbs last an average of 1000 hours. A random sample of 40 bulbs has a mean lifetime of 965 hours and a sample standard deviation of 120 hours. Test the claim at the 0.05 significance level using a two-sided test.

    Only the sample standard deviation is available, so the t distribution is appropriate.

    The hypotheses are H0: mu = 1000 and Ha: mu is not equal to 1000. Since the population standard deviation is unknown, use a one-sample t test. The test statistic is t = (965 - 1000) / (120 / sqrt(40)) = -1.84 with 39 degrees of freedom. The two-sided p-value is about 0.073. Since 0.073 is greater than 0.05, we fail to reject H0. There is not enough evidence at the 0.05 level to conclude that the mean bulb lifetime differs from 1000 hours.
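
    The calculation above can be checked with a short script. This is a minimal sketch using only the Python standard library; the helper names `t_pdf` and `t_upper_tail` are illustrative and not part of the worksheet, and the tail probability is approximated by Simpson's rule rather than read from a table.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= |t|) by integrating the density from 0 to |t|
    with Simpson's rule (steps must be even) and subtracting from 0.5."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Values from the problem statement
n, xbar, s, mu0 = 40, 965.0, 120.0, 1000.0

t_stat = (xbar - mu0) / (s / math.sqrt(n))
p_two_sided = 2 * t_upper_tail(t_stat, n - 1)

print(f"t = {t_stat:.2f}, df = {n - 1}, two-sided p = {p_two_sided:.3f}")
```

    Doubling the upper-tail area gives the two-sided p-value because the t distribution is symmetric about 0.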
  2.

    A political poll finds that 312 out of 600 randomly selected voters support a proposed policy. Test whether the true support proportion is greater than 0.50 at alpha = 0.05.

    The hypotheses are H0: p = 0.50 and Ha: p > 0.50. The sample proportion is 312/600 = 0.52. The test statistic is z = (0.52 - 0.50) / sqrt(0.50(0.50)/600) = 0.98. The one-sided p-value is about 0.164. Since 0.164 is greater than 0.05, we fail to reject H0. The sample does not provide enough evidence that a majority of voters support the policy.
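
    As a check on the arithmetic, the one-proportion z test can be sketched with `statistics.NormalDist` from the Python standard library. The variable names are illustrative; note that the standard error uses the null value p0, not the sample proportion.

```python
from math import sqrt
from statistics import NormalDist

# Values from the problem statement
x, n, p0 = 312, 600, 0.50

p_hat = x / n
se0 = sqrt(p0 * (1 - p0) / n)        # standard error under H0
z = (p_hat - p0) / se0
p_value = 1 - NormalDist().cdf(z)    # upper tail, because Ha: p > p0

print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, one-sided p = {p_value:.3f}")
```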
  3.

    A new teaching method is compared with a standard method. Test scores from independent random samples are shown: new method n = 25, mean = 84, s = 10; standard method n = 30, mean = 78, s = 12. Test whether the new method has a higher mean score at alpha = 0.05. Assume unequal variances.

    Because the variances are not assumed equal, use Welch's t test rather than a pooled t test.

    The hypotheses are H0: mu_new - mu_standard = 0 and Ha: mu_new - mu_standard > 0. Using Welch's two-sample t test, the standard error is sqrt(10^2/25 + 12^2/30) = 2.97. The test statistic is t = (84 - 78) / 2.97 = 2.02. The Welch degrees of freedom are about 53. The one-sided p-value is about 0.024. Since 0.024 is less than 0.05, we reject H0. There is evidence that the new method has a higher mean score.
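
    The standard error, test statistic, and Welch-Satterthwaite degrees of freedom can be reproduced as follows. This is a sketch with illustrative variable names; it stops at the degrees of freedom and does not compute the p-value.

```python
from math import sqrt

# Summary statistics from the problem statement
n1, mean1, s1 = 25, 84.0, 10.0   # new method
n2, mean2, s2 = 30, 78.0, 12.0   # standard method

v1 = s1 ** 2 / n1                # variance of the first sample mean
v2 = s2 ** 2 / n2                # variance of the second sample mean

se = sqrt(v1 + v2)
t_stat = (mean1 - mean2) / se

# Welch-Satterthwaite approximation for the degrees of freedom
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print(f"SE = {se:.2f}, t = {t_stat:.2f}, df = {df:.1f}")
```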
  4.

    A drug trial records recovery times for 12 patients before and after a treatment change. The differences, computed as before minus after, have mean 3.4 days and standard deviation 4.8 days. Test whether the treatment change reduces recovery time at alpha = 0.05.

    The same patients are measured twice, so the observations are paired.

    The hypotheses are H0: mu_d = 0 and Ha: mu_d > 0, where d is before minus after. This is a paired t test. The test statistic is t = 3.4 / (4.8 / sqrt(12)) = 2.45 with 11 degrees of freedom. The one-sided p-value is about 0.016. Since 0.016 is less than 0.05, we reject H0. There is evidence that the treatment change reduces recovery time.
  5.

    A manufacturer wants to know whether a machine's variance in fill volume has increased beyond 4.0 square milliliters. A sample of 20 containers has sample variance 6.2. Test at alpha = 0.05 assuming normal fill volumes.

    The hypotheses are H0: sigma squared = 4.0 and Ha: sigma squared > 4.0. Use a chi-square test for variance. The test statistic is chi-square = (n - 1)s squared / sigma0 squared = 19(6.2)/4.0 = 29.45 with 19 degrees of freedom. The upper-tail critical value at alpha = 0.05 is about 30.14, and the p-value is slightly greater than 0.05. Since the statistic is less than the critical value, we fail to reject H0. There is not enough evidence at the 0.05 level that the variance has increased beyond 4.0.
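
    The statistic and an approximate p-value can be checked numerically. The sketch below uses only the standard library; `chi2_pdf` and `chi2_upper_tail` are illustrative helpers that integrate the chi-square density with Simpson's rule, and they assume more than 2 degrees of freedom so that the density is finite at 0.

```python
import math

def chi2_pdf(x, k):
    """Density of the chi-square distribution with k degrees of freedom (k > 2)."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def chi2_upper_tail(x, k, steps=2000):
    """Approximate P(X >= x) as 1 minus the Simpson's-rule integral on [0, x]."""
    h = x / steps
    total = chi2_pdf(0.0, k) + chi2_pdf(x, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, k)
    return 1 - total * h / 3

# Values from the problem statement
n, sample_var, sigma0_sq = 20, 6.2, 4.0

chi_sq = (n - 1) * sample_var / sigma0_sq
p_value = chi2_upper_tail(chi_sq, n - 1)

print(f"chi-square = {chi_sq:.2f}, df = {n - 1}, upper-tail p = {p_value:.3f}")
```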
  6.

    The following contingency table summarizes whether customers bought a warranty by age group: under 30: 18 yes, 42 no; 30 to 49: 35 yes, 65 no; 50 and older: 47 yes, 53 no. Test whether warranty purchase is independent of age group at alpha = 0.05.

    Compute each expected count as row total times column total divided by the grand total.

    The hypotheses are H0: warranty purchase and age group are independent and Ha: they are associated. Row totals are 60, 100, and 100; column totals are 100 yes and 160 no; the grand total is 260. Expected yes counts are 23.08, 38.46, and 38.46 for the three age groups. Expected no counts are 36.92, 61.54, and 61.54. Summing (observed - expected)^2 / expected over the six cells gives chi-square = 1.12 + 0.70 + 0.31 + 0.19 + 1.90 + 1.18 = 5.40 with (3 - 1)(2 - 1) = 2 degrees of freedom. The p-value is about 0.067. Since 0.067 is greater than 0.05, we fail to reject H0. There is not enough evidence at the 0.05 level to conclude that warranty purchase is associated with age group.
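
    The expected counts and the chi-square statistic can be recomputed directly from the table. This is a minimal sketch with illustrative names; for the p-value it relies on the fact that with exactly 2 degrees of freedom the upper-tail probability of a chi-square variable is exp(-x/2), so no table or extra library is needed.

```python
import math

# Observed counts from the problem statement: (yes, no) per age group
observed = {
    "under 30": (18, 42),
    "30 to 49": (35, 65),
    "50 and older": (47, 53),
}

row_totals = {group: sum(counts) for group, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
grand_total = sum(row_totals.values())

chi_sq = 0.0
for group, counts in observed.items():
    for j, obs in enumerate(counts):
        # expected count = row total * column total / grand total
        expected = row_totals[group] * col_totals[j] / grand_total
        chi_sq += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_totals) - 1)
p_value = math.exp(-chi_sq / 2)   # exact upper tail only because df == 2

print(f"chi-square = {chi_sq:.2f}, df = {df}, p = {p_value:.3f}")
```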
  7.

    A one-way ANOVA compares mean reaction times across four caffeine dose groups. The between-groups sum of squares is 450 and the within-groups sum of squares is 1800. There are 10 participants in each group. Test for any difference among group means at alpha = 0.05.

    ANOVA tests whether all group means are equal, not which specific pairs are different.

    The hypotheses are H0: all four group means are equal and Ha: at least one group mean differs. There are k = 4 groups and N = 40 participants. The between-groups degrees of freedom are k - 1 = 3, and the within-groups degrees of freedom are N - k = 36. MS_between = 450/3 = 150 and MS_within = 1800/36 = 50. The F statistic is 150/50 = 3.00. For df = 3 and 36, the p-value is about 0.043. Since 0.043 is less than 0.05, we reject H0. There is evidence that at least one caffeine dose group has a different mean reaction time.
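
    The ANOVA table entries follow mechanically from the sums of squares. In the sketch below, the variable names are illustrative, and the critical value 2.87 is an approximate table value for F with 3 and 36 degrees of freedom at the 0.05 level; it is an assumption brought in for the comparison, not a figure from the problem.

```python
# Values from the problem statement
ss_between, ss_within = 450.0, 1800.0
k, n_per_group = 4, 10

N = k * n_per_group
df_between = k - 1
df_within = N - k

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

# Approximate upper 5% point of F(3, 36) from a standard table (assumed value)
f_critical = 2.87
reject_h0 = f_stat > f_critical

print(f"df = ({df_between}, {df_within}), F = {f_stat:.2f}, reject H0: {reject_h0}")
```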
  8.

    For a two-sided z test of H0: mu = 50 versus Ha: mu is not equal to 50, the test statistic is z = -2.31. Find the p-value and state the conclusion at alpha = 0.01.

    For a two-sided z test, the p-value is 2P(Z <= -2.31), which is about 2(0.0104) = 0.0208. Since 0.0208 is greater than 0.01, we fail to reject H0. The result is not statistically significant at the 0.01 level.
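
    The table lookup can be replaced by the standard normal CDF in Python's `statistics` module. This is a small sketch with illustrative variable names.

```python
from statistics import NormalDist

z = -2.31       # test statistic from the problem statement
alpha = 0.01

# Two-sided p-value: twice the area in the tail beyond |z|
p_value = 2 * NormalDist().cdf(-abs(z))
reject_h0 = p_value < alpha

print(f"two-sided p = {p_value:.4f}, reject H0 at alpha = {alpha}: {reject_h0}")
```

    Using `-abs(z)` makes the same line work whether the observed statistic is negative or positive.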
  9.

    A researcher reports a 95% confidence interval for a mean difference of (1.2, 5.8). Explain what this interval implies for a two-sided hypothesis test of H0: mu_diff = 0 at alpha = 0.05.

    A two-sided alpha = 0.05 test is closely related to a 95% confidence interval.

    Because 0 is not inside the 95% confidence interval, the corresponding two-sided hypothesis test at alpha = 0.05 would reject H0. The data provide evidence that the true mean difference is not 0. Since the entire interval is positive, the evidence suggests a positive mean difference.
  10.

    A lab tests 20 independent null hypotheses, all at alpha = 0.05, without any correction. If all 20 null hypotheses are actually true, what is the probability of making at least one Type I error?

    Use the complement rule: at least one error equals 1 minus no errors.

    The probability of making no Type I errors is (1 - 0.05)^20 = 0.95^20, which is about 0.358. Therefore, the probability of making at least one Type I error is 1 - 0.358 = 0.642. There is about a 64.2% chance of at least one false positive.
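
    The complement-rule calculation is easy to reproduce, and the same formula shows how quickly the chance of a false positive grows with the number of independent tests. The extra test counts in the loop (1, 5, 10) are illustrative and not part of the problem.

```python
alpha = 0.05
m = 20                               # number of independent true null hypotheses

p_no_errors = (1 - alpha) ** m
p_at_least_one = 1 - p_no_errors

print(f"P(no Type I errors) = {p_no_errors:.3f}")
print(f"P(at least one Type I error) = {p_at_least_one:.3f}")

# Same formula for a few other test counts, for comparison
for tests in (1, 5, 10, 20):
    print(tests, round(1 - (1 - alpha) ** tests, 3))
```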
  11.

    Define a Type I error and a Type II error in the context of a medical screening test where H0 means a patient does not have the disease and Ha means a patient has the disease.

    A Type I error occurs when the test rejects H0 even though H0 is true, so the patient is diagnosed as having the disease when the patient does not have it. A Type II error occurs when the test fails to reject H0 even though Ha is true, so the patient is not diagnosed with the disease even though the patient has it.
  12.

    A power analysis states that a test has power 0.80 to detect a specified effect size at alpha = 0.05. Interpret the meaning of power in this setting.

    Power is the probability of rejecting the null hypothesis when the specified alternative effect is truly present. A power of 0.80 means the test has an 80% chance of detecting that effect and a 20% chance of making a Type II error for that effect size.
  13.

    A random sample of 150 students has mean study time 14.2 hours per week with sample standard deviation 5.5 hours. Test H0: mu = 15 versus Ha: mu < 15 at alpha = 0.05.

    With a large sample, the t distribution is close to the standard normal distribution, but the t test is still appropriate.

    Use a one-sample t test because the population standard deviation is unknown. The test statistic is t = (14.2 - 15) / (5.5 / sqrt(150)) = -1.78 with 149 degrees of freedom. The one-sided p-value is about 0.039. Since 0.039 is less than 0.05, we reject H0. There is evidence that the mean study time is less than 15 hours per week.
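
    To illustrate how close the t and standard normal distributions are at this sample size, the sketch below computes the test statistic and then the lower-tail p-value under the normal approximation. It is not the t-based p-value quoted above; the normal approximation comes out slightly smaller, and both are below 0.05. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the problem statement
n, xbar, s, mu0 = 150, 14.2, 5.5, 15.0

t_stat = (xbar - mu0) / (s / sqrt(n))

# Normal approximation to the lower-tail p-value (Ha: mu < 15)
p_normal = NormalDist().cdf(t_stat)

print(f"t = {t_stat:.2f}, normal-approximation p = {p_normal:.3f}")
```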
  14.

    A quality-control engineer wants to test whether the defect rate is below 3%. In a sample of 500 items, 9 are defective. Test H0: p = 0.03 versus Ha: p < 0.03 at alpha = 0.05.

    The sample proportion is 9/500 = 0.018. The test statistic is z = (0.018 - 0.03) / sqrt(0.03(0.97)/500) = -1.57. The one-sided p-value is about 0.058. Since 0.058 is greater than 0.05, we fail to reject H0. There is not enough evidence at the 0.05 level to conclude that the defect rate is below 3%.
  15.

    A study reports p = 0.003 for a hypothesis test. Which statement is the best interpretation: A, the null hypothesis has a 0.3% chance of being true; B, assuming the null hypothesis is true, results at least as extreme as the observed result would occur about 0.3% of the time; C, the alternative hypothesis has a 99.7% chance of being true. Explain your choice.

    A p-value is conditional on the null hypothesis being true.

    Statement B is the best interpretation. A p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one obtained. It is not the probability that the null hypothesis is true or the probability that the alternative hypothesis is true.
LivePhysics™.com Statistics - Grade advanced - Answer Key