Statistics Grade 11-12

Confusion Matrix & Classification Metrics Cheat Sheet

A printable reference covering confusion matrices, accuracy, precision, recall, specificity, F1 score, and classification thresholds for grades 11-12.

This cheat sheet covers how to summarize and evaluate classification models using a confusion matrix. Students need it to understand how predictions can be correct or incorrect in different ways. It is especially useful in statistics, data science, machine learning, and real-world decision problems such as medical testing or spam detection. The core idea is to compare predicted classes with actual classes, then count true positives, false positives, true negatives, and false negatives. From these counts, you can calculate metrics such as accuracy, precision, recall, specificity, and $F_1$ score. These metrics answer different questions, so the best metric depends on the cost of each kind of error.

Key Facts

  • A confusion matrix organizes classification results into $TP$, $FP$, $TN$, and $FN$ by comparing predicted labels with actual labels.
  • Accuracy measures the overall fraction of correct predictions and is calculated by $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$.
  • Precision measures how many predicted positives were actually positive and is calculated by $\text{Precision} = \frac{TP}{TP + FP}$.
  • Recall, also called sensitivity, measures how many actual positives were correctly found and is calculated by $\text{Recall} = \frac{TP}{TP + FN}$.
  • Specificity measures how many actual negatives were correctly identified and is calculated by $\text{Specificity} = \frac{TN}{TN + FP}$.
  • The false positive rate is calculated by $\text{FPR} = \frac{FP}{FP + TN}$, and it equals $1 - \text{Specificity}$.
  • The false negative rate is calculated by $\text{FNR} = \frac{FN}{FN + TP}$, and it equals $1 - \text{Recall}$.
  • The $F_1$ score balances precision and recall using $F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$.
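
The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the function name `classification_metrics`, the choice to return `None` when a denominator is zero, and the example counts are all assumptions made for this sketch.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return the cheat-sheet metrics as a dict, given the four counts."""

    def ratio(numerator, denominator):
        # Assumption: report None when a metric is undefined (zero denominator).
        return numerator / denominator if denominator else None

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)

    # F1 is only defined when precision and recall exist and are not both zero.
    if precision is None or recall is None or precision + recall == 0:
        f1 = None
    else:
        f1 = 2 * precision * recall / (precision + recall)

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "fpr": ratio(fp, fp + tn),
        "fnr": ratio(fn, fn + tp),
        "f1": f1,
    }


# Made-up counts, used only to illustrate the formulas.
metrics = classification_metrics(tp=30, fp=10, tn=50, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, the false positive rate and specificity add up to 1, and so do the false negative rate and recall, matching the identities in the list above.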

Vocabulary

Confusion matrix
A table that compares predicted classes with actual classes to show correct and incorrect classification results.
True positive
A true positive, written $TP$, is a case where the model predicts positive and the actual class is positive.
False positive
A false positive, written $FP$, is a case where the model predicts positive but the actual class is negative.
False negative
A false negative, written $FN$, is a case where the model predicts negative but the actual class is positive.
Precision
Precision is the proportion of positive predictions that are correct, calculated by $\frac{TP}{TP + FP}$.
Recall
Recall is the proportion of actual positives that are correctly identified, calculated by $\frac{TP}{TP + FN}$.
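
To see how these definitions turn into counts, here is a small sketch that tallies each outcome from two lists of labels. The function name `confusion_counts`, the use of `1` for the positive class, and the sample labels are assumptions made for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN, and FN by comparing each prediction with the truth."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Made-up labels: 1 means positive, 0 means negative.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)
```

Every item lands in exactly one of the four cells, so the four counts always sum to the number of items classified.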

Common Mistakes to Avoid

  • Confusing false positives with false negatives is wrong because they describe opposite error types. A false positive predicts positive when the truth is negative, while a false negative predicts negative when the truth is positive.
  • Using accuracy alone with imbalanced data is misleading because a model can look accurate by mostly predicting the majority class. For rare positives, precision, recall, and $F_1$ score often give better information.
  • Putting $FP$ and $FN$ in the wrong cells changes every metric that uses them. Always check whether rows represent actual labels or predicted labels before calculating.
  • Treating precision and recall as the same metric is wrong because precision focuses on positive predictions, while recall focuses on actual positives. A model can have high precision but low recall, or high recall but low precision.
  • Raising or lowering the classification threshold without checking error tradeoffs is risky because it usually changes both $FP$ and $FN$. A lower threshold often increases recall but may decrease precision.
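
The threshold tradeoff in the last point can be illustrated with a short sketch. The scores and labels below are invented, the helper name `precision_recall_at` is hypothetical, and the rule "predict positive when the score is at least the threshold" is an assumption; real models and libraries may define this differently.

```python
def precision_recall_at(threshold, scores, labels):
    """Return (precision, recall) when predicting positive for score >= threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 1)
    # Assumption: report None when a metric is undefined (zero denominator).
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


# Made-up model scores and true labels (1 = positive, 0 = negative).
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.35, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

results = {t: precision_recall_at(t, scores, labels) for t in (0.75, 0.5, 0.3)}
for threshold, (precision, recall) in results.items():
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

In this made-up data, each time the threshold drops, recall rises while precision falls. That pattern is common but not guaranteed, so the counts should be rechecked whenever the threshold changes.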

Practice Questions

  1. A classifier has $TP = 45$, $TN = 40$, $FP = 10$, and $FN = 5$. Calculate the accuracy.
  2. A medical test gives $TP = 80$, $FP = 20$, and $FN = 10$. Calculate the precision and recall.
  3. A model has precision $0.75$ and recall $0.60$. Calculate the $F_1$ score using $F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$.
  4. For a disease screening test, explain whether false positives or false negatives are usually more dangerous, and justify which metric should be emphasized.