Discrimination πŸ––

Agenda

In this talk we’ll go through the following topics:

  • Cutting Through the Confusion of the Confusion Matrix: Techniques for Choosing Cutoffs in Order to Translate Predictions into Binary Decisions: The Probability Threshold vs. the PPCR Approach βœ‚οΈ

  • ROC Curve: What it means and why I don’t like it 😀

  • Lift (& Gains) Curve: What is the Lift of the prediction model? πŸ‹οΈ

  • Precision-Recall: a short description (not a big fan of this one either). 😀

Code for Performance Metrics and Curves

All interactive plots in this presentation were created with rtichoke (I am the author πŸ‘‹).

You are also invited to explore the rtichoke blog for reproducible examples and some theory.

Motivation

Why use performance metrics? πŸ₯‡πŸ₯ˆπŸ₯‰

  • Comparing candidate models.
  • Selecting features.
  • Evaluating whether the prediction model will do more harm than good.

Categories of Performance Metrics and Curves

  • Discrimination πŸ––: Model’s ability to separate between events and non-events.

  • Calibration βš–οΈ: Agreement between predicted probabilities and the observed outcomes.

  • Utility πŸ‘Œ: The usefulness of the model in terms of decision-making.


Cutting Through the Confusion of the Confusion Matrix βœ‚οΈ

Decision Tree

True Positives

Infected and Predicted as Infected - Good

πŸ’Š
🀒

False Positives

Not-Infected and Predicted as Infected - BAD

πŸ’Š
🀨

False Negatives

Infected and Predicted as Not-Infected - BAD


🀒

True Negatives

Not-Infected and Predicted as Not-Infected - GOOD


🀨

Probability Threshold:

  • Most performance metrics are estimated by using a Probability Threshold in order to classify each predicted probability as Predicted Negative (do not treat) or Predicted Positive (treat πŸ’Š).

  • This type of dichotomization is used when the intervention carries a potential risk and there is a trade-off between the risk of the intervention and the risk of the outcome. A minimal sketch of thresholding follows below.
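A minimal sketch in base R, assuming the toy predictions shown in the tables below (a 0.25 cutoff reproduces the TN/FP/TP labels of the later slide):

# Toy predictions and outcomes (the same 10 observations as in the tables below)
probs <- c(0.11, 0.15, 0.18, 0.29, 0.31, 0.33, 0.45, 0.47, 0.63, 0.72)
reals <- c(0, 0, 0, 0, 1, 0, 1, 0, 1, 1)

# Dichotomize with a probability threshold: treat πŸ’Š anyone above the cutoff
threshold <- 0.25
predicted <- as.integer(probs > threshold)

# Confusion matrix: rows = real outcome, columns = binary decision
table(Real = reals, Predicted = predicted)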

Probability Threshold:

p̂  0.11  0.15  0.18  0.29  0.31  0.33  0.45  0.47  0.63  0.72
Y   0     0     0     0     1     0     1     0     1     1
    🀨    🀨    🀨    🀨    🀒    🀨    🀒    🀨    🀒    🀒

Low Probability Threshold:

Low Probability Threshold means that I’m worried about the outcome:

  • I’m worried about Prostate Cancer πŸ¦€
  • I’m worried about Heart Disease πŸ’”
  • I’m worried about Infection 🀒

Probability Threshold of 0.25

p̂  0.11  0.15  0.18  0.29  0.31  0.33  0.45  0.47  0.63  0.72
Y   0     0     0     0     1     0     1     0     1     1
Ŷ   0     0     0     1     1     1     1     1     1     1
    🀨    🀨    🀨    πŸ’ŠπŸ€¨   πŸ’ŠπŸ€’   πŸ’ŠπŸ€¨   πŸ’ŠπŸ€’   πŸ’ŠπŸ€¨   πŸ’ŠπŸ€’   πŸ’ŠπŸ€’
    TN    TN    TN    FP    TP    FP    TP    FP    TP    TP

High Probability Threshold:

High Probability Threshold means that I’m worried about the Intervention:

  • I’m worried about Biopsy πŸ’‰

  • I’m worried about Statins πŸ’Š

  • I’m worried about Antibiotics πŸ’Š

Probability Threshold of 0.55

p̂  0.11  0.15  0.18  0.29  0.31  0.33  0.45  0.47  0.63  0.72
Y   0     0     0     0     1     0     1     0     1     1
Ŷ   0     0     0     0     0     0     0     0     1     1
    🀨    🀨    🀨    🀨    🀒    🀨    🀒    🀨    πŸ’ŠπŸ€’   πŸ’ŠπŸ€’
    TN    TN    TN    TN    FN    TN    FN    TN    TP    TP

PPCR (Predicted Positives Conditional Rate):

\(\begin{aligned} \ {\scriptsize \text{PPCR} = \frac{\text{TP + FP}}{\text{TP + FP + TN + FN}} = \frac{\text{Predicted Positives}}{\text{Total Population}}}\end{aligned}\)

  • Sometimes we classify each observation according to the ranking of its risk, in order to prioritize high-risk patients regardless of their absolute risk.

  • The implied assumption is that the highest-risk patients might gain the highest benefit from the treatment, and that the treatment does not carry a significant potential risk.

  • This type of dichotomization is used when the organization faces a Resource Constraint. In healthcare it is also called Risk Percentile (see the sketch below).
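A minimal sketch of rank-based classification, assuming the same toy predictions and a PPCR of 0.1:

probs <- c(0.11, 0.15, 0.18, 0.29, 0.31, 0.33, 0.45, 0.47, 0.63, 0.72)

# Classify the top 10% riskiest patients as Predicted Positive,
# regardless of their absolute risk
ppcr        <- 0.1
n_positives <- ceiling(ppcr * length(probs))

risk_rank <- rank(-probs, ties.method = "first")  # rank 1 = highest risk
predicted <- as.integer(risk_rank <= n_positives)

data.frame(probs, risk_rank, predicted)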


PPCR of 0.1

p̂  0.11  0.15  0.18  0.29  0.31  0.33  0.45  0.47  0.63  0.72
R   10    9     8     7     6     5     4     3     2     1
Y   0     0     0     0     1     0     1     0     1     1
Ŷ   0     0     0     0     0     0     0     0     0     1
    🀨    🀨    🀨    🀨    🀒    🀨    🀒    🀨    🀒    πŸ’ŠπŸ€’
    TN    TN    TN    TN    FN    TN    FN    TN    FN    TP

ROC Curve

Discrimination - Performance Curves

Curve              Sens   Spec   PPV   PPCR   Lift
ROC                y      x
Lift                                   x      y
Precision-Recall   x             y
Gains              y                   x

ROC Curve

  • The most famous form of performance-metrics visualization.

  • Displays Sensitivity (also known as True Positive Rate or Recall) on the y axis.

  • Displays 1 - Specificity (also known as False Positive Rate) on the x axis.

Why I don’t like ROC Curve 😀

Why 1 - Specificity? Why not just Specificity? πŸ™ƒ

Honestly, I have not found anywhere an explanation of why 1 - Specificity is more insightful than plain Specificity.

Why I don’t like ROC Curve 😀

Sensitivity and Specificity do not respect the flow of time πŸ•°οΈ


Sensitivity: \(\begin{aligned} \ {\scriptsize \frac{\text{TP}}{\text{TP + FN}} = \text{Prob( Predicted Positive | Real Positive )}}\end{aligned}\)


Specificity: \(\begin{aligned} \ {\scriptsize \frac{\text{TN}}{\text{TN + FP}} = \text{Prob( Predicted Negative | Real Negative )} } \end{aligned}\)


At prediction time we do not know the condition of the Conditional Probability: neither the number of future Real Positives nor the number of future Real Negatives.

Why I don’t like ROC Curve 😀

Sensitivity and Specificity do not respect the flow of time πŸ•°οΈ


PPV: \(\begin{aligned} \ {\scriptsize \frac{\text{TP}}{\text{TP + FP}} = \text{Prob( Real Positive | Predicted Positive )}}\end{aligned}\)


NPV: \(\begin{aligned} \ {\scriptsize \frac{\text{TN}}{\text{TN + FN}} = \text{Prob( Real Negative | Predicted Negative )} } \end{aligned}\)

Here we do know the condition of the Conditional Probability: the number of Predicted Positives and the number of Predicted Negatives are set by our own decision rule. A minimal sketch computing all four metrics follows below.
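A minimal sketch, using the counts from the 0.25-threshold example earlier (TP = 4, FP = 3, TN = 3, FN = 0):

TP <- 4; FP <- 3; TN <- 3; FN <- 0

sensitivity <- TP / (TP + FN)  # P(Predicted Positive | Real Positive) = 1.00
specificity <- TN / (TN + FP)  # P(Predicted Negative | Real Negative) = 0.50
ppv         <- TP / (TP + FP)  # P(Real Positive | Predicted Positive) β‰ˆ 0.57
npv         <- TN / (TN + FN)  # P(Real Negative | Predicted Negative) = 1.00

c(sens = sensitivity, spec = specificity, ppv = ppv, npv = npv)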

Why I don’t like ROC Curve 😀

You don’t care about AUROC, you care about the c-statistic

  • Generally speaking, more area under a curve made of two β€œgood” performance metrics means a better model. Other than that there is no context, and performance metrics with no context might lead to ambiguity and bad decisions.

  • Another curve: Precision-Recall is made of PPV (Precision) and Sensitivity (Recall). How much PRAUC is enough?

Why I don’t like ROC Curve 😀

You don’t care about AUROC, you care about the c-statistic

  • Why not calculate GAINSAUC? Or the AUC of any other combination of two good performance metrics? With Sensitivity, Specificity, NPV and PPV we can get 6 AUC metrics. Do they provide any meaningful insight besides a vague β€œthe more the better”?

What is the AUROC of the following Models?

Why I don’t like ROC Curve 😀

You don’t care about AUROC, you care about the c-statistic

  • High ink-to-information ratio 😡

  • One might suggest that the visual aspect is useful, but as human beings we are really bad at interpreting round things (that’s why pie charts are considered bad practice).

  • Yet, the AUROC is valuable because of its equivalence to the c-statistic, and it might provide good intuition about the performance of the model.

Why I don’t like ROC Curve 😀

You don’t care about AUROC, you care about the c-statistic

  • If you randomly take one event and one non-event, the probability that the event is assigned the higher predicted probability is exactly the AUROC.

  • AUROC = P( p̂(🀨) < p̂(🀒) )

p̂    Y
0.72  1  🀒
0.63  1  🀒
0.47  0  🀨
0.45  1  🀒
0.33  0  🀨
0.31  1  🀒
0.29  0  🀨
0.18  0  🀨
0.15  0  🀨
0.11  0  🀨

Events:

p̂    Y
0.72  1  🀒
0.63  1  🀒
0.45  1  🀒
0.31  1  🀒

Non-events:

p̂    Y
0.47  0  🀨
0.33  0  🀨
0.29  0  🀨
0.18  0  🀨
0.15  0  🀨
0.11  0  🀨

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\#\text{Concordant}}{\#\text{Concordant} + \#\text{Discordant}}}\end{aligned}\)

For the event with p̂ = 0.72, compare against every non-event:

0.72 🀒 > 🀨 0.47 πŸ‘
0.72 🀒 > 🀨 0.33 πŸ‘
0.72 🀒 > 🀨 0.29 πŸ‘
0.72 🀒 > 🀨 0.18 πŸ‘
0.72 🀒 > 🀨 0.15 πŸ‘
0.72 🀒 > 🀨 0.11 πŸ‘

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\#\text{Concordant}}{\#\text{Concordant} + \#\text{Discordant}} = \frac{\text{6 +}}{\text{6 +}}}\end{aligned}\)

For the event with p̂ = 0.63:

0.63 🀒 > 🀨 0.47 πŸ‘
0.63 🀒 > 🀨 0.33 πŸ‘
0.63 🀒 > 🀨 0.29 πŸ‘
0.63 🀒 > 🀨 0.18 πŸ‘
0.63 🀒 > 🀨 0.15 πŸ‘
0.63 🀒 > 🀨 0.11 πŸ‘

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\#\text{Concordant}}{\#\text{Concordant} + \#\text{Discordant}} = \frac{\text{6 + 6 +}}{\text{6 + 6 +}}}\end{aligned}\)

For the event with p̂ = 0.45 (one discordant pair):

0.45 🀒 < 🀨 0.47 πŸ‘Ž
0.45 🀒 > 🀨 0.33 πŸ‘
0.45 🀒 > 🀨 0.29 πŸ‘
0.45 🀒 > 🀨 0.18 πŸ‘
0.45 🀒 > 🀨 0.15 πŸ‘
0.45 🀒 > 🀨 0.11 πŸ‘

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\#\text{Concordant}}{\#\text{Concordant} + \#\text{Discordant}} = \frac{\text{6 + 6 + 5 +}}{\text{6 + 6 + 6 +}}}\end{aligned}\)

For the event with p̂ = 0.31 (two discordant pairs):

0.31 🀒 < 🀨 0.47 πŸ‘Ž
0.31 🀒 < 🀨 0.33 πŸ‘Ž
0.31 🀒 > 🀨 0.29 πŸ‘
0.31 🀒 > 🀨 0.18 πŸ‘
0.31 🀒 > 🀨 0.15 πŸ‘
0.31 🀒 > 🀨 0.11 πŸ‘

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\#\text{Concordant}}{\#\text{Concordant} + \#\text{Discordant}} = \frac{\text{6 + 6 + 5 + 4}}{\text{6 + 6 + 6 + 6}}}\end{aligned}\)

Putting it all together:

p̂    Y
0.72  1  🀒
0.63  1  🀒
0.45  1  🀒
0.31  1  🀒

p̂    Y
0.47  0  🀨
0.33  0  🀨
0.29  0  🀨
0.18  0  🀨
0.15  0  🀨
0.11  0  🀨

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\text{21}}{\text{24}} = 0.875}\end{aligned}\)
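The walkthrough can be verified directly. A minimal sketch that counts every event/non-event pair (there are 4 Γ— 6 = 24 of them):

# Predicted probabilities of the 4 events and the 6 non-events above
probs_events    <- c(0.72, 0.63, 0.45, 0.31)
probs_nonevents <- c(0.47, 0.33, 0.29, 0.18, 0.15, 0.11)

# TRUE wherever the event outranks the non-event (a concordant pair)
concordant <- outer(probs_events, probs_nonevents, ">")

sum(concordant)   # 21 concordant pairs
mean(concordant)  # 21 / 24 = 0.875, the C-index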

Why I don’t like ROC Curve 😀

You don’t care about AUROC, you care about the c-statistic

# Toy example: 5 events and 5 non-events
probs <- c(0.3, 0.6, 0.4, 0.1, 0.4, 0.7, 0.3, 0.1, 0.2, 0.1)
reals <- c(0, 1, 1, 0, 1, 1, 0, 0, 1, 0)

pROC::auc(reals, probs)
Area under the curve: 0.92

# Estimate P(p̂(event) > p̂(non-event)) by resampling pairs
probs_events <- probs[reals == 1]
probs_nonevents <- probs[reals == 0]

prop.table(
  table(
    sample(probs_events, replace = TRUE, size = 10000) >
    sample(probs_nonevents, replace = TRUE, size = 10000)
  )
)

 FALSE   TRUE 
0.0789 0.9211 

Why I don’t like ROC Curve 😀

You don’t care about AUROC, you care about the c-statistic

import numpy as np
import random

# Same toy example as the R version
probs = np.array([0.3, 0.6, 0.4, 0.1, 0.4, 0.7, 0.3, 0.1, 0.2, 0.1])
reals = np.array([0, 1, 1, 0, 1, 1, 0, 0, 1, 0])

probs_events = probs[reals == 1]
probs_nonevents = probs[reals == 0]

# Resample 10,000 event / non-event pairs and check how often
# the event gets the higher predicted probability
event_prob_greater_than_nonevent_prob = np.greater(
  random.choices(sorted(probs_events),
  k = 10000),
  random.choices(sorted(probs_nonevents),
  k = 10000)
)

unique_elements, counts_elements = np.unique(
  event_prob_greater_than_nonevent_prob, return_counts=True)

counts_elements / 10000
array([0.0832, 0.9168])

Why I don’t like ROC Curve 😀

Good AUROC does not necessarily mean a Good model

AUROC shows how well your model discriminates between events and non-events given a target population.

p̂  0.11  0.15  0.18  0.29  0.31  0.33  0.45  0.47  0.63  0.72
Y   0     0     0     0     1     0     1     0     1     1
    🀨    🀨    🀨    🀨    🀒    🀨    🀒    🀨    🀒    🀒

Good AUROC does not necessarily mean a Good model

This model has AUROC = 0.875, but the number is misleading:
The Target Population is not well defined.

Age 6     7     12    56    64    67    73    78    85    86
p̂   0.11  0.15  0.18  0.29  0.31  0.33  0.45  0.47  0.63  0.72
Y    0     0     0     0     1     0     1     0     1     1
     πŸ§’    πŸ§’    🀨    🀨    🀒    🀨    🀒    🀨    πŸ‘΅    πŸ‘΅

Bad AUROC does not necessarily mean a Bad model

This model has AUROC = 0.625, but the number is misleading:
The Target Population is well defined.

Age 12    56    64    67    73    78
p̂   0.18  0.29  0.31  0.33  0.45  0.47
Y    0     0     1     0     1     0
     🀨    🀨    🀒    🀨    🀒    🀨
For the full (mixed) population:

p̂    Y
0.72  1  🀒
0.63  1  🀒
0.45  1  🀒
0.31  1  🀒

p̂    Y
0.47  0  🀨
0.33  0  🀨
0.29  0  🀨
0.18  0  🀨
0.15  0  🀨
0.11  0  🀨

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\text{21}}{\text{24}} = 0.875}\end{aligned}\)

For the well-defined target population only:

p̂    Y
0.45  1  🀒
0.31  1  🀒

p̂    Y
0.47  0  🀨
0.33  0  🀨
0.29  0  🀨
0.18  0  🀨

\(\begin{aligned} \ {\scriptsize \text{C-index} = \frac{\text{5}}{\text{8}} = 0.625}\end{aligned}\)
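A minimal sketch comparing the two numbers, assuming no ties among the predicted probabilities (the c_index() helper is hypothetical and just counts concordant pairs):

# C-index as the share of concordant event / non-event pairs
c_index <- function(probs, reals) {
  mean(outer(probs[reals == 1], probs[reals == 0], ">"))
}

probs <- c(0.11, 0.15, 0.18, 0.29, 0.31, 0.33, 0.45, 0.47, 0.63, 0.72)
reals <- c(0, 0, 0, 0, 1, 0, 1, 0, 1, 1)
ages  <- c(6, 7, 12, 56, 64, 67, 73, 78, 85, 86)

c_index(probs, reals)  # 0.875 on the mixed population

in_target <- ages >= 12 & ages <= 78
c_index(probs[in_target], reals[in_target])  # 0.625 on the target population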

Lift (& Gains) Curve

Lift Curve

Curve              Sens   Spec   PPV   PPCR   Lift
ROC                y      x
Lift                                   x      y
Precision-Recall   x             y
Gains              y                   x

Lift Curve

\(\begin{aligned} \text{Lift} = \frac{\text{PPV}}{\text{Prevalence}} = \frac{\cfrac{\text{TP}}{\text{TP + FP}}}{\cfrac{\text{TP + FN}}{\text{TP + FP + TN + FN}}} \end{aligned}\)

Lift Curve

  • Lift is the ratio between the PPV and the Prevalence.

  • The Lift Curve displays Lift on the y axis and PPCR (Predicted Positives Conditional Rate) on the x axis.

  • In other words, Lift shows how much better the model is doing than a random guess, in terms of PPV.

  • The reference line stands for a random guess: the Lift is equal to 1 (PPV = Prevalence); the Sensitivity depends on the Probability Threshold or PPCR.

  • The curve is not defined when there are no Predicted Positives (the probability threshold is too high, or PPCR = 0). A minimal sketch of the Lift computation follows below.
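A minimal sketch of the Lift at a single cutoff, assuming the 10-observation toy example from earlier (with a 0.55 threshold both Predicted Positives are events, so PPV = 1 and Lift = 1 / 0.4 = 2.5):

probs <- c(0.11, 0.15, 0.18, 0.29, 0.31, 0.33, 0.45, 0.47, 0.63, 0.72)
reals <- c(0, 0, 0, 0, 1, 0, 1, 0, 1, 1)

prevalence <- mean(reals)         # 0.4

predicted  <- probs > 0.55        # Predicted Positives at this cutoff
ppv        <- mean(reals[predicted])
lift       <- ppv / prevalence
lift                              # 2.5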

Precision-Recall

Curve              Sens   Spec   PPV   PPCR   Lift
ROC                y      x
Lift                                   x      y
Precision-Recall   x             y
Gains              y                   x

Prevalence

                 Predicted Positives   Predicted Negatives   Total
Real Positives                                                4 (40%)
Real Negatives                                                6 (60%)
Total                                                        10 (100%)

\[\text{Prevalence} = \frac{\sum \text{Real-Positives}}{\sum \text{Observations}} = \frac{4}{10}\]

Y 1  1  0  1  0  1  0  0  0  0
  🀒 🀒 🀨 🀒 🀨 🀒 🀨 🀨 🀨 🀨

PPCR

                 Predicted Positives   Predicted Negatives   Total
Total                 3 (30%)               7 (70%)          10 (100%)

\[\text{PPCR} = \frac{\sum \text{Predicted-Positives}}{\sum \text{Observations}} = \frac{3}{10}\]

Ŷ 1  1  1  0  0  0  0  0  0  0
  πŸ’Š πŸ’Š πŸ’Š
  😷 😷 😷 😷 😷 😷 😷 😷 😷 😷

The confusion matrix as PPCR sweeps from 0 to 1 (the row totals are fixed: 4 Real Positives, 6 Real Negatives):

PPCR   TP   FP   FN   TN
0.0     0    0    4    6
0.1     1    0    3    6
0.2     2    0    2    6
0.3     2    1    2    5
0.4     3    1    1    5
0.5     3    2    1    4
0.6     4    2    0    4
0.7     4    3    0    3
0.8     4    4    0    2
0.9     4    5    0    1
1.0     4    6    0    0
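A minimal sketch reproducing this sweep, assuming the same toy data (the confusion_at_ppcr() helper is hypothetical):

# Sweep PPCR from 0 to 1 and recompute the confusion matrix each time
probs <- c(0.11, 0.15, 0.18, 0.29, 0.31, 0.33, 0.45, 0.47, 0.63, 0.72)
reals <- c(0, 0, 0, 0, 1, 0, 1, 0, 1, 1)

confusion_at_ppcr <- function(ppcr) {
  n_positive <- round(ppcr * length(probs))
  # rank 1 = highest predicted risk; treat the top n_positive patients
  predicted  <- as.integer(rank(-probs, ties.method = "first") <= n_positive)
  c(ppcr = ppcr,
    TP = sum(predicted == 1 & reals == 1),
    FP = sum(predicted == 1 & reals == 0),
    FN = sum(predicted == 0 & reals == 1),
    TN = sum(predicted == 0 & reals == 0))
}

t(sapply(seq(0, 1, by = 0.1), confusion_at_ppcr))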

Gains Curve

Curve              Sens   Spec   PPV   PPCR   Lift
ROC                y      x
Lift                                   x      y
Precision-Recall   x             y
Gains              y                   x

Gains Curve

  • The Gains Curve displays Sensitivity on the y axis and PPCR on the x axis.

  • Gains shows the Sensitivity for a given PPCR.

  • Reference line for a random guess: the Sensitivity is equal to the proportion of Predicted Positives.

  • Reference line for a perfect prediction: all Predicted Positives are Real Positives until there are no more Real Positives (PPCR = Prevalence, Sensitivity = 1). A minimal sketch tracing the Gains points follows below.
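A minimal sketch tracing the Gains points for the toy example, adding one observation at a time to the Predicted Positives in order of risk:

probs <- c(0.11, 0.15, 0.18, 0.29, 0.31, 0.33, 0.45, 0.47, 0.63, 0.72)
reals <- c(0, 0, 0, 0, 1, 0, 1, 0, 1, 1)

ord  <- order(probs, decreasing = TRUE)     # rank observations by risk
ppcr <- seq_along(probs) / length(probs)    # PPCR after each additional positive
sens <- cumsum(reals[ord]) / sum(reals)     # Sensitivity among the top-ranked
data.frame(ppcr, sens)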

Precision-Recall

Curve              Sens   Spec   PPV   PPCR   Lift
ROC                y      x
Lift                                   x      y
Precision-Recall   x             y
Gains              y                   x

Precision-Recall Curve

  • The Precision-Recall Curve displays PPV on the y axis and Sensitivity on the x axis.

  • The reference line stands for a random guess: the PPV is equal to the Prevalence; the Sensitivity depends on the Probability Threshold or PPCR.

  • The curve is not defined when there are no Predicted Positives (the probability threshold is too high, or PPCR = 0). A minimal sketch tracing the curve’s points follows below.
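A minimal sketch tracing the Precision-Recall points for the same toy data:

probs <- c(0.11, 0.15, 0.18, 0.29, 0.31, 0.33, 0.45, 0.47, 0.63, 0.72)
reals <- c(0, 0, 0, 0, 1, 0, 1, 0, 1, 1)

ord       <- order(probs, decreasing = TRUE)
tp        <- cumsum(reals[ord])       # true positives among the top k
recall    <- tp / sum(reals)          # Sensitivity
precision <- tp / seq_along(probs)    # PPV among the top k
data.frame(recall, precision)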

Main Takeaways

  • Think carefully about the problem you are trying to solve when you translate probabilities into binary predictions:
    Treatment Harm? Use a Probability Threshold. βœ‚οΈ
    Resource Constraint? Use PPCR.

  • Watch out for unusual interpretations when examining Sensitivity, Specificity, or the ROC curve: they all move backward in time πŸ•ΊπŸ•°

  • The AUROC is equivalent to the C-index in the binary case. It might provide some intuition about the performance of the model, but don’t take the numbers too literally πŸ‘΅

  • The Lift Curve is useful when facing a resource constraint πŸ‹οΈ