What is a ROC curve and when should you avoid it?

Haneul Kim
4 min read · Sep 12, 2021
Photo by Haneul Kim

Table of Contents

1. Introduction

2. What is a ROC curve and AUROC?

3. When to avoid the ROC curve

Introduction

Binary classification is a powerful tool in machine learning, as most problems either are binary or can be reformulated as binary classification problems. The ability to understand how your model performed, and to debug it, is as important as building the model itself, if not more so. We will talk about how to measure the performance of binary classifiers; specifically, the following topics will be covered: the ROC curve, the area under the ROC curve, the precision-recall curve, and average precision.

In a previous blog we briefly covered the ROC curve and AUC. Today we will dive deeper to understand when it should be used and when it shouldn't. Readers are assumed to be familiar with concepts such as the confusion matrix, precision, recall, true positive rate, and false positive rate. If not, reading the previous blog is recommended.

What is a ROC curve and AUROC?

  • The ROC (Receiver Operating Characteristic) curve shows a classifier's performance at different threshold points.
  • It is a plot of the true positive rate (y-axis) vs. the false positive rate (x-axis).

Logistic regression outputs the probability that a new sample belongs to the positive class (class 1); if that probability is above the threshold, you predict the positive class, otherwise the negative class. For example, lowering the threshold to 20% (from the default 50%) classifies more labels as positive, which increases recall and decreases precision.

Let's code the ROC curve from scratch. First, import libraries and preprocess the data.
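The original code isn't embedded here, so below is a minimal setup sketch. The scikit-learn breast cancer dataset is used as a stand-in for the original data, and the split parameters are assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Stand-in dataset: the original data isn't shown in this post.
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name="target")

# Hold out a test set to evaluate the ROC curve on later.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
```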

One trick I started using is casting every column to an appropriate datatype to reduce memory usage. This not only lets your computer hold larger dataframes, but the right datatypes also make computation more efficient.
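A hedged sketch of that trick using pandas' built-in downcasting (the helper name is mine, not from the original post):

```python
def downcast_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Downcast numeric columns to the smallest dtype that holds them."""
    df = df.copy()
    for col in df.select_dtypes(include="float").columns:
        df[col] = pd.to_numeric(df[col], downcast="float")
    for col in df.select_dtypes(include="integer").columns:
        df[col] = pd.to_numeric(df[col], downcast="integer")
    return df

X_train = downcast_dtypes(X_train)
X_test = downcast_dtypes(X_test)
```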

To plot the ROC curve we need to measure TPR and FPR at each threshold point. To make predictions at different thresholds, instead of predicting labels directly we need the probability of new data belonging to each label, which can be obtained with predict_proba().
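A sketch of that step, continuing with the stand-in data (the model hyperparameters are assumptions):

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=10_000)
model.fit(X_train, y_train)

# Each row of predict_proba is [P(class 0), P(class 1)] for one test sample.
result_df = pd.DataFrame(model.predict_proba(X_test), columns=[0, 1])
print(result_df.head())
```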

In result_df, each column 0 and 1 holds the model's probability estimate of belonging to class 0 and class 1 respectively. Now we make predictions at multiple threshold points, as sketched below: in the example output shown, setting the threshold to 15% predicts the first two rows as class 1, while at 20% all rows are predicted as class 0.
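Thresholding itself is a one-liner on the class-1 probability column; a small sketch:

```python
# Predict class 1 whenever P(class 1) meets or exceeds the threshold.
for threshold in (0.15, 0.20):
    preds = (result_df[1] >= threshold).astype(int)
    print(threshold, preds.value_counts().to_dict())
```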

ConfusionMatrix is a class I've coded to keep the code readable and simple.
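The class itself isn't reproduced in this post; a minimal sketch of what it might look like:

```python
class ConfusionMatrix:
    """Confusion matrix counts for binary labels, with 1 as the positive class."""

    def __init__(self, y_true, y_pred):
        y_true = np.asarray(y_true)
        y_pred = np.asarray(y_pred)
        self.tp = int(np.sum((y_true == 1) & (y_pred == 1)))
        self.fp = int(np.sum((y_true == 0) & (y_pred == 1)))
        self.tn = int(np.sum((y_true == 0) & (y_pred == 0)))
        self.fn = int(np.sum((y_true == 1) & (y_pred == 0)))

    @property
    def tpr(self):  # true positive rate (recall / sensitivity)
        return self.tp / (self.tp + self.fn)

    @property
    def fpr(self):  # false positive rate
        return self.fp / (self.fp + self.tn)
```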

The ROC curve is generated using the following code; however, it can be done more easily with the roc_curve function from the sklearn.metrics module, since there is no need to calculate TPR and FPR yourself when using sklearn.
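A from-scratch sketch using the ConfusionMatrix sketch above, followed by the sklearn shortcut (the threshold grid is an assumption):

```python
import matplotlib.pyplot as plt

# From scratch: sweep thresholds and collect (FPR, TPR) pairs.
thresholds = np.linspace(0, 1, 101)
tprs, fprs = [], []
for t in thresholds:
    preds = (result_df[1] >= t).astype(int)
    cm = ConfusionMatrix(y_test.to_numpy(), preds)
    tprs.append(cm.tpr)
    fprs.append(cm.fpr)

plt.plot(fprs, tprs, label="logistic regression")
plt.plot([0, 1], [0, 1], "k--", label="random classifier")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()

# The sklearn equivalent returns FPR and TPR precomputed.
from sklearn.metrics import roc_curve
fpr, tpr, thresh = roc_curve(y_test, result_df[1])
```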

Plotting this gives us the ROC curve. We could plot multiple ROC curves for different models to compare them; the closer a curve is to the upper-left corner, the better the performance.

If many models need to be compared with each other, plotting multiple ROC curves may not be the most interpretable approach. Instead, we compute the Area Under the ROC curve (AUROC) to compare the curves with a single number each.
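Both sklearn and the from-scratch curve give AUROC directly; a sketch (0.5 corresponds to random guessing, 1.0 to a perfect classifier):

```python
from sklearn.metrics import auc, roc_auc_score

# One-call AUROC from labels and class-1 probabilities.
auroc = roc_auc_score(y_test, result_df[1])

# Or integrate the from-scratch curve with the trapezoidal rule;
# auc() wants x sorted, so reverse the descending fprs list.
auroc_scratch = auc(fprs[::-1], tprs[::-1])
print(auroc, auroc_scratch)
```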

When to avoid the ROC curve

The ROC curve should be used when reducing false negatives and reducing false positives are equally important, and it should be avoided when the data is highly imbalanced. A large number of true negatives lowers the FPR considerably, so your model may look as if it is performing well when it is not.

FPR = FP / (FP + TN). When positives (class 1) are rare, as in cancer detection where most people are benign (class 0), TN is very high compared to FP. So even if 50% or more of the positive predictions are false positives, the large number of TN will keep your ROC curve pinned to the left side of the plot.
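A quick worked example with hypothetical screening counts makes this concrete:

```python
# Hypothetical counts: 100 true positive cases among 100,000 screenings.
tp, fn = 80, 20
fp, tn = 80, 99_820

fpr = fp / (fp + tn)        # 80 / 99,900 ≈ 0.0008: hugs the left of the ROC plot
precision = tp / (tp + fp)  # 80 / 160 = 0.5: half the positive predictions are wrong
print(fpr, precision)
```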

For highly imbalanced datasets, an alternative metric is the Precision-Recall curve, which is similar to the ROC curve except that recall (TPR) goes on the x-axis and precision on the y-axis. As with the ROC curve, we can compute the area under the PR curve, referred to as average precision, to compare multiple models.
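A sketch with sklearn, reusing the stand-in probabilities from above:

```python
from sklearn.metrics import average_precision_score, precision_recall_curve

precision, recall, pr_thresh = precision_recall_curve(y_test, result_df[1])
avg_precision = average_precision_score(y_test, result_df[1])

plt.plot(recall, precision, label=f"AP = {avg_precision:.3f}")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend()
plt.show()
```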

Conclusion

In conclusion, a firm understanding of the different performance metrics will save you from going in the wrong direction. It also makes debugging easier and more enjoyable.

If you would like to learn more about why the PR curve is preferred, I highly recommend reading "The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets" (Saito and Rehmsmeier, 2015), which covers the topic extensively.


Haneul Kim

Data Scientist passionate about helping the environment.