What is ROC Curve in Machine Learning?

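In this post, we will learn what a ROC curve is, how to plot one with scikit-learn, and how to compare classifiers using the area under the curve.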

ROC Curve –

The receiver operating characteristic (ROC) curve plots the true positive rate (recall) against the false positive rate. The true positive rate is the proportion of positive instances that are correctly classified as positive. The false positive rate is the proportion of negative instances that are incorrectly classified as positive. Each point on the curve corresponds to one decision threshold applied to the classifier's scores.
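
To make the two definitions concrete, here is a minimal sketch that computes both rates from a handful of labels. The function name `tpr_fpr` and the toy labels are hypothetical and only serve as an illustration; they are not part of the dataset used in this post.

```python
def tpr_fpr(y_true, y_pred):
    """Return (true positive rate, false positive rate) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # positives caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # positives missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # negatives left alone
    tpr = tp / (tp + fn)  # share of actual positives classified as positive
    fpr = fp / (fp + tn)  # share of actual negatives classified as positive
    return tpr, fpr


# Hypothetical toy data: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tpr, fpr = tpr_fpr(y_true, y_pred)
print(f"TPR = {tpr:.3f}, FPR = {fpr:.3f}")
```

Here 3 of the 4 positives are caught and 1 of the 6 negatives is a false alarm, so this single set of predictions would sit at one point on a ROC plot.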

Plot ROC Curve –

Let’s read a dataset to work with.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

url = "https://raw.githubusercontent.com/bprasad26/lwd/master/data/breast_cancer.csv"
df = pd.read_csv(url)
values = {"B": 0, "M": 1}
df["diagnosis"] = df["diagnosis"].map(values)

Here, we have data about breast cancer patients. About 37% of the samples are malignant (encoded as 1, the positive class) and 63% are benign (encoded as 0).

Train A Model –

Now, let’s train an SVC classifier.

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# split the data into training and test set
X = df.drop("diagnosis", axis=1).copy()
y = df["diagnosis"].copy()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=26
)

# train an SVC model: impute missing values, scale the features, then classify
svm_clf = make_pipeline(SimpleImputer(strategy='mean'),
                        StandardScaler(), SVC(random_state=42))

svm_clf.fit(X_train, y_train)

To plot the ROC curve on the test set, we can use RocCurveDisplay.

from sklearn.metrics import RocCurveDisplay

fig, ax = plt.subplots(figsize=(8, 6))
RocCurveDisplay.from_estimator(svm_clf, X_test, y_test, ax=ax)
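
To see where the points of such a plot come from, the following sketch traces a ROC curve by hand: it lowers the decision threshold through every distinct score and records the (FPR, TPR) pair at each step. The helper `roc_points` and the toy scores are hypothetical; this illustrates the idea and is not how scikit-learn implements it internally.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) points, one per distinct threshold, for 0/1 labels."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Threshold above every score: nothing is predicted positive.
    points = [(0.0, 0.0)]
    # Lower the threshold through each distinct score, highest first.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical classifier scores (higher means "more likely positive").
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
for fpr, tpr in points:
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

The curve always starts at (0, 0) and ends at (1, 1); a good classifier climbs toward the top-left corner before moving right.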

Now, let’s calculate the area under the ROC curve (the ROC AUC score). The ROC AUC score helps us compare different classifiers and choose the one that performs best. Here it is computed from cross-validated decision scores on the training set.

from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

y_scores_svm = cross_val_predict(svm_clf, X_train, y_train, cv=5, method='decision_function')
roc_auc_score(y_train, y_scores_svm)

Output: 0.994238683127572

The ROC AUC score is 0.9942. A perfect classifier has a ROC AUC equal to 1, whereas a purely random classifier has a ROC AUC of about 0.5.
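
One way to read these two reference values: the ROC AUC equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it that way on hypothetical toy scores. The function `auc_by_ranking` is an illustrative name, and a constant score stands in for an uninformative classifier so that the result is deterministic.

```python
def auc_by_ranking(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0]

# Every positive outranks every negative.
perfect = auc_by_ranking(y_true, [0.9, 0.8, 0.7, 0.3, 0.2, 0.1])
# Identical scores carry no information about the class.
uninformative = auc_by_ranking(y_true, [0.5] * 6)
# Every positive is ranked below every negative.
inverted = auc_by_ranking(y_true, [0.1, 0.2, 0.3, 0.7, 0.8, 0.9])

print(perfect, uninformative, inverted)
```

The pairwise loop is quadratic in the number of samples, which is fine for an illustration; for real data, `roc_auc_score` is the practical choice.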

Now, let’s train a RandomForestClassifier and compare it with the SVC classifier.

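from sklearn.ensemble import RandomForestClassifier

# rf_clf is not defined in the original snippet; a default forest in the same
# imputing pipeline is assumed here, so the score below may differ slightly
rf_clf = make_pipeline(SimpleImputer(strategy='mean'), RandomForestClassifier(random_state=42))
rf_clf.fit(X_train, y_train)

fig, ax = plt.subplots(figsize=(8, 6))
RocCurveDisplay.from_estimator(svm_clf, X_test, y_test, ax=ax, name='SVC')
RocCurveDisplay.from_estimator(rf_clf, X_test, y_test, ax=ax, name='Random Forest')

y_scores_rf = cross_val_predict(rf_clf, X_train, y_train, cv=5, method='predict_proba')
roc_auc_score(y_train, y_scores_rf[:, 1])

Output: 0.9890880127439267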

The SVC classifier performs slightly better than the random forest classifier on this dataset (a cross-validated ROC AUC of 0.9942 versus 0.9891).
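
A comparison like this comes down to which curve encloses more area. As a rough sketch, the area under a set of ROC points can be approximated with the trapezoidal rule and the classifier with the largest area picked. The curves below are hypothetical placeholders (`model_a`, `model_b`), not the SVC and random forest results from this post.

```python
def trapezoid_auc(points):
    """Area under a ROC curve given (fpr, tpr) points sorted by fpr."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Width of the strip times the average height of its two ends.
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical ROC points for two classifiers.
curves = {
    "model_a": [(0.0, 0.0), (0.0, 0.5), (0.5, 1.0), (1.0, 1.0)],
    "model_b": [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)],  # the diagonal
}

aucs = {name: trapezoid_auc(pts) for name, pts in curves.items()}
best = max(aucs, key=aucs.get)

for name, value in aucs.items():
    print(f"{name}: AUC = {value:.3f}")
print("best:", best)
```

Note that `model_b` follows the diagonal and therefore gets an area of 0.5, the value expected from random guessing.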

1. Confusion Matrix – How to plot and Interpret Confusion Matrix.

2. What is Precision, Recall and the Trade-off?
