In this post, we will learn what a ROC curve is, how to plot one with scikit-learn, and how to use the ROC AUC score to compare classifiers.

## ROC Curve –

The receiver operating characteristic (ROC) curve plots the true positive rate (recall) against the false positive rate as the classification threshold varies. The true positive rate is the proportion of positive instances that are correctly classified as positive. The false positive rate is the proportion of negative instances that are incorrectly classified as positive.
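To make the two rates concrete, here is a minimal sketch that computes them from a confusion matrix. The labels and predictions are toy values invented for illustration; they are not taken from the dataset used in this post.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels and predictions, invented for illustration (1 = positive, 0 = negative)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

tpr = tp / (tp + fn)  # share of actual positives predicted as positive
fpr = fp / (fp + tn)  # share of actual negatives predicted as positive
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

Each point on a ROC curve is one such (FPR, TPR) pair, obtained at one particular threshold.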

#### Plot ROC Curve –

Let’s read a dataset to work with.

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
url = "https://raw.githubusercontent.com/bprasad26/lwd/master/data/breast_cancer.csv"
df = pd.read_csv(url)
values = {"B": 0, "M": 1}
df["diagnosis"] = df["diagnosis"].map(values)
df.head()
```

Here, we have data about breast cancer patients: 37% have a malignant tumor (`M`, mapped to 1) and 63% have a benign tumor (`B`, mapped to 0).
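One way to check a class balance like this is `value_counts(normalize=True)`. The sketch below uses a hypothetical stand-in Series with the same 37/63 split rather than the real `df["diagnosis"]` column, so that it runs without downloading the data.

```python
import pandas as pd

# Hypothetical stand-in for df["diagnosis"] after mapping: 63 benign (0), 37 malignant (1)
diagnosis = pd.Series([0] * 63 + [1] * 37, name="diagnosis")

# normalize=True returns proportions instead of raw counts
class_share = diagnosis.value_counts(normalize=True)
print(class_share)
```

On the real data, the same call on `df["diagnosis"]` should give the proportions quoted above.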

#### Train A Model –

Now, let’s train an SVC classifier.

```
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
# split the data into training and test set
X = df.drop("diagnosis", axis=1).copy()
y = df["diagnosis"].copy()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=26
)
# train a SVC model
svm_clf = make_pipeline(SimpleImputer(strategy='mean'),
                        StandardScaler(), SVC(random_state=42))
svm_clf.fit(X_train, y_train)
```

Now, to plot the ROC curve, we can use `RocCurveDisplay`.

```
from sklearn.metrics import RocCurveDisplay
fig, ax = plt.subplots(figsize=(8, 6))
RocCurveDisplay.from_estimator(svm_clf, X_test, y_test, ax=ax)
plt.show()
```
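The points behind such a plot can also be computed directly with `roc_curve`, which returns the false positive rate, true positive rate, and threshold for each point on the curve. The sketch below uses four toy labels and scores invented for illustration, not the model trained above.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy labels and classifier scores, invented for illustration
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Each threshold yields one (FPR, TPR) point on the curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)

for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")
```

Lowering the threshold moves along the curve from the bottom-left corner (nothing predicted positive) to the top-right corner (everything predicted positive).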

Now, let’s calculate the area under the ROC curve (the ROC AUC score). The ROC AUC score helps us compare different classifiers and choose the one that performs best.

```
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict
y_scores_svm = cross_val_predict(svm_clf, X_train, y_train, cv=5, method='decision_function')
roc_auc_score(y_train, y_scores_svm)
# output: 0.994238683127572
```

The ROC AUC score is 0.9942. A perfect classifier has a ROC AUC of 1, whereas a purely random classifier has a ROC AUC of 0.5.
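These two reference values can be checked with a small sketch. All labels and scores here are synthetic and invented for illustration: one set of scores separates the classes perfectly, one is constant, and one is drawn at random with an arbitrary seed.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Scores that rank every positive above every negative
y_true = np.array([0, 0, 0, 1, 1, 1])
perfect_scores = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
auc_perfect = roc_auc_score(y_true, perfect_scores)

# Constant scores carry no information about the label
constant_scores = np.full(6, 0.5)
auc_constant = roc_auc_score(y_true, constant_scores)

# Random scores on a larger synthetic sample land close to 0.5
rng = np.random.default_rng(0)
y_big = np.array([0, 1] * 5000)
auc_random = roc_auc_score(y_big, rng.random(y_big.size))

print(auc_perfect, auc_constant, round(auc_random, 3))
```

A score below 0.5 would mean the classifier ranks negatives above positives more often than not.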

Now, let’s train a `RandomForestClassifier` and compare it with the SVC classifier.

```
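from sklearn.ensemble import RandomForestClassifier
# train a random forest model (assumed setup: same imputer, default hyperparameters)
rf_clf = make_pipeline(SimpleImputer(strategy='mean'),
                       RandomForestClassifier(random_state=42))
rf_clf.fit(X_train, y_train)
fig, ax = plt.subplots(figsize=(8, 6))
RocCurveDisplay.from_estimator(svm_clf, X_test, y_test, ax=ax, name='SVC')
RocCurveDisplay.from_estimator(rf_clf, X_test, y_test, ax=ax, name='Random Forest')
plt.savefig('roc_curve_comparison.png')
plt.show()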
```

```
y_scores_rf = cross_val_predict(rf_clf, X_train, y_train, cv=5, method='predict_proba')
roc_auc_score(y_train, y_scores_rf[:, 1])
# output: 0.9890880127439267
```

The SVC classifier performs slightly better than the random forest classifier here, with a cross-validated ROC AUC of 0.9942 versus 0.9891.
