How to Evaluate a Random Forest with Out-of-Bag Error?

In a random forest, each decision tree is trained on a bootstrap sample of the observations, drawn with replacement. As a result, every tree has its own set of observations that were not used to train it. These are called out-of-bag (OOB) observations, and we can use them as a test set to evaluate the performance of the random forest.
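
As a rough illustration of where OOB observations come from, the sketch below draws one bootstrap sample with NumPy and collects the indices that were never drawn. The sample size and the variable names (`bootstrap_idx`, `oob_idx`) are assumptions made for this example, not part of any library API.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 1000  # assumed dataset size, for illustration only

# A bootstrap sample: n_samples row indices drawn with replacement.
bootstrap_idx = rng.integers(0, n_samples, size=n_samples)

# Rows that were never drawn are out-of-bag for the tree trained on this sample.
in_bag = np.zeros(n_samples, dtype=bool)
in_bag[bootstrap_idx] = True
oob_idx = np.flatnonzero(~in_bag)

oob_fraction = oob_idx.size / n_samples
print(f"Out-of-bag fraction: {oob_fraction:.3f}")
```

For large datasets the out-of-bag fraction tends toward 1/e, so roughly a third of the observations are left out of each tree.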

For every observation, the learning algorithm compares the observation’s true value with the prediction made by the subset of trees that were not trained on that observation. These comparisons are aggregated into an overall score that provides a single measure of the random forest’s performance. OOB score estimation is an alternative to cross-validation.
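
To make the aggregation concrete, here is a minimal sketch that computes an OOB accuracy by hand. It assumes a synthetic dataset from `make_classification` and a hand-rolled ensemble of `DecisionTreeClassifier` trees. Names such as `votes` and `manual_oob_score` are illustrative, and scikit-learn's own implementation may differ in its details.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (an assumption for this sketch).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)
n_samples, n_trees, n_classes = X.shape[0], 50, 2

# Accumulated class probabilities, counted only from trees that did not see the row.
votes = np.zeros((n_samples, n_classes))

for _ in range(n_trees):
    # Bootstrap sample for this tree; the rows left out are its OOB rows.
    idx = rng.integers(0, n_samples, size=n_samples)
    oob_mask = np.ones(n_samples, dtype=bool)
    oob_mask[idx] = False

    tree = DecisionTreeClassifier(
        max_features="sqrt", random_state=int(rng.integers(1_000_000))
    )
    tree.fit(X[idx], y[idx])

    # Only the OOB rows receive a vote from this tree.
    votes[oob_mask] += tree.predict_proba(X[oob_mask])

# Score the rows that were out-of-bag for at least one tree.
has_votes = votes.sum(axis=1) > 0
oob_pred = votes[has_votes].argmax(axis=1)
manual_oob_score = (oob_pred == y[has_votes]).mean()
print(f"Manual OOB accuracy: {manual_oob_score:.3f}")
```

Each row is scored only by trees that never trained on it, which is why the result can stand in for a held-out estimate.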

In scikit-learn, we can get the OOB score of a random forest by setting oob_score=True when creating the random forest object (e.g. RandomForestClassifier). After fitting, the score is available in the oob_score_ attribute; for a classifier, it is the accuracy on the OOB observations.

Let’s read a dataset to illustrate it.

import pandas as pd

url = 'https://raw.githubusercontent.com/bprasad26/lwd/master/data/breast_cancer.csv'
df = pd.read_csv(url)
df.head()

Next, split the data into training and test sets.

from sklearn.model_selection import train_test_split

X = df.drop('diagnosis', axis=1)
y = df['diagnosis']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Now train a random forest classifier with oob_score=True.

from sklearn.ensemble import RandomForestClassifier

# create a random forest object with OOB scoring enabled
rf = RandomForestClassifier(random_state=42, oob_score=True)

# train it on the training set
rf.fit(X_train, y_train)

# get the OOB score (accuracy on the out-of-bag observations)
rf.oob_score_
# output
# 0.9516483516483516
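
As a sanity check, it can help to compare the OOB score with the accuracy on the held-out test set. The sketch below is a self-contained variant that assumes scikit-learn's bundled `load_breast_cancer` data in place of the CSV above, so its numbers need not match the output shown earlier. The names `oob_accuracy` and `test_accuracy` are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Bundled copy of the breast cancer data (assumed stand-in for the CSV).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

rf = RandomForestClassifier(random_state=42, oob_score=True)
rf.fit(X_train, y_train)

# OOB accuracy is estimated from the training data alone.
oob_accuracy = rf.oob_score_
# Test accuracy uses the observations held out by train_test_split.
test_accuracy = rf.score(X_test, y_test)

# Per-observation OOB class probabilities for the training rows.
oob_proba = rf.oob_decision_function_

print(f"OOB accuracy:  {oob_accuracy:.3f}")
print(f"Test accuracy: {test_accuracy:.3f}")
```

If the two scores are close, the OOB estimate is behaving as a reasonable proxy for held-out performance on this dataset.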
