How to Train a Random Forest Regressor in Sklearn?


In our previous post we learned how random forest works and trained a random forest classifier. In this post we will learn how to train a random forest regressor in sklearn.

Random Forest Regressor in Sklearn –

Just as we can build a forest of decision tree classifiers, we can build a forest of decision tree regressors, where each tree uses a bootstrapped subset of observations and, at each node, the decision rule considers only a subset of features. As with RandomForestClassifier, there are a few important parameters.

max_features – sets the maximum number of features to consider at each split. For RandomForestRegressor the default is 1.0, meaning all p features are considered (unlike RandomForestClassifier, which defaults to sqrt(p)).

bootstrap – sets whether or not to sample with replacement. Defaults to True.

n_estimators – sets the number of decision trees to construct. Defaults to 100.
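To see how these three parameters fit together, here is a minimal sketch on a small synthetic dataset (make_regression is used purely so the example runs without downloading anything; the parameter values are illustrative, not recommendations):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# illustrative synthetic data: 200 observations, 8 features
X, y = make_regression(n_samples=200, n_features=8, noise=0.5, random_state=42)

rf = RandomForestRegressor(
    n_estimators=50,      # build 50 trees instead of the default 100
    max_features="sqrt",  # consider sqrt(8) ≈ 3 features at each split
    bootstrap=True,       # sample observations with replacement (the default)
    random_state=42,
)
rf.fit(X, y)

# the fitted forest holds one decision tree per estimator
print(len(rf.estimators_))
```

Each tree in `rf.estimators_` is an ordinary DecisionTreeRegressor fitted on its own bootstrap sample, which is where the ensemble's variance reduction comes from.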

Let’s read a dataset to train a Random Forest Regressor.

import pandas as pd
import numpy as np
from sklearn import datasets

housing = datasets.fetch_california_housing()
X = pd.DataFrame(housing.data, columns=housing.feature_names)
y = housing.target

Next, split the data into a training set and a test set.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Now, let’s train a Random Forest Regressor and measure the error.

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# create a Random Forest Regressor model
rf = RandomForestRegressor(random_state=42)
# train it on the training set
rf.fit(X_train, y_train)
# make predictions on the test set
y_pred = rf.predict(X_test)
# measure error
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
# output the error
print(rmse)
