# How to Create a Baseline Regression Model in scikit-learn

## Problem –

You want a simple baseline regression model to compare against your actual model.

## Solution –

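In scikit-learn, you can use `DummyRegressor` to create a simple baseline model. It ignores the input features and always predicts a simple statistic of the training targets, such as their mean. First, load the California housing dataset.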

```python
import pandas as pd
from sklearn import datasets

# load the California housing data as a feature DataFrame and a target array
housing = datasets.fetch_california_housing()
X = pd.DataFrame(housing.data, columns=housing.feature_names)
y = housing.target
```

```python
y
# output -
# array([4.526, 3.585, 3.521, ..., 0.923, 0.847, 0.894])
```

Then split the dataset into a training and a test set.

```python
from sklearn.model_selection import train_test_split

# hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```

Now, create a baseline model using DummyRegressor.

```python
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# create a dummy regressor
dummy_reg = DummyRegressor(strategy='mean')
# fit it on the training set
dummy_reg.fit(X_train, y_train)
# make predictions on the test set
y_pred = dummy_reg.predict(X_test)

# calculate root mean squared error
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print("Dummy RMSE:", rmse)

# output -
# Dummy RMSE: 1.1448563543099792
```
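To see why this works as a floor, it can help to confirm what the `'mean'` strategy does. The following is a minimal sketch on a tiny made-up array (`X_toy`, `y_toy`, and `X_new` are illustrative names and values, not part of the housing data): the regressor ignores the features and returns the training-set mean for every row.

```python
import numpy as np
from sklearn.dummy import DummyRegressor

# Toy data, made up for illustration only
X_toy = np.array([[1.0], [2.0], [3.0], [4.0]])
y_toy = np.array([10.0, 20.0, 30.0, 40.0])

toy_dummy = DummyRegressor(strategy="mean")
toy_dummy.fit(X_toy, y_toy)

# Predict on inputs that look nothing like the training features
X_new = np.array([[-100.0], [0.0], [999.0]])
toy_pred = toy_dummy.predict(X_new)

# Every prediction equals y_toy.mean(), which is 25.0
print(toy_pred)
```

Because the prediction never depends on `X`, any model that actually uses the features should be able to beat this score.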

Now we can train the actual model, here a linear regression, and compare it with the baseline.

```python
from sklearn.linear_model import LinearRegression

# create a linear regression model
lin_reg = LinearRegression()
# fit on the training data
lin_reg.fit(X_train, y_train)
# make predictions on the test set
y_pred = lin_reg.predict(X_test)

# calculate root mean squared error
mse = mean_squared_error(y_test, y_pred)
lin_rmse = np.sqrt(mse)
print("Linear Regression RMSE:", lin_rmse)

# output -
# Linear Regression RMSE: 0.7455813830127761
```
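One way to read the two numbers together is as a relative improvement over the baseline. The sketch below is only an illustration: it assumes synthetic data from `make_regression` so it runs without downloading anything, and the helper `fit_and_score_rmse` is a hypothetical name introduced here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for this sketch, not the housing dataset)
X_syn, y_syn = make_regression(
    n_samples=500, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42)


def fit_and_score_rmse(model):
    """Fit the model on the training split and return its test RMSE."""
    model.fit(X_tr, y_tr)
    return np.sqrt(mean_squared_error(y_te, model.predict(X_te)))


baseline_rmse = fit_and_score_rmse(DummyRegressor(strategy="mean"))
model_rmse = fit_and_score_rmse(LinearRegression())

# Fraction of the baseline error removed by the real model
improvement = 1 - model_rmse / baseline_rmse
print(f"Baseline RMSE: {baseline_rmse:.3f}")
print(f"Model RMSE: {model_rmse:.3f}")
print(f"Relative improvement: {improvement:.1%}")
```

Applied to the housing results above, the same calculation gives 1 - 0.7456 / 1.1449 ≈ 0.35, so the linear regression has roughly 35% lower RMSE than the mean baseline.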

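You can also change `strategy` from the default `'mean'` to `'median'`, `'quantile'`, or `'constant'`. The `'quantile'` strategy requires the `quantile` parameter, and `'constant'` requires the `constant` parameter, as in the example below.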

```python
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# create a dummy regressor that always predicts 1
dummy_reg = DummyRegressor(strategy='constant', constant=1)
# fit it on the training set
dummy_reg.fit(X_train, y_train)
# make predictions on the test set
y_pred = dummy_reg.predict(X_test)

# calculate root mean squared error
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print("Dummy Constant RMSE:", rmse)

# output -
# Dummy Constant RMSE: 1.5567403478625699
```
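The other strategies follow the same pattern. As a rough sketch, the toy example below (made-up targets with one large outlier; all variable names are illustrative) shows how `'mean'`, `'median'`, and `'quantile'` yield different constant predictions on the same data.

```python
import numpy as np
from sklearn.dummy import DummyRegressor

# Toy targets with one large outlier (made up for illustration)
X_toy = np.zeros((5, 1))  # features are ignored by DummyRegressor
y_toy = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

mean_pred = DummyRegressor(strategy="mean").fit(X_toy, y_toy).predict(X_toy[:1])
median_pred = DummyRegressor(strategy="median").fit(X_toy, y_toy).predict(X_toy[:1])
# 'quantile' requires the quantile parameter, a float between 0.0 and 1.0
q_pred = DummyRegressor(strategy="quantile", quantile=0.25).fit(
    X_toy, y_toy).predict(X_toy[:1])

print(mean_pred[0])    # 22.0, pulled upward by the outlier
print(median_pred[0])  # 3.0, the middle value
print(q_pred[0])       # 2.0, the 25th percentile
```

When the target has heavy outliers, the median baseline may be the fairer comparison, especially if you evaluate with an absolute-error metric.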
