You want to create a simple baseline classification model so that you can compare it with your actual model.
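In scikit-learn, you can use DummyClassifier to create a baseline classification model. It ignores the input features and predicts using a simple rule, so its score shows what a model with no real learning would achieve.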
Let’s read a dataset to work with.
import pandas as pd
import numpy as np
from sklearn import datasets

cancer = datasets.load_breast_cancer()
X = pd.DataFrame(cancer.data, columns=cancer.feature_names)
y = cancer.target
X.head()
Now, split the data into a training and a test set.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
Now, create a baseline model using DummyClassifier.
from sklearn.dummy import DummyClassifier

# create dummy classifier
dummy_clf = DummyClassifier(strategy='uniform', random_state=42)

# train a model
dummy_clf.fit(X_train, y_train)

# get accuracy score
dummy_clf.score(X_test, y_test)

Output: 0.5964912280701754
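Here, we used strategy='uniform', which predicts each class at random with equal probability. You can also use other strategies: most_frequent, prior, stratified and constant (the last one also requires the constant parameter). Details can be found in the scikit-learn documentation for the strategy parameter of DummyClassifier.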
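To see how the strategies differ, you can score each of them on the same split. The following is a minimal sketch, not a definitive recipe: the `strategies` dictionary, the `baseline_scores` name and the choice of `constant=1` are assumptions made for illustration, and the data is reloaded so the snippet runs on its own.

```python
from sklearn import datasets
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

# Reload the data so this snippet is self-contained
X, y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Extra keyword arguments per strategy (illustrative choices):
# the random strategies get a seed, and 'constant' needs the label to predict
strategies = {
    "most_frequent": {},
    "prior": {},
    "stratified": {"random_state": 42},
    "uniform": {"random_state": 42},
    "constant": {"constant": 1},
}

baseline_scores = {}
for name, kwargs in strategies.items():
    clf = DummyClassifier(strategy=name, **kwargs)
    clf.fit(X_train, y_train)
    # score() returns the mean accuracy on the test set
    baseline_scores[name] = clf.score(X_test, y_test)
    print(f"{name:>13}: {baseline_scores[name]:.3f}")
```

On an imbalanced dataset, most_frequent is often the more demanding baseline, because always predicting the majority class already yields an accuracy equal to that class's share of the test set.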
Now, create the actual model that you want to compare against the baseline.
from sklearn.linear_model import LogisticRegression

# create a logistic regression model
clf = LogisticRegression(max_iter=10000, random_state=42)

# train the model on training dataset
clf.fit(X_train, y_train)

# get accuracy score on test set
clf.score(X_test, y_test)

Output: 0.956140350877193
The accuracy of this model (about 0.96) is far better than that of the baseline model (about 0.60), so the model is clearly learning something useful from the features. If you want, you can also try other models, see which one performs best, and choose that one.
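One way to try several models is to fit each on the same split and compare their scores with the baseline. This is only a sketch under stated assumptions: the decision tree and random forest are example candidates that do not appear in the walkthrough above, and the `candidates`, `scores` and `best_name` names are chosen for illustration.

```python
from sklearn import datasets
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Reload the data so this snippet is self-contained
X, y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Candidate models (illustrative choices), with the baseline included
candidates = {
    "baseline (most_frequent)": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=10000, random_state=42),
    "decision tree": DecisionTreeClassifier(random_state=42),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit every candidate on the training set and record its test accuracy
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name:>25}: {scores[name]:.3f}")

# Pick the candidate with the highest test accuracy
best_name = max(scores, key=scores.get)
print("Best:", best_name)
```

Picking the winner on the test set tends to give an optimistic estimate, so for a real comparison you may prefer cross-validation on the training data (for example with `cross_val_score`) and keep the test set for a final check.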