
So far, we have talked about the confusion matrix, precision and recall. In this post, we will learn about the F1 score and how to calculate it in Python.
Related Posts –
1. Confusion Matrix – How to plot and Interpret Confusion Matrix.
2. What is Precision, Recall and the Trade-off?
F1 Score –
The F1 score combines precision and recall into a single metric: it is their harmonic mean, F1 = 2 × (precision × recall) / (precision + recall). Because the harmonic mean is pulled toward the smaller of the two values, a classifier gets a high F1 score only if both precision and recall are high.
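To see why the harmonic mean matters, here is a small sketch comparing it with the ordinary (arithmetic) average for a lopsided classifier. The helper name `f1_from_precision_recall` and the precision/recall values are hypothetical, chosen only for illustration; returning 0.0 when both inputs are 0 is an assumed convention to avoid dividing by zero.

```python
# A minimal sketch: F1 as the harmonic mean of precision and recall.
# f1_from_precision_recall is a hypothetical helper, for illustration only.


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0  # assumed convention: avoid division by zero
    return 2 * precision * recall / (precision + recall)


# A lopsided classifier: very precise, but it finds few of the positives.
precision, recall = 0.9, 0.1
arithmetic_mean = (precision + recall) / 2
f1 = f1_from_precision_recall(precision, recall)

print(f"arithmetic mean: {arithmetic_mean:.2f}")  # 0.50
print(f"F1 (harmonic):   {f1:.2f}")               # 0.18

# A balanced classifier with the same arithmetic mean keeps its score.
print(f"F1 (balanced):   {f1_from_precision_recall(0.5, 0.5):.2f}")  # 0.50
```

Both classifiers average 0.5, but F1 punishes the one whose recall is poor.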

Calculate F1 Score in Python –
Let’s read a dataset.
import pandas as pd
import numpy as np
# read data
url = "https://raw.githubusercontent.com/bprasad26/lwd/master/data/breast_cancer.csv"
df = pd.read_csv(url)
df.head()

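# encode the target: benign (B) -> 0, malignant (M) -> 1
values = {"B": 0, "M": 1}
df["diagnosis"] = df["diagnosis"].map(values)
# share of each class, rounded to two decimals
df["diagnosis"].value_counts(normalize=True).round(2)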

Here, we have data about breast cancer patients: 37% of them have a malignant tumor (diagnosis 1) and 63% have a benign one (diagnosis 0). Our job is to build a model that predicts, as accurately as possible, which patients have a malignant tumor.
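Because the classes are imbalanced, accuracy alone can flatter a weak model, which is one reason to look at F1 instead. The sketch below uses synthetic labels that only mimic the 37% / 63% split (they are not the real data): a "model" that always predicts benign still reaches 63% accuracy, yet its F1 score for the malignant class is 0.

```python
# A minimal sketch with synthetic labels (NOT the actual dataset) that mimic
# the 37% / 63% split: 1 = malignant, 0 = benign.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

y_true = np.array([1] * 37 + [0] * 63)

# A useless "model" that always predicts the majority class (benign).
y_always_benign = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_always_benign)
# zero_division=0 avoids a warning about precision being undefined
# when the model never predicts the positive class.
f1 = f1_score(y_true, y_always_benign, zero_division=0)

print(f"accuracy: {acc:.2f}")  # 0.63
print(f"F1 score: {f1:.2f}")   # 0.00
```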
Train a Model –
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
# split the data into training and test set
X = df.drop("diagnosis", axis=1).copy()
y = df["diagnosis"].copy()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=26
)
# build a pipeline: fill missing values with the column mean,
# then fit a random forest classifier
clf = make_pipeline(
    SimpleImputer(strategy="mean"), RandomForestClassifier(random_state=26)
)
# train the classifier on training data
clf.fit(X_train, y_train)
# make predictions on test data
pred = clf.predict(X_test)
Calculate F1 Score –
from sklearn.metrics import f1_score
# F1 score for the positive class (malignant = 1)
score = f1_score(y_test, pred)
print(score)
output - 0.9565217391304347
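By default, `f1_score` treats label 1 as the positive class (`pos_label=1`, `average="binary"`), which here means malignant. As a sanity check, the sketch below uses made-up toy labels (not the breast cancer data) to confirm that `f1_score` matches the harmonic mean of `precision_score` and `recall_score`, and also the equivalent count form F1 = 2TP / (2TP + FP + FN) computed from the confusion matrix.

```python
import math

from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical toy labels (not from the breast cancer data): 1 = malignant.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

# 1) scikit-learn's F1 for the positive class (label 1).
f1_sklearn = f1_score(y_true, y_pred)

# 2) Harmonic mean of precision and recall.
p = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 4
r = recall_score(y_true, y_pred)     # TP / (TP + FN) = 3 / 5
f1_harmonic = 2 * p * r / (p + r)

# 3) Directly from confusion-matrix counts; for binary labels
#    ravel() returns them in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
f1_counts = 2 * tp / (2 * tp + fp + fn)

print(f1_sklearn, f1_harmonic, f1_counts)  # all approximately 0.667
assert math.isclose(f1_sklearn, f1_harmonic) and math.isclose(f1_sklearn, f1_counts)
```

Precision (0.75) and recall (0.6) differ here, and F1 lands below their arithmetic average of 0.675, as expected for a harmonic mean.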