In this post, you will learn how to bin numerical data into categorical data using the pandas pd.cut() function.
Let’s read a dataset to work with.
import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/bprasad26/lwd/master/data/titanic.csv")
df.head()
Let’s also drop the rows where the age data is missing.
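One way to do this is with dropna() restricted to the Age column. The sketch below uses a small made-up frame (the name `sample` and its values are placeholders, not the Titanic data) so it runs without downloading anything.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic data: two of the four ages are missing
sample = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [22.0, np.nan, 38.0, np.nan],
})

# Keep only the rows where Age is present; other columns are not checked
sample = sample.dropna(subset=["Age"])
print(sample)
```

On the real data, the equivalent call would be df = df.dropna(subset=["Age"]).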
For more information, see How to handle missing values in Python.
To do the binning, we need to know the minimum and maximum values of the column that we want to bin, so that the bin edges cover the whole range.
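A quick way to check the range is with min() and max(). The values below are made up for illustration; on the real data you would call these on df['Age'].

```python
import pandas as pd

# Hypothetical sample of ages, standing in for df['Age']
ages = pd.Series([0.42, 5.0, 22.0, 38.0, 80.0], name="Age")

age_min = ages.min()
age_max = ages.max()
print(age_min, age_max)
```

Any value that falls outside the outermost bin edges comes back as NaN from pd.cut(), so the first and last edges should enclose this range.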
Now, let’s say you want to convert the Age column from numerical to categorical by binning the ages into four groups: 0 to 14, 15 to 24, 25 to 64, and 65 and above.
# create bins
bins = [0, 14, 24, 64, 100]
# create a new age column
df['AgeCat'] = pd.cut(df['Age'], bins)
df['AgeCat']
In the output, a parenthesis means that side of the interval is open, i.e. the number is not included in the bin, and a square bracket means that side is closed, i.e. the number is included in the bin. For example, (14, 24] excludes 14 and includes 24.
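To see how the boundaries behave, here is a small sketch that bins ages sitting exactly on and just above each edge. The ages are made up for illustration; the bins are the same ones used above.

```python
import pandas as pd

# Hypothetical ages chosen to land on or next to the bin edges
ages = pd.Series([14, 15, 24, 25, 64, 65])
bins = [0, 14, 24, 64, 100]

# With the default settings, each bin is open on the left and closed on the right
binned = pd.cut(ages, bins)
print(binned)
```

An age of 14 lands in the first bin because its right edge is closed, while 15 starts the next bin.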
You can also change which side is closed with the right parameter. If you want to close the left side instead, pass right=False.
pd.cut(df['Age'], bins, right=False)
You can also name the bins by passing the names in a list to the labels parameter.
bins = [0, 14, 24, 64, 100]
bin_labels = ['Children', 'Youth', 'Adults', 'Senior']
df['AgeCat'] = pd.cut(df['Age'], bins=bins, labels=bin_labels)
Since this is categorical data, you can also use the value_counts() method to count the number of data points in each bin.
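A minimal sketch of the counting step, using a handful of made-up ages in place of df['Age'] with the same bins and labels as above:

```python
import pandas as pd

# Hypothetical ages, standing in for df['Age']
ages = pd.Series([4, 10, 19, 30, 45, 52, 70], name="Age")
bins = [0, 14, 24, 64, 100]
bin_labels = ['Children', 'Youth', 'Adults', 'Senior']

age_cat = pd.cut(ages, bins=bins, labels=bin_labels)

# Number of data points in each labelled bin
counts = age_cat.value_counts()
print(counts)
```

On the real data, this would be df['AgeCat'].value_counts().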
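Or, if you want to see the share of data points in each bin instead of the raw counts, set normalize=True. The result is a fraction between 0 and 1, so multiply by 100 to get percentages.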
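Here is the same idea as a sketch, again with made-up ages rather than the Titanic data:

```python
import pandas as pd

# Hypothetical ages, standing in for df['Age']
ages = pd.Series([4, 10, 19, 30, 45, 52, 58, 70], name="Age")
bins = [0, 14, 24, 64, 100]
bin_labels = ['Children', 'Youth', 'Adults', 'Senior']

age_cat = pd.cut(ages, bins=bins, labels=bin_labels)

# Fraction of data points in each bin (the fractions sum to 1)
shares = age_cat.value_counts(normalize=True)

# Convert the fractions to percentages
percentages = shares * 100
print(percentages)
```

On the real data, this would be df['AgeCat'].value_counts(normalize=True).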