In this post, you will learn how to use crosstab in pandas. There are various ways to use it, and we will cover them one by one.
First, let’s read a dataset.
import pandas as pd

url = "https://raw.githubusercontent.com/bprasad26/lwd/master/data/online_shoppers.csv"
df = pd.read_csv(url)
df.head()
Here, we have some data about online shoppers. Let’s say you want to find out how many new and returning visitors came to the website during each month. To do that, you can use pd.crosstab().
pd.crosstab(index=df['Month'], columns=df['VisitorType'])
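If you want to try this without downloading the dataset, here is a minimal sketch on a small made-up DataFrame. The `toy` data is hypothetical and only borrows the Month and VisitorType column names from the dataset. It shows that pd.crosstab() counts how often each pair of values occurs and puts 0 where a pair never occurs.

```python
import pandas as pd

# Hypothetical toy data (not taken from the online shoppers dataset)
toy = pd.DataFrame({
    "Month": ["Aug", "Aug", "Aug", "Sep", "Sep"],
    "VisitorType": [
        "New_Visitor", "Returning_Visitor", "Returning_Visitor",
        "New_Visitor", "New_Visitor",
    ],
})

# Rows come from Month, columns from VisitorType, cells hold counts
counts = pd.crosstab(index=toy["Month"], columns=toy["VisitorType"])
print(counts)
```

In this toy table, the Sep / Returning_Visitor cell is 0 because no such row exists in the data.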
In August, 55 new visitors and 252 returning visitors came to the website. If you want, you can also view the data as proportions using the normalize parameter. You can normalize by row (normalize='index'), by column (normalize='columns'), or over all values (normalize='all' or normalize=True). Let’s normalize by row, that is, by index.
pd.crosstab(df['Month'], df['VisitorType'], normalize='index').round(2)
So, in August 18% of the visitors are new visitors and 82% are returning visitors.
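To compare the three normalization modes side by side, here is a small sketch. It again uses a made-up `toy` DataFrame, which is an assumption for illustration and not part of the dataset. With normalize='index' each row sums to 1, with normalize='columns' each column sums to 1, and with normalize='all' the whole table sums to 1.

```python
import pandas as pd

# Hypothetical toy data: Aug has 1 new and 3 returning visitors,
# Sep has 2 new and 2 returning visitors
toy = pd.DataFrame({
    "Month": ["Aug"] * 4 + ["Sep"] * 4,
    "VisitorType": [
        "New_Visitor", "Returning_Visitor", "Returning_Visitor", "Returning_Visitor",
        "New_Visitor", "New_Visitor", "Returning_Visitor", "Returning_Visitor",
    ],
})

# Share of each visitor type within a month (rows sum to 1)
by_row = pd.crosstab(toy["Month"], toy["VisitorType"], normalize="index")

# Share of each month within a visitor type (columns sum to 1)
by_col = pd.crosstab(toy["Month"], toy["VisitorType"], normalize="columns")

# Share of each cell in the whole table (all cells sum to 1)
overall = pd.crosstab(toy["Month"], toy["VisitorType"], normalize="all")

print(by_row.round(2))
print(by_col.round(2))
print(overall.round(2))
```

Multiply the result by 100 if you prefer to read the values as percentages.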
Let’s say that instead of counts you want to apply another function, such as the mean, to one of the columns. You can do this by passing that column as values together with the aggfunc parameter.
pd.crosstab(
    index=df['Month'],
    columns=df['VisitorType'],
    values=df['ProductRelated_Duration'],
    aggfunc='mean',
).fillna(0).round(2)
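Here is a minimal sketch of what values and aggfunc do, and why the call above ends with fillna(0). The `toy` DataFrame and its Duration column are hypothetical and stand in for ProductRelated_Duration. A month and visitor type pair with no rows has no mean, so crosstab leaves NaN in that cell.

```python
import pandas as pd

# Hypothetical toy data; Duration stands in for ProductRelated_Duration
toy = pd.DataFrame({
    "Month": ["Aug", "Aug", "Aug", "Sep", "Sep"],
    "VisitorType": [
        "New_Visitor", "New_Visitor", "Returning_Visitor",
        "Returning_Visitor", "Returning_Visitor",
    ],
    "Duration": [10.0, 30.0, 50.0, 40.0, 60.0],
})

# Mean Duration for each Month / VisitorType pair
raw = pd.crosstab(
    index=toy["Month"],
    columns=toy["VisitorType"],
    values=toy["Duration"],
    aggfunc="mean",
)

# Sep has no new visitors, so that cell is NaN until it is filled
filled = raw.fillna(0).round(2)

print(raw)
print(filled)
```

Keep in mind that after fillna(0), a pair with no visitors at all looks the same as a pair whose mean duration is really 0, so whether to fill depends on how you plan to read the table.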
To calculate the grand totals for rows and columns, you can use the margins parameter. By default, the margin is named All; you can change it with margins_name.
pd.crosstab(index=df['Month'], columns=df['VisitorType'], margins=True, margins_name='Total')
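As a quick check of how the margins behave, here is a sketch on a made-up `toy` DataFrame (hypothetical data, not from the dataset). It shows the extra Total row and column, and that the label falls back to All when margins_name is not given.

```python
import pandas as pd

# Hypothetical toy data: Aug has 1 new and 3 returning visitors,
# Sep has 2 new and 2 returning visitors
toy = pd.DataFrame({
    "Month": ["Aug"] * 4 + ["Sep"] * 4,
    "VisitorType": [
        "New_Visitor", "Returning_Visitor", "Returning_Visitor", "Returning_Visitor",
        "New_Visitor", "New_Visitor", "Returning_Visitor", "Returning_Visitor",
    ],
})

# Adds a 'Total' row and a 'Total' column holding the sums
totals = pd.crosstab(
    index=toy["Month"],
    columns=toy["VisitorType"],
    margins=True,
    margins_name="Total",
)

# Without margins_name, the extra row and column are labelled 'All'
default_totals = pd.crosstab(toy["Month"], toy["VisitorType"], margins=True)

print(totals)
print(default_totals)
```

The bottom-right cell holds the grand total, which for counts equals the number of rows in the data.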