Pandas – pd.pivot_table() – How to create pivot table in pandas

Spread the love

pd.pivot_table –

Let’s read a dataset to work with

import pandas as pd
import numpy as np

df = pd.read_csv('https://raw.githubusercontent.com/bprasad26/lwd/master/data/clothing_store_sales.csv')
df.head()

Let’s say that you want to know, on average do Female spent more money than males or vice versa. To do that you can use a pivot table. You will keep the Gender in index and apply mean function on the Net Sales column.

pd.pivot_table(data=df , index= 'Gender', values='Net Sales')

You can see that on average females spent more money in shopping than males do. By default if you do not specify the values parameter then pandas will apply the aggregate function on all numeric columns.

pd.pivot_table(data=df , index= 'Gender')

Now, let’s say that you want to know do Married Females or Males spend more money than unmarried Females or Males. To further segment the data, you can use the columns parameter in pivot table.

pd.pivot_table(df, index='Gender', columns ='Marital Status', values ='Net Sales')

On average, single females tends to spend little bit more money than Married Females and Married Males tends to spend more than single Males.

By default, pandas apply mean function. But if you want to apply any other function then you can use the aggfunc parameter. Let’s say that instead of the average you want to find out the total amount of money spent by Males and Females.

pd.pivot_table(df, index='Gender', values='Net Sales', aggfunc=['sum'])

The aggfunc parameter also takes a dictionary. If you want to apply a different function to one column and another to another column then you can use a dictionary.

pd.pivot_table(df, index='Gender',aggfunc={'Net Sales': 'sum', 'Age':'mean'})

Or if you want to apply multiple aggregate function to a single column then pass them as a list.

pd.pivot_table(df, index='Gender', values='Net Sales', aggfunc=['mean','sum'])

pivot table also have a margin parameter that gives you the grand total.

pd.pivot_table(df, index='Gender', values='Net Sales', aggfunc=['mean','sum'], margins=True)

For the mean column pandas gives you the average and for the sum column it gives you the total sum.

Rating: 1 out of 5.

Leave a Reply