How groupby() works in pandas – Split Apply Combine

Spread the love

DataFrame.groupby() –

GroupBy is a powerful way to aggregate, transform and filter data. It applies the split apply combine logic –

1 . First the data is split into separate parts using the column that we want to groupby

2 . Then a function is applied to each groups of data

3 . And then the results from each parts is combined to create a new dataset.

Let’s read a data to illustrate it

Group Data by Single column –

import pandas as pd
import numpy as np
url = "https://raw.githubusercontent.com/bprasad26/lwd/master/data/clothing_store_sales.csv"
df = pd.read_csv(url)
df.head()

Let’s say you want to find out the average sales by each card type. To do that you have to group the data by card type.

card_group = df.groupby("Method of Payment")
for name , group in card_group:
    print(name)
    print(group)
    print()

The output is truncated here but you can see that we have successfully created different groups of data.

Now, we can select the column that we are interested in and apply an aggregate function. Once we applied the aggregate function, pandas will take the result of it and combine it into a new dataframe.

df.groupby("Method of Payment")['Net Sales'].mean()

Group Data By Multiple columns –

You can also group data by multiple columns –

df.groupby(["Method of Payment","Gender"])['Net Sales'].mean()

Apply multiple functions simultaneously –

You can also apply multiple functions simultaneously using the agg or aggregate method.

df.groupby(["Method of Payment","Gender"])['Net Sales'].agg([np.mean, np.max, np.min])

You can also write it this way , both are similar.

df.groupby(["Method of Payment","Gender"])['Net Sales'].agg(['mean', 'max', 'min'])

Or you can also use the aggregate method instead of agg method like this.

df.groupby(["Method of Payment","Gender"])['Net Sales'].aggregate(['mean', 'max', 'min'])

Apply multiple functions on multiple columns simultaneously –

You can also apply multiple function on multiple column at the same time using a dictionary.

df.groupby(["Method of Payment","Gender"]).aggregate({"Net Sales": 'mean',
                                                     "Age": 'median'})

Rating: 1 out of 5.

Leave a Reply