How to Shuffle the Rows of a Pandas Dataframe

Problem –

You have a pandas dataframe and you want to shuffle the rows of the dataframe.

Solution –

Let’s read a dataset to illustrate.

import pandas as pd

url = 'https://raw.githubusercontent.com/bprasad26/lwd/master/data/clothing_store_sales.csv'
df = pd.read_csv(url)
df.head()

To shuffle the rows of a pandas DataFrame, we can use the sample method, which randomly samples either n rows or a fraction of the rows. For example, to randomly sample 5 rows, I can write:

df.sample(n=5, random_state=42)

The n parameter tells pandas how many rows to sample, and random_state seeds the random number generator for reproducibility. With a fixed random_state, this code returns the same rows every time you run it.
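To see the effect of random_state without downloading the dataset, here is a minimal sketch on a small made-up DataFrame. The name toy and its columns are assumptions for illustration and are not part of the clothing store data.

```python
import pandas as pd

# Hypothetical stand-in data so the example runs offline
toy = pd.DataFrame({"item": list("abcdefgh"), "price": range(8)})

# Two draws with the same seed
first = toy.sample(n=5, random_state=42)
second = toy.sample(n=5, random_state=42)

# True: the same seed gives the same rows in the same order
print(first.equals(second))

# Without random_state, each call draws a fresh sample
print(toy.sample(n=5))
```

Leaving random_state out is fine for one-off exploration. Setting it is mainly useful when someone else needs to reproduce your result.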

To shuffle all of the rows, you can either pass the length of the DataFrame to the n parameter or use the frac parameter, which samples a fraction of the rows. Because sample draws without replacement by default, frac=1 returns every row exactly once in random order, which is what we want.

df.sample(frac=1, random_state=42)
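As a quick sanity check, the sketch below confirms that both approaches return a full shuffle, meaning every row appears exactly once. It uses a small made-up DataFrame (toy is a hypothetical name, not the dataset above) so it runs without a network connection.

```python
import pandas as pd

# Hypothetical stand-in data so the example runs offline
toy = pd.DataFrame({"item": list("abcdefgh"), "price": range(8)})

by_frac = toy.sample(frac=1, random_state=42)
by_n = toy.sample(n=len(toy), random_state=42)

# Sorting the shuffled index labels recovers the original index,
# so no row was dropped or duplicated
print(sorted(by_frac.index) == list(toy.index))
print(sorted(by_n.index) == list(toy.index))
```

frac=1 is usually the more convenient form because it does not need the length of the DataFrame.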

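If you look carefully, you can see that the index of the shuffled DataFrame is no longer in order (for example 83, 53, 70), because each row keeps its original index label. If you want a sequential index again (0, 1, 2, 3, and so on), you can chain the reset_index method. Passing drop=True discards the old index instead of adding it as a new column.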

df.sample(frac=1, random_state=42).reset_index(drop=True)
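The difference that drop=True makes can be sketched on a small made-up DataFrame. The names toy, clean, and kept are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in data so the example runs offline
toy = pd.DataFrame({"item": list("abcdefgh"), "price": range(8)})
shuffled = toy.sample(frac=1, random_state=42)

# drop=True throws away the old index labels
clean = shuffled.reset_index(drop=True)

# Without drop=True, the old labels are kept in a new "index" column
kept = shuffled.reset_index()

print(list(clean.index))
print(list(kept.columns))
```

Keeping the old labels can be handy if you later need to map shuffled rows back to their original positions. Otherwise drop=True gives the cleaner result.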
