How to Shuffle the Rows of a DataFrame in Pandas

Problem –

You have a pandas DataFrame and you want to shuffle its rows into a random order.

Solution –

There are several ways to shuffle a DataFrame in pandas. Let’s go through them one by one, using the first 10 rows of a sample dataset.

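import pandas as pd
import numpy as np

# Load the sample sales data from GitHub
url = "https://raw.githubusercontent.com/bprasad26/lwd/master/data/clothing_store_sales.csv"
df = pd.read_csv(url)
# Keep only the first 10 rows so the shuffle is easy to inspect
df = df.head(10)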

Method 1 –

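The easiest way is the df.sample() method. Passing frac=1 returns 100% of the rows in random order, and because sampling is without replacement by default (replace=False), each row appears exactly once. The rows keep their original index labels.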

df1 = df.sample(frac=1)
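
If you need the same shuffle on every run, or a fresh 0 to n-1 index afterwards, something like the sketch below should work. The small DataFrame here is made-up toy data (not the clothing store dataset), and the seed value 42 is arbitrary.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
df = pd.DataFrame({
    "item": ["shirt", "jeans", "hat", "socks", "scarf"],
    "price": [20, 45, 15, 5, 12],
})

# random_state fixes the seed, so the same order comes back on every run
shuffled = df.sample(frac=1, random_state=42)

# The shuffled rows keep their old index labels;
# reset_index(drop=True) replaces them with 0..n-1
shuffled_reset = shuffled.reset_index(drop=True)

print(shuffled)
print(shuffled_reset)
```

Leaving out random_state gives a different order each time, which is usually what you want outside of tests and reproducible experiments.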

Method 2 –

You can also shuffle the rows by generating a random order of row positions. np.random.permutation(len(df)) returns the integers 0 to len(df) - 1 in random order, and passing that array to df.iloc selects the rows in that order.

df2 = df.iloc[np.random.permutation(len(df))]
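
A variation on this method, shown here as a sketch, uses NumPy's newer Generator interface (np.random.default_rng) with a seed so that the permutation is repeatable. The toy DataFrame and the seed value 0 are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset
df = pd.DataFrame({
    "item": ["shirt", "jeans", "hat", "socks", "scarf"],
    "price": [20, 45, 15, 5, 12],
})

# A seeded generator makes the permutation reproducible
rng = np.random.default_rng(0)

# Random ordering of the row positions 0..len(df)-1
positions = rng.permutation(len(df))

# iloc selects rows by position, in the order given
shuffled = df.iloc[positions]

print(positions)
print(shuffled)
```

Because iloc works on positions rather than labels, this approach behaves the same way whatever the DataFrame's index looks like.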

Method 3 –

Another way to shuffle the rows of a DataFrame is the shuffle utility from scikit-learn, which returns a shuffled copy and leaves the original DataFrame unchanged.

from sklearn.utils import shuffle
df3 = shuffle(df)
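
One case where the scikit-learn function is handy is shuffling several objects in the same order, for example features and labels. The sketch below assumes scikit-learn is installed; the toy data, the X/y split, and the seed value 0 are made up for illustration.

```python
import pandas as pd
from sklearn.utils import shuffle

# Hypothetical toy data standing in for the real dataset
df = pd.DataFrame({
    "item": ["shirt", "jeans", "hat", "socks", "scarf"],
    "price": [20, 45, 15, 5, 12],
})

X = df[["price"]]  # features as a DataFrame
y = df["item"]     # labels as a Series

# Both objects are shuffled with the same permutation;
# random_state makes the result repeatable
X_shuffled, y_shuffled = shuffle(X, y, random_state=0)

print(X_shuffled)
print(y_shuffled)
```

Each row of X_shuffled still lines up with the matching entry of y_shuffled, since both carry the same index labels in the same order.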
