How to Compare Two DataFrames in Pandas?


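The compare() method in pandas (available since pandas 1.1.0) compares two DataFrames element by element and shows only the values that differ. Both DataFrames must have identical row and column labels.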

Syntax –

DataFrame.compare(other, align_axis=1, keep_shape=False, keep_equal=False)

other – the DataFrame to compare with.

align_axis – Determines which axis to align the comparison on. Default is 1.

  • 0, or ‘index’ – Resulting differences are stacked vertically
  • 1, or ‘columns’ – Resulting differences are aligned horizontally

keep_shape – If True, all rows and columns are kept. Otherwise, only the rows and columns that contain differences are kept. Default is False.

keep_equal – If True, the result keeps values that are equal. Otherwise, equal values are shown as NaN. Default is False.
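One thing worth checking before calling compare() is that both DataFrames are identically labeled, meaning the same row index and the same columns. The minimal sketch below uses two small hypothetical frames, a and b, to show the ValueError raised when the labels differ, plus one possible workaround: reindexing first so that missing rows become NaN. Whether that workaround suits your data is an assumption, not the only option.

```python
import pandas as pd

# Hypothetical frames: b has one extra row, so the row labels differ
a = pd.DataFrame({'x': [1, 2]})
b = pd.DataFrame({'x': [1, 2, 3]})

try:
    a.compare(b)
    raised = False
except ValueError as err:
    # pandas refuses to compare frames whose index/columns don't match
    raised = True
    print('compare() raised ValueError:', err)

# One possible workaround (assumption: a missing row should count as NaN):
# reindex a to b's row labels, then compare
aligned = a.reindex(b.index)
result = aligned.compare(b)
print(result)  # only row 2 differs: self is NaN, other is 3.0
```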

Examples –

Let’s create a DataFrame.

import pandas as pd

data = {'Name': ['Eleven','Steve','Lucas','Will','Max'],
       'Age': [18, 20, 20, 18, 19],
       'Marks': [99, 85, 82, 70, 80]}
df = pd.DataFrame(data)
df

Now, let’s create a second DataFrame that is a copy of the first one, but with a couple of entries changed.

df2 = df.copy()
df2.loc[0, 'Name'] = 'Mike'
df2.loc[4, 'Marks'] = 90
df2

1. Compare Two DataFrames Using the compare Method in Pandas –

Now, let’s compare these two DataFrames to find the differences between them.

df.compare(df2)

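In the first row (index 0), we changed the Name from Eleven to Mike, and this appears under the Name column. We also changed the Marks in the fifth row (index 4) from 80 to 90, which appears under the Marks column. For each column with a difference, the self sub-column shows the value in the first DataFrame (df) and the other sub-column shows the value in the second (df2). Rows and columns without any differences (here, the Age column and rows 1 to 3) are dropped, and equal values inside the kept rows are shown as NaN.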
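To make the structure of that result concrete, the sketch below recreates df and df2 and inspects df.compare(df2) programmatically. The names diff and renamed are only illustrative. The last two lines use the result_names parameter, which exists only in pandas 1.5 and later and is not part of the syntax shown above.

```python
import pandas as pd

# Recreate the two DataFrames from the example above
df = pd.DataFrame({'Name': ['Eleven', 'Steve', 'Lucas', 'Will', 'Max'],
                   'Age': [18, 20, 20, 18, 19],
                   'Marks': [99, 85, 82, 70, 80]})
df2 = df.copy()
df2.loc[0, 'Name'] = 'Mike'
df2.loc[4, 'Marks'] = 90

diff = df.compare(df2)

# Only the row labels that contain at least one difference survive
print(diff.index.tolist())  # [0, 4]

# Columns form a two-level MultiIndex: (original column, 'self' or 'other')
print(diff.columns.tolist())
# [('Name', 'self'), ('Name', 'other'), ('Marks', 'self'), ('Marks', 'other')]

# Read the old and new value of one changed cell
print(diff.loc[0, ('Name', 'self')], '->', diff.loc[0, ('Name', 'other')])  # Eleven -> Mike

# pandas 1.5+ only: rename the 'self'/'other' labels
renamed = df.compare(df2, result_names=('before', 'after'))
print(renamed.columns.get_level_values(1).unique().tolist())  # ['before', 'after']
```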

2. align_axis –

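By default, pandas aligns the differences horizontally, because align_axis is set to 1 ('columns'): the self and other values sit side by side as sub-columns. To stack the differences vertically instead, set align_axis=0 (or 'index'); the self and other labels then become an extra level of the row index.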

df.compare(df2, align_axis=0)
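A quick sketch to check the shape of the vertically stacked result, assuming the same df and df2 as above (recreated here so the snippet runs on its own; stacked is an illustrative name):

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({'Name': ['Eleven', 'Steve', 'Lucas', 'Will', 'Max'],
                   'Age': [18, 20, 20, 18, 19],
                   'Marks': [99, 85, 82, 70, 80]})
df2 = df.copy()
df2.loc[0, 'Name'] = 'Mike'
df2.loc[4, 'Marks'] = 90

stacked = df.compare(df2, align_axis=0)

# Rows now form a MultiIndex of (original row label, 'self' or 'other')
print(stacked.index.tolist())  # [(0, 'self'), (0, 'other'), (4, 'self'), (4, 'other')]

# Columns are plain column names again, still only those with differences
print(stacked.columns.tolist())  # ['Name', 'Marks']

# The new Marks value for row 4 (float, because NaN appears in that column)
print(stacked.loc[(4, 'other'), 'Marks'])  # 90.0
```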

3. keep_equal –

We can also keep the equal values, instead of showing them as NaN as before, by setting keep_equal=True. By default it is set to False. Only the rows and columns that contain at least one difference are still shown.

df.compare(df2, keep_equal=True)
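The sketch below, again recreating df and df2 so it runs on its own, checks that keep_equal=True fills in the equal values while still dropping the rows and columns that have no differences (kept is an illustrative name):

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({'Name': ['Eleven', 'Steve', 'Lucas', 'Will', 'Max'],
                   'Age': [18, 20, 20, 18, 19],
                   'Marks': [99, 85, 82, 70, 80]})
df2 = df.copy()
df2.loc[0, 'Name'] = 'Mike'
df2.loc[4, 'Marks'] = 90

kept = df.compare(df2, keep_equal=True)

# Still only the rows with a difference
print(kept.index.tolist())  # [0, 4]

# Equal values are shown instead of NaN
print(kept.loc[4, ('Name', 'self')], kept.loc[4, ('Name', 'other')])    # Max Max
print(kept.loc[0, ('Marks', 'self')], kept.loc[0, ('Marks', 'other')])  # 99 99

# Age never changed, so it is still left out
print('Age' in kept.columns.get_level_values(0))  # False
```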

4. keep_shape –

To keep all original rows and columns, set keep_shape=True. By default it is set to False. Unchanged cells are still shown as NaN unless keep_equal is also True.

df.compare(df2, keep_shape=True)
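Similarly, a short sketch confirming that keep_shape=True keeps every row and column (5 rows, and 3 columns × 2 sub-columns) while unchanged cells remain NaN; full is an illustrative name:

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({'Name': ['Eleven', 'Steve', 'Lucas', 'Will', 'Max'],
                   'Age': [18, 20, 20, 18, 19],
                   'Marks': [99, 85, 82, 70, 80]})
df2 = df.copy()
df2.loc[0, 'Name'] = 'Mike'
df2.loc[4, 'Marks'] = 90

full = df.compare(df2, keep_shape=True)

# Every original row is kept, and each of the 3 columns gets self/other
print(full.shape)  # (5, 6)

# Age never changed, so both of its sub-columns are entirely NaN
print(full[('Age', 'self')].isna().all())  # True

# Changed cells are still filled in
print(full.loc[0, ('Name', 'other')])  # Mike
```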

5. Keep all original rows and columns and also all original values –

To keep all original rows and columns, and also all original values, set both keep_shape and keep_equal to True.

df.compare(df2, keep_shape=True, keep_equal=True)
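With both options on, each side of the comparison is a full copy of the original data. The sketch below pulls the self and other halves back out with DataFrame.xs on the second column level and checks them against df and df2; both, self_side, and other_side are illustrative names:

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({'Name': ['Eleven', 'Steve', 'Lucas', 'Will', 'Max'],
                   'Age': [18, 20, 20, 18, 19],
                   'Marks': [99, 85, 82, 70, 80]})
df2 = df.copy()
df2.loc[0, 'Name'] = 'Mike'
df2.loc[4, 'Marks'] = 90

both = df.compare(df2, keep_shape=True, keep_equal=True)
print(both.shape)  # (5, 6)

# Select every ('column', 'self') pair, dropping the self/other level
self_side = both.xs('self', axis=1, level=1)
other_side = both.xs('other', axis=1, level=1)

print(self_side.equals(df))    # True
print(other_side.equals(df2))  # True
```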
