Pandas – pd.concat() – How to concat dataframes in pandas

Spread the love

In this post, you will learn –

1 . What is concat in pandas?

2 . How to concat dataframes in pandas?

3 . What is the difference between merge and concat in pandas?

1 . What is concat in pandas?

Concat in pandas let’s you join multiple dataframes vertically or horizontally. Mostly people use it to join dataframe vertically on top of each other.

2 . How to concat dataframes in Pandas?

To concat dataframes in pandas, we need to use the pd.concat() method in pandas.

Let’s create a dataframe to work with.

import pandas as pd

marketing = pd.DataFrame({"Name":['Scott','Clark','King'],
                         "Salary":[2000, 1500, 1700],
                         "Dept":['Marketing','Marketing','Marketing']})

accounting = pd.DataFrame({"Name":['Smith','Allen','Samantha'],
                         "Salary":[1750, 2000, 1800],
                         "Dept":['Accounting', 'Accounting','Accounting']})

operation = pd.DataFrame({"Name":['Emily','Blake','Miller','Angelina'],
                         "Salary":[2000, 1500, 1700, 2000],
                         "Dept":['operation','operation','operation','operation']})

Marketing –

Accounting –

Operation –

Here, we have data from 3 departments and we want to join them on top of each other. To do that we will write

df = pd.concat([marketing, accounting, operation])

By default, the axis=0 or axis=index means pandas will join or concat dataframes vertically on top of each others. If you want to join horizontally then you have to set it to axis=1 or axis=’columns’.

If you look at the above result, you can see that the index labels are not showing serially . If you want to do so then pass ignore_index=True, by default it is False.

df = pd.concat([marketing, accounting, operation], ignore_index=True)

pd.Concat() also has a key parameter which let’s you create multi-Index. It makes it helpful to understand which data come from which dataframe and also can be helpful in selecting subset of the data from the resulting dataframe when you concat them.

df = pd.concat([marketing, accounting, operation],
               keys=['marketing','accounting','operation'])

Note – The keys parameter and ignore_index parameter does not works together.

3 . What the difference between merge and concat in pandas?

Let’s create a dataframe to understand the difference.

sales = pd.DataFrame({"Date":['2021-11-01', '2021-11-01', '2021-11-02','2021-11-02'],
                     "Product ID":[1, 2, 1 , 2],
                     "Branch ID": [100, 200, 100, 200 ],
                     "Price":[10, 15, 12, 20],
                     "Quantity":[120, 450, 300, 500 ]})

branch = pd.DataFrame({"Branch ID":[100, 200, 300],
                      "City":["Mumbai","Pune","Banglore"]})

sales –

branch –

Now, let’s see how concat join these tables horizontally.

concat = pd.concat([sales, branch], axis=1, join='inner')

If you look closely, you can see that 3rd row’s branch id in the left dataframe and the right dataframe are not same but still pandas join them together, which we don’t expect to happen when we do inner join.

And Now, let’s see how marge joins them horizontally.

merge = pd.merge(sales, branch, how='inner')

And if you look at the above result, you can see that pandas does the joining correctly. The branch id from the left dataframe gets join to the right branch id on the right dataframe.

So why is this happening?

The reason is pd.concat() does not apply any logic when joining dataframes. It simply glue them together either vertically or horizontally. On the other hand pd.merge() apply logic when joining dataframes together.

So If your goal is to join dataframes vertically then use pd.concat(). Think it as union in SQL.

But if your goal is to join dataframes horizontally then use pd.merge(). pd.merge do SQL like joins. To learn how pd.merge() works please read the related post below.

1 . Pandas – pd.merge() – How to merge dataframe in pandas.

Rating: 1 out of 5.

Leave a Reply