How to filter Pandas dataframe using ‘in’ and ‘not in’ like in SQL

Spread the love

In this post you will learn how to use in and not in in pandas to filter data.

Method 1 –

Read a dataset to work with

import pandas as pd

url = "https://raw.githubusercontent.com/bprasad26/lwd/master/data/clothing_store_sales.csv"
df = pd.read_csv(url)
df = df.head()

let’s say that you want to select only rows where the Method of Payment is Discover or MasterCard.

To do that you can use pandas isin() method

cards_to_keep = ["Discover","MasterCard"]
df[df['Method of Payment'].isin(cards_to_keep)]

And if you want to implement not in in pandas, then just use the tilda (~) sign like this

df[~df['Method of Payment'].isin(cards_to_keep)]

Method 2 –

To implement in and not in in pandas, you can also use pandas query method.

# need to rename the column first 
# as it contains spaces between words
df.rename(columns={"Method of Payment":"Method_of_Payment"}, inplace=True)
df.query("Method_of_Payment in @cards_to_keep")

And for not in

df.query("Method_of_Payment not in @cards_to_keep")

Rating: 1 out of 5.

Leave a Reply