apply in pandas is both a Series and a DataFrame method. On a Series, it applies a function to each element. On a DataFrame, it applies a function along an axis, that is, to each column or to each row.
Let’s read a dataset to illustrate this –
import pandas as pd
import numpy as np

url = "https://raw.githubusercontent.com/bprasad26/lwd/master/data/winequality-red.csv"
df = pd.read_csv(url)
df.head()
Apply on a series –
Let’s say that you want to recode the quality column into categories. Values below 5 should be rated ‘low’, values of 5 and 6 ‘medium’, and values above 6 ‘high’.
def change_quality(value):
    # Map a numeric quality score to a category label
    if value < 5:
        return 'low'
    elif value == 5 or value == 6:
        return 'medium'
    else:
        return 'high'

# apply calls change_quality once for every value in the column
df['quality_cat'] = df['quality'].apply(change_quality)
df[['quality', 'quality_cat']].sample(10)
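To check the mapping without downloading the dataset, here is a minimal sketch that runs the same kind of function on a small Series. The scores below are made up for illustration and are not taken from the wine data.

```python
import pandas as pd

def change_quality(value):
    # Map a numeric quality score to a category label
    if value < 5:
        return 'low'
    elif value == 5 or value == 6:
        return 'medium'
    else:
        return 'high'

# Made-up quality scores, one for each branch of the function
scores = pd.Series([3, 5, 6, 7, 8])

# apply calls change_quality once per element and returns a new Series
labels = scores.apply(change_quality)
print(labels.tolist())
```

The result has the same index as the input, so it can be assigned straight back to a new column of the DataFrame the Series came from.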
Apply on a DataFrame –
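When we apply a function to a DataFrame, the axis parameter controls what the function receives. With axis=0 (the default) the function is called once per column, and with axis=1 it is called once per row.
Let’s take a small subset of the data to see what is going on: the first 5 rows of the pH and alcohol columns.
df_new = df[['pH', 'alcohol']].head()
df_new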
Let’s say that we want to sum all the values in each column. For that, apply the sum with axis=0, so the function receives one column at a time.
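The column-wise sum could be sketched as follows. The DataFrame here is a hypothetical stand-in for df_new with illustrative values, and the helper name total is an assumption made for this example, not something from the original code.

```python
import pandas as pd

# Hypothetical stand-in for df_new (illustrative values, not read from the CSV)
df_new = pd.DataFrame({
    'pH': [3.5, 3.2, 3.3, 3.2, 3.5],
    'alcohol': [9.4, 9.8, 9.8, 9.8, 9.4],
})

def total(values):
    # values is a Series: one whole column when axis=0
    return values.sum()

# axis=0 passes each column to total, giving one result per column
col_sums = df_new.apply(total, axis=0)
print(col_sums)
```

The result is a Series indexed by the column names, with one sum for pH and one for alcohol.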
And if you want to sum the values in each row, apply the sum with axis=1, so the function receives one row at a time.
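A minimal sketch of the row-wise version, again using a hypothetical stand-in for df_new with illustrative values and an assumed helper named total:

```python
import pandas as pd

# Hypothetical stand-in for df_new (illustrative values, not read from the CSV)
df_new = pd.DataFrame({
    'pH': [3.5, 3.2, 3.3, 3.2, 3.5],
    'alcohol': [9.4, 9.8, 9.8, 9.8, 9.4],
})

def total(values):
    # values is a Series: one whole row when axis=1
    return values.sum()

# axis=1 passes each row to total, giving one result per row
row_sums = df_new.apply(total, axis=1)
print(row_sums)
```

This time the result is indexed by the row labels, so it has five entries, each being pH plus alcohol for that row.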
You can also use a lambda function with apply. Here the lambda receives each row and adds its pH and alcohol values.
df_new.apply(lambda row: row['pH'] + row['alcohol'], axis=1)