
In our previous post, we learned how to use the DataFrame.dropna() method to handle missing values in Python. In this post, we will learn how to use the DataFrame.fillna() method to fill them instead.
DataFrame.fillna() – Fill Missing Values
Let’s read a dataset to work with. Here, we have some data about fruit prices.
import pandas as pd
df = pd.read_csv(
"https://raw.githubusercontent.com/bprasad26/lwd/master/data/fruit_prices.csv"
)
df

Apart from dropping missing values, as we did previously, one of the easiest ways to handle missing values (NaN, "Not a Number") is to fill them with the mean, median or mode of each column.
Let's say you want to fill the missing values with the median. To do that, we first calculate the median of each column.
# median of each column (numeric_only=True skips any non-numeric columns)
medians = df.median(numeric_only=True)
medians

Then we pass these medians to fillna(), which fills the missing values in each column with that column's median.
# fill missing values with median
df.fillna(medians)
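Filling with the mean or the mode works the same way. The sketch below uses a small hypothetical DataFrame (not the fruit_prices.csv values) so the results are easy to check. Note that mode() returns a DataFrame, because a column can have more than one most-frequent value, so we take its first row to get a single fill value per column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the values from fruit_prices.csv
df = pd.DataFrame({
    "apple": [1.0, np.nan, 3.0, 3.0],
    "Orange": [2.0, 2.0, np.nan, 5.0],
})

# Mean of each column (NaN values are skipped when computing it)
means = df.mean()
df_mean = df.fillna(means)

# mode() returns a DataFrame (a column may have several modes);
# its first row gives one mode per column
modes = df.mode().iloc[0]
df_mode = df.fillna(modes)

print(df_mean)
print(df_mode)
```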

You can also fill all of the missing values with the same value, such as 0.
# fill missing values with zero
df.fillna(0)
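Keep in mind that fillna() returns a new DataFrame and leaves the original unchanged, so to keep the filled values you need to assign the result back. A minimal sketch with hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"apple": [1.0, np.nan], "Orange": [np.nan, 4.0]})

filled = df.fillna(0)          # a new DataFrame with NaNs replaced by 0
print(df.isna().sum().sum())   # the original still contains its 2 NaNs
print(filled)

# Assign the result back to keep the filled values
df = df.fillna(0)
```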

Pandas also lets you fill missing values using a dictionary that maps column names to fill values. Suppose you want to fill the apple and Orange columns with 0 and the Banana and Mango columns with -999. To do that, you would write:
# fill missing values using a dictionary
fill_values = {"apple": 0, "Orange": 0, "Banana": -999, "Mango": -999}
df.fillna(fill_values)
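The dictionary keys are matched against the column labels exactly (including case), and any column not listed in the dictionary is left as it is. The sketch below shows this on a hypothetical toy frame with an extra "Grapes" column that is not in the dictionary.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; "Grapes" is not a key in fill_values
df = pd.DataFrame({
    "apple": [np.nan, 1.0],
    "Orange": [2.0, np.nan],
    "Banana": [np.nan, 3.0],
    "Mango": [4.0, np.nan],
    "Grapes": [np.nan, 5.0],
})

fill_values = {"apple": 0, "Orange": 0, "Banana": -999, "Mango": -999}
result = df.fillna(fill_values)

# Grapes still contains its NaN because it has no entry in the dictionary
print(result)
```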

You can also fill missing values based on nearby values in your dataset, using the method parameter of fillna().
By default it is None. With method="ffill" (forward fill), pandas propagates the last valid observation forward to fill the missing values that follow it.
In our original DataFrame, the second and third rows are entirely NaN. With method="ffill", both of these rows are filled with the values from the first row, and every other missing value in the DataFrame is filled the same way, from the nearest valid value above it.
df.fillna(method="ffill")
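In pandas 2.1 and later, passing method= to fillna() is deprecated and emits a FutureWarning; the dedicated DataFrame.ffill() method does the same job. The sketch below uses hypothetical toy data that mirrors the situation described above (the second and third rows are entirely NaN).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows with index 1 and 2 are entirely NaN
df = pd.DataFrame({
    "apple": [1.0, np.nan, np.nan, 4.0],
    "Orange": [2.0, np.nan, np.nan, 5.0],
})

# Same result as df.fillna(method="ffill") in older pandas versions
forward = df.ffill()
print(forward)
```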

If you use method="bfill" (backward fill) instead, pandas fills the second and third rows with values from the row that follows them, in this case the fourth row, which has index 3 because Python counts from 0.
df.fillna(method="bfill")

If you look carefully, you can see that the last two values in the apple column are still NaN. Backward fill copies values from later rows, and there are no valid values after these two, so they are left unfilled.
The fillna() method also has a limit parameter that sets the maximum number of consecutive NaN values to forward or backward fill.
If we set limit=1 with method="ffill", only the second row is filled and the third row is left untouched.
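The trailing-NaN behaviour is easy to reproduce with a hypothetical toy frame. DataFrame.bfill() is the non-deprecated counterpart of fillna(method="bfill"). Chaining a forward fill afterwards is one common option (not the only one) for filling the trailing gaps that backward fill cannot reach.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the last two "apple" values are missing
df = pd.DataFrame({
    "apple": [1.0, np.nan, 3.0, np.nan, np.nan],
    "Orange": [np.nan, 2.0, 3.0, 4.0, 5.0],
})

backward = df.bfill()  # same as df.fillna(method="bfill") in older pandas

# Trailing NaNs have no later value to copy, so they survive bfill
print(backward["apple"].isna().sum())

# A forward fill afterwards fills them from the last valid value above
both = df.bfill().ffill()
print(both)
```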
df.fillna(method="ffill", limit=1)
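The limit parameter is also accepted by DataFrame.ffill() and DataFrame.bfill(). A minimal sketch on hypothetical toy data with two consecutive NaNs per column:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two consecutive NaNs in each column
df = pd.DataFrame({
    "apple": [1.0, np.nan, np.nan, 4.0],
    "Orange": [2.0, np.nan, np.nan, 5.0],
})

# Fill at most one NaN in each run of consecutive NaNs
limited = df.ffill(limit=1)
print(limited)
```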
