In this post, you will learn how to delete one or more columns from a pandas dataframe.
A. Using the df.drop() method –
There are several ways to delete or drop columns using the drop method. Let’s look at them one by one, using the red wine quality dataset.
```python
# import pandas
import pandas as pd

# read data
url = "https://raw.githubusercontent.com/bprasad26/lwd/master/data/winequality-red.csv"
df = pd.read_csv(url)
df.head()
```
(1). Dropping a single column with df.drop() –
To delete a single column from a dataframe, pass the column name to the df.drop() method with axis='columns' (or axis=1). You can pass the name on its own or inside a list; both work. To drop rows instead, use axis='index' (or axis=0).
```python
# drop a single column, both work
# (each call returns a new dataframe; df itself is unchanged)
df.drop('density', axis='columns')
df.drop(['density'], axis='columns')
```
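To see that drop() returns a new dataframe rather than changing the original, here is a minimal sketch on a small hypothetical frame (made-up values standing in for the wine data). It also shows the columns= keyword, which pandas accepts as an equivalent to passing axis='columns'.

```python
import pandas as pd

# Small hypothetical frame standing in for the wine data (values are made up)
df = pd.DataFrame({
    'fixed acidity': [7.4, 7.8],
    'density': [0.9978, 0.9968],
    'pH': [3.51, 3.20],
})

# All three calls drop the same column and return a NEW dataframe
a = df.drop('density', axis='columns')
b = df.drop(['density'], axis=1)
c = df.drop(columns='density')  # keyword form, same effect as axis='columns'

print(list(a.columns))   # ['fixed acidity', 'pH']
print(list(df.columns))  # ['fixed acidity', 'density', 'pH'] (original unchanged)
```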
drop() returns a new dataframe and leaves the original untouched. To delete a column permanently, either assign the result back to the variable or pass inplace=True to drop.
```python
# permanently delete a column from a df
# Method 1
df = df.drop(['density'], axis='columns')

# Method 2
df.drop(['residual sugar'], axis='columns', inplace=True)
```
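One pitfall is mixing the two methods. With inplace=True, drop() modifies the dataframe and returns None, so assigning that return value overwrites your variable with None. A minimal sketch on a hypothetical three-column frame:

```python
import pandas as pd

# Hypothetical frame with made-up values
df = pd.DataFrame({'density': [0.99], 'residual sugar': [1.9], 'pH': [3.51]})

# inplace=True changes df directly and returns None
result = df.drop(['residual sugar'], axis='columns', inplace=True)
print(result)            # None
print(list(df.columns))  # ['density', 'pH']

# Pitfall: assigning the result of an inplace call replaces the frame with None
broken = df.copy()
broken = broken.drop(['density'], axis='columns', inplace=True)
print(broken)            # None
```

Use one method or the other, never both in the same statement.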
(2). Dropping multiple columns with df.drop() –
To drop multiple columns from a pandas dataframe, just pass the list of all the column names to the drop method.
```python
# drop multiple columns
df.drop(['citric acid', 'chlorides', 'sulphates'], axis='columns')
```
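By default, drop() raises a KeyError if any name in the list is not a column. When some names may be missing, the errors='ignore' parameter drops the ones that exist and skips the rest. A sketch on a hypothetical empty frame that only has column names:

```python
import pandas as pd

# Hypothetical frame: column names only, no rows
df = pd.DataFrame(columns=['citric acid', 'chlorides', 'sulphates', 'alcohol'])
to_drop = ['citric acid', 'chlorides', 'not a column']

# Default behaviour (errors='raise'): a missing label raises KeyError
raised = False
try:
    df.drop(to_drop, axis='columns')
except KeyError as exc:
    raised = True
    print('KeyError:', exc)

# errors='ignore' drops the labels that exist and skips the missing one
trimmed = df.drop(to_drop, axis='columns', errors='ignore')
print(list(trimmed.columns))  # ['sulphates', 'alcohol']
```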
(3). Dropping columns using df.columns –
You can also drop columns by their position. In Python and pandas, indexing starts from 0, so to delete the first column you use 0, not 1.
```python
# drop first column
df.drop(df.columns[0], axis='columns')

# drop first, second and last column
df.drop(df.columns[[0, 1, -1]], axis='columns')
```
To drop multiple columns with df.columns, put the positions in a list inside the brackets, which gives the double square brackets in df.columns[[0, 1, -1]]. Writing df.columns[0, 1, -1] instead throws an error.
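It helps to look at what df.columns returns for each kind of index: a single position gives one label (a string here), while a list of positions gives an Index of labels, which drop() then removes. A sketch on a hypothetical frame with five wine-style column names:

```python
import pandas as pd

# Hypothetical frame: column names only
df = pd.DataFrame(columns=['fixed acidity', 'volatile acidity',
                           'citric acid', 'pH', 'quality'])

first = df.columns[0]            # a single label
picked = df.columns[[0, 1, -1]]  # an Index of three labels (-1 is the last column)

print(first)         # fixed acidity
print(list(picked))  # ['fixed acidity', 'volatile acidity', 'quality']

remaining = df.drop(picked, axis='columns')
print(list(remaining.columns))  # ['citric acid', 'pH']
```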
(4). Dropping columns using df.iloc –
df.iloc[rows, columns] selects rows and columns by integer position. I am going to explain it in more detail in an upcoming post.
```python
# drop first, second, and last column
df.drop(df.iloc[:, [0, 1, -1]].columns, axis='columns')
```
The first part of .iloc (the :) selects all the rows from start to end, and the second part selects only the first, second and last columns. drop() only uses the column labels of that selection, so it deletes these three columns along with all of their values.
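Because drop() only needs the column labels, the row part of the .iloc selection does not change which columns are removed. A minimal check on a hypothetical four-column frame with made-up values:

```python
import pandas as pd

# Hypothetical frame with made-up values
df = pd.DataFrame({
    'fixed acidity': [7.4, 7.8, 7.8],
    'volatile acidity': [0.70, 0.88, 0.76],
    'pH': [3.51, 3.20, 3.26],
    'quality': [5, 5, 5],
})

# Column labels of the positional selection
cols = df.iloc[:, [0, 1, -1]].columns
print(list(cols))  # ['fixed acidity', 'volatile acidity', 'quality']

# Selecting fewer rows yields exactly the same column labels
same_cols = cols.equals(df.iloc[:1, [0, 1, -1]].columns)
print(same_cols)   # True

result = df.drop(cols, axis='columns')
print(list(result.columns))  # ['pH']
print(len(result))           # 3 (every row is kept)
```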
You can also use slicing. Let’s say you want to delete the first through the fifth column.
```python
# drop columns from 1st to 5th column
df.drop(df.iloc[:, :5].columns, axis='columns')
```
This deletes the columns at positions 0, 1, 2, 3 and 4. In Python, the end index in list[start:end] is excluded when you slice, so the column at position 5 (the sixth column) is not deleted.
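A quick way to confirm the exclusive end is to slice a hypothetical frame with seven column names and see which ones survive:

```python
import pandas as pd

# Hypothetical frame: seven column names, no rows
names = ['fixed acidity', 'volatile acidity', 'citric acid', 'chlorides',
         'free sulfur dioxide', 'total sulfur dioxide', 'pH']
df = pd.DataFrame(columns=names)

dropped = df.iloc[:, :5].columns  # positions 0 to 4; position 5 is excluded
remaining = df.drop(dropped, axis='columns')

print(len(dropped))             # 5
print(list(remaining.columns))  # ['total sulfur dioxide', 'pH']
```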
(5). Dropping columns using df.loc –
You can also drop columns using column labels instead of positions.
```python
# drop chlorides, sulphates, alcohol columns
df.drop(df.loc[:, ['chlorides', 'sulphates', 'alcohol']].columns, axis='columns')
```
But when you use slicing with df.loc, there is a major difference compared to when you use df.iloc.
With df.loc, the end label in a start:end slice is included. So, to delete the first through the third column, you write:
```python
# drop from first to third column
df.drop(df.loc[:, 'fixed acidity':'citric acid'].columns, axis='columns')
```
This is a big difference and a common source of mistakes, so be careful.
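Putting the two slicing rules side by side on a hypothetical frame makes the difference concrete: a label slice includes its end label, while a position slice stops one short of its end.

```python
import pandas as pd

# Hypothetical frame: column names only
df = pd.DataFrame(columns=['fixed acidity', 'volatile acidity',
                           'citric acid', 'chlorides'])

by_label = df.loc[:, 'fixed acidity':'citric acid'].columns  # end label included
by_position = df.iloc[:, 0:2].columns                        # end position excluded

print(list(by_label))     # ['fixed acidity', 'volatile acidity', 'citric acid']
print(list(by_position))  # ['fixed acidity', 'volatile acidity']

# To select the same three columns by position, the end must be 3
same_three = list(df.iloc[:, 0:3].columns)
print(same_three == list(by_label))  # True
```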
B. Using the del statement –
Another way to delete columns from a pandas dataframe is using the del statement.
```python
# drop a column
del df['volatile acidity']
```
However, del cannot remove multiple columns at once, whether you select them with double bracket notation or with iloc or loc. If you run the code below, it gives you an error.
```python
# drop multiple columns
# this will give you an error
del df[['alcohol', 'quality']]
```
To delete multiple columns with del, use a for loop instead.
```python
# delete using a loop
cols_to_delete = ['alcohol', 'quality']
for col in df.columns:
    if col in cols_to_delete:
        del df[col]
```
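An alternative is to loop over the names you want to delete rather than over every column, checking membership first so that a name that is not present is simply skipped. The helper below, delete_columns, is a hypothetical name used only for this sketch, and the frame holds made-up values:

```python
import pandas as pd

def delete_columns(frame, cols):
    """Hypothetical helper: delete each named column in place with del,
    skipping names that are not columns of the frame."""
    for col in cols:
        if col in frame.columns:
            del frame[col]

# Hypothetical frame with made-up values
df = pd.DataFrame({'pH': [3.51], 'alcohol': [9.4], 'quality': [5]})
delete_columns(df, ['alcohol', 'quality', 'not a column'])
print(list(df.columns))  # ['pH']
```

Like del itself, the helper modifies the dataframe in place and returns None.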
C. Using df.pop() –
You can also use the pop method in pandas. It removes the named column from the dataframe and returns it as a Series.
```python
# delete the pH column
ph = df.pop('pH')
ph.head()
```
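A minimal sketch on a hypothetical two-column frame shows both effects of pop: the column disappears from the dataframe, and the returned Series keeps the column name and values, which is handy when you want to split off a target column.

```python
import pandas as pd

# Hypothetical frame with made-up values
df = pd.DataFrame({'pH': [3.51, 3.20], 'quality': [5, 6]})

ph = df.pop('pH')  # removes the column from df and returns it

print(type(ph).__name__)  # Series
print(ph.name)            # pH
print(ph.tolist())        # [3.51, 3.2]
print(list(df.columns))   # ['quality']
```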