Pandas – Delete one or more columns from a dataframe.

In this post, you will learn how to delete one or more columns from a pandas dataframe.

A. Using the df.drop() method –

There are several ways to delete or drop columns using the drop method. Let’s look at them one by one.

# import pandas 
import pandas as pd

# read data (url is the path or URL of your CSV file)
df = pd.read_csv(url)

(1). Dropping a single column with df.drop() –

To delete a single column from a dataframe, pass the column name to df.drop() with axis='columns' or axis=1. You can pass the name directly or wrap it in a list; both work. To drop a row instead, use axis='index' or axis=0.

# drop a single column; both forms work
df.drop('density', axis='columns')
df.drop(['density'], axis='columns')
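
As a quick check of the two forms above, here is a minimal sketch on a small made-up dataframe (the values are hypothetical; only the column names follow this post). It also shows the columns= keyword of drop(), which does the same thing without axis.

```python
import pandas as pd

# Hypothetical toy data; the column names mirror the ones used in this post.
df = pd.DataFrame({
    "density": [0.9978, 0.9968],
    "pH": [3.51, 3.20],
    "alcohol": [9.4, 9.8],
})

by_name = df.drop("density", axis="columns")    # label passed as a string
by_list = df.drop(["density"], axis="columns")  # label passed in a list
by_keyword = df.drop(columns="density")         # columns= keyword, no axis needed

# All three results hold the same data, and df itself still has 'density'.
print(by_name.equals(by_list), by_name.equals(by_keyword))
print(list(by_name.columns), "density" in df.columns)
```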

To delete a column permanently, either assign the result of drop() back to a variable or pass inplace=True to drop(). Otherwise, drop() returns a new dataframe and leaves the original unchanged.

# permanently delete a column from a df
# Method 1
df = df.drop(['density'], axis='columns')

# Method 2
df.drop(['residual sugar'], axis='columns', inplace=True)
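
The difference between the two methods is easy to miss, so here is a small sketch on made-up data (the values are hypothetical). It shows that a bare drop() call does not change df, and that drop() with inplace=True returns None.

```python
import pandas as pd

# Hypothetical one-row frame using column names from this post.
df = pd.DataFrame({"density": [0.9978], "residual sugar": [1.9], "pH": [3.51]})

df.drop(["density"], axis="columns")   # result is discarded
still_there = "density" in df.columns  # True: df itself is unchanged

# Method 1: reassign the returned dataframe
df = df.drop(["density"], axis="columns")

# Method 2: modify df in place; the call itself returns None
result = df.drop(["residual sugar"], axis="columns", inplace=True)

print(still_there, result, list(df.columns))
```

Because the in-place call returns None, writing df = df.drop(..., inplace=True) would replace df with None, so use one method or the other, not both.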

(2). Dropping multiple columns with df.drop() –

To drop multiple columns from a pandas dataframe, just pass the list of all the column names to the drop method.

# drop multiple columns
df.drop(['citric acid','chlorides','sulphates'], axis='columns')
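
If some of the listed columns might not exist, drop() raises a KeyError by default. The sketch below, on a made-up frame that deliberately lacks 'sulphates', shows the errors='ignore' option of drop(), which skips missing labels.

```python
import pandas as pd

# Hypothetical frame; 'sulphates' is intentionally absent.
df = pd.DataFrame({"citric acid": [0.0], "chlorides": [0.076], "pH": [3.51]})
to_drop = ["citric acid", "chlorides", "sulphates"]

try:
    df.drop(to_drop, axis="columns")   # default errors='raise'
    raised = False
except KeyError:
    raised = True

# errors='ignore' drops the labels that exist and skips the rest
trimmed = df.drop(to_drop, axis="columns", errors="ignore")

print(raised, list(trimmed.columns))
```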

(3). Dropping columns using df.columns –

You can also drop columns by position using df.columns. In Python and pandas, indexing starts from 0, so to delete the first column you use 0 instead of 1.

# drop first column
df.drop(df.columns[0], axis='columns')

# drop first, second and last column
df.drop(df.columns[[0, 1, -1]], axis='columns')

If you want to drop multiple columns using df.columns, then you have to use double square bracket notation, as in df.columns[[0, 1, -1]]; otherwise it will throw an error.
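
To see why the inner list matters, here is a minimal sketch on a made-up four-column frame (the column names a to d are hypothetical). Indexing df.columns with a list of positions returns the matching labels, while passing the positions without the inner list fails.

```python
import pandas as pd

# Hypothetical frame with four columns.
df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]})

labels = df.columns[[0, 1, -1]]   # list of positions -> Index of labels

try:
    df.columns[0, 1, -1]          # three separate indices on a 1-D Index
    failed = False
except IndexError:
    failed = True

remaining = df.drop(labels, axis="columns")

print(list(labels), failed, list(remaining.columns))
```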

(4). Dropping columns using df.iloc[] –

df.iloc[[rows], [columns]] selects rows and columns based on their integer position. I will explain it in more detail in an upcoming post.

# drop first, second, and last column
df.drop(df.iloc[:,[0, 1, -1]], axis='columns')

The first part of .iloc says give me all the rows from start to end, and the second part says select only the first, second, and last columns. So it will delete these columns along with all of their rows.

If you want, you can also use slicing. Let’s say you want to delete the first through fifth columns.

# drop columns from 1st to 5th column
df.drop(df.iloc[:, :5], axis='columns')

This will delete the columns at index 0, 1, 2, 3, and 4. In Python, the end index in list[start:end] is excluded when slicing, so the column at index 5 is not deleted.
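
The sketch below checks the slicing behaviour on a made-up seven-column frame (the column names a to g are hypothetical). It takes .columns from the iloc selection so that the labels handed to drop() are explicit; that is a small variation on the snippet above, not a different technique.

```python
import pandas as pd

# Hypothetical single-row frame with columns a..g at positions 0..6.
df = pd.DataFrame([[0, 1, 2, 3, 4, 5, 6]], columns=list("abcdefg"))

# Positions 0 to 4 are selected; position 5 (the slice end) is excluded.
selected = list(df.iloc[:, :5].columns)

remaining = df.drop(df.iloc[:, :5].columns, axis="columns")

print(selected, list(remaining.columns))
```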

(5). Dropping columns using df.loc[] –

You can also drop columns using index labels instead of index numbers.

# drop chlorides, sulphates, alcohol columns
df.drop(df.loc[:, ['chlorides','sulphates','alcohol']], axis='columns')

But when you use slicing with df.loc[], there is a major difference compared to df.iloc[].

In df.loc[], the end label in [start:end] is also included. So, if you want to delete the first through third columns, you will write –

# drop from first to third column
df.drop(df.loc[:, 'fixed acidity': 'citric acid'], axis='columns')

This is a big difference and a common source of mistakes, so be careful.
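
Here is a side-by-side sketch of the two slicing rules on a made-up one-row frame (the values are hypothetical; the column names follow this post). The label slice keeps its end label, while the position slice stops before its end index.

```python
import pandas as pd

# Hypothetical frame with four of the columns named in this post.
df = pd.DataFrame(
    [[7.4, 0.70, 0.00, 1.9]],
    columns=["fixed acidity", "volatile acidity", "citric acid", "residual sugar"],
)

# loc: the end label 'citric acid' is included -> three columns
by_label = df.loc[:, "fixed acidity":"citric acid"].columns

# iloc: the end position 2 is excluded -> two columns
by_position = df.iloc[:, 0:2].columns

print(list(by_label))
print(list(by_position))
```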

B. Using the del statement –

Another way to delete columns from a pandas dataframe is using the del statement.

# drop a column
del df['volatile acidity']

But del cannot drop multiple columns using double bracket notation or an iloc- or loc-based selection. If you try to run the code below, it will give you an error.

# drop multiple columns
# this will raise an error
del df[['alcohol','quality']]

One way to delete multiple columns with del is to use a for loop.

# delete using a loop
cols_to_delete = ['alcohol','quality']

for col in df.columns:
    if col in cols_to_delete:
        del df[col]
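
A variation on the loop above is to iterate over the labels you want to delete rather than over df.columns. The sketch below uses a made-up frame and adds a hypothetical label that does not exist, to show that the membership check skips it.

```python
import pandas as pd

# Hypothetical frame; 'not a column' is deliberately missing from it.
df = pd.DataFrame({"pH": [3.51], "alcohol": [9.4], "quality": [5]})
cols_to_delete = ["alcohol", "quality", "not a column"]

for col in cols_to_delete:
    if col in df.columns:   # skip labels the frame does not have
        del df[col]

print(list(df.columns))
```

Note that del always modifies the dataframe in place, unlike drop(), which returns a new dataframe by default.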

C. Using df.pop() –

You can also use the pop method in pandas. The column whose name is passed to the method is removed from the dataframe and returned as a Series.

# delete the pH column 
ph = df.pop('pH')
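
To confirm both effects of pop(), here is a short sketch on made-up data (the values are hypothetical). The returned object is a Series named after the column, and the dataframe no longer contains that column.

```python
import pandas as pd

# Hypothetical two-row frame.
df = pd.DataFrame({"pH": [3.51, 3.20], "alcohol": [9.4, 9.8]})

ph = df.pop("pH")  # removes 'pH' from df in place and returns it

print(type(ph).__name__, ph.name, ph.tolist())
print(list(df.columns))
```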
