Pandas – Select Rows and Columns from a DataFrame.

Spread the love

In this post, you will learn –

A. How to select Columns from a DataFrame

  1. Select a single column
  2. Select multiple columns

B. How to select Rows from a DataFrame

  1. Select a single row
  2. Select multiple rows

C. How to select Rows and Columns together.

  1. Select single row and a single column.
  2. Select multiple rows and multiple columns.

Reading Data –

Let’s first read a dateset to work with.

# import pandas
import pandas as pd

#read data
url="https://raw.githubusercontent.com/bprasad26/lwd/master/data/clothing_store_sales.csv"
df = pd.read_csv(url)
df.head()

A. How to select Columns from a DataFrame

1. Select a single column –

To select a single column from a DataFrame, we can use the square bracket notation. Think it as select a key from a dictionary.

# select type of customer column
df['Type of Customer']

Using loc method –

We can also use the loc method to select a column from a DataFrame. This is a hybrid method for selecting rows and columns. We will learn more about it as we go through the post.

The loc method in pandas is used to select rows and columns based on labels or names.

The general syntax of the loc method is –

df.loc[[rows], [columns]]

The part to the left of the comma is the rows that you want to select and the part to the right of the comma are the columns that you want to select.

To select the type of customer column using the loc method, we will use

# using loc method
df.loc[:,'Type of Customer']

Here, we are using a colon symbol to tell pandas that we want all of the rows from start to end and the right part tells that we want the Type of Customer column, so all the rows from the Type of Customer column will be selected.

Using iloc method –

The iloc method is similar to the loc method but instead of using labels or names, the iloc method uses the row or column numbers.

The Type of customer column is the second column in this DataFrame and to select it we have to use the value of 1 as python uses 0 based index.

# using iloc method to select Type of Customer column
df.iloc[:, 1]

You can also select columns using negative index. If you want to select the last column ‘Age’, you will write.

# select Age columns using negative index
df.iloc[:, -1]

2 . Select Multiple Columns –

To select multiple columns using bracket notation, we have to pass the names of the columns that you want to select in a list.

# select Gender and Marital Status
df[['Gender','Marital Status']]

Using loc –

To select multiple columns using loc method, you can pass names of columns in a list to the loc method.

# using loc select Gender and Marital Status
df.loc[:, ['Gender','Marital Status']]

When selecting multiple columns, instead of naming all the column names, you can also use slicing to select multiple columns.

To select all the columns from Net Sales to Age, you will write.

# select all column from Net Sales to Age using slicing
df.loc[:, 'Net Sales' : 'Age']

Note, here I am not passing the slice inside a list as we did before. Doing this will cause invalid syntax error.

Using iloc –

To select multiple columns using iloc

# select Gender and Marital Status using iloc
df.iloc[:, [5, 6]]

When using slicing, keep in mind that the end index is excluded as we do in list slicing in python compared to when we use slicing using the loc method. In loc method the end index is included.

# select all columns from Net Sales to Age using slicing in iloc
df.iloc[:, 3: 8]

The index of the column in our dataframe is 7, If I use the syntax df.iloc[:, 3: 7] , you will get all the columns from Net Sales to Marital Status instead of what we got earlier from Net Sales to Age.

df.iloc[:, 3: 7]

You can also include steps when selecting columns using slicing. To select every other columns from start to finish

df.iloc[:, ::2]

B. How to select Rows from a DataFrame –

1 . Select a single row –

To select rows from a dataframe, you can not use the square bracket notation as it is only used when selecting columns. To select rows you have to use either loc or iloc methods.

Using loc –

To select a single row from a dataframe

# select 5th row from the dataframe
df.loc[4]

Here, the row labels are integers, so you can use them to select rows, but If you try to select the last columns using negative index, pandas will throw an error as loc method is used to select data based on index label and there is no row label which has value of -1.

# will cause error.
df.loc[-1]

Using iloc –

# select 5th row from the dataframe using iloc
df.iloc[4]

You can use negative index to select a row when using iloc as this method is used to select rows and columns using index number.

# can use negative index with iloc to select a row
df.iloc[-1]

2 . Select Multiple Rows –

To select multiple rows from a dataframe , you can either use loc or iloc method and pass a list of value to select multiple rows of data.

Using loc –

# select 1st, 5th and 100th row
df.loc[[0, 4, 99]]

Using iloc –

To select multiple rows with iloc and slicing, you will use –

# select rows from 1st to 5th using iloc
df.iloc[:5]

C. Select Rows and Columns together –

Now, let’s see how to select rows and columns together.

1 . Select a single row and a column –

Using loc –

To select the 3rd row from the Net Sales column

df.loc[2, 'Net Sales']

Using iloc –

To select the above data using iloc method, you will write

df.iloc[2, 3]

2 . Select Multiple Rows and Columns –

Using loc –

To select 1st, 2nd and 3rd rows from the Net Sales, Method of payment and Gender column.

df.loc[[0,1,2], ['Net Sales','Method of Payment','Gender']]

And to use slicing, you will write –

# end index is included when using loc
df.loc[0:2, 'Net Sales': 'Gender']

Using iloc –

To get the above result using iloc , you will write

df.iloc[[0,1,2], [3, 4, 5]]

And when using slicing, the end index is excluded, so you will write

# end index is excluded when using iloc
df.iloc[0:3, 3:6]

Note – You can see there are various ways to select rows and columns from a DataFrame. To simplify things, first decide whether you want to use the loc or iloc method and then explicitly provide both the rows and columns values. This will avoid the confusion during selection and it will be also easier for other people to understand what are you trying to do.

If you like this post then please share it with others and also subscribe to our blog below.

Rating: 1 out of 5.

Leave a Reply