Pandas – read_excel() – How to read Excel file in python

Spread the love

In this post you will learn –

1 . How to read an Excel file in pandas.

2 . How to skip rows when reading an excel file in pandas

3 . How to skip columns when reading an excel file in pandas.

4 . How to read an excel file in pandas with header.

1 . How to read an excel file in pandas –

Before we read an excel file in pandas, we need to install the openpyxl library. This is required dependency library that is need to read and write excel file in pandas.

In the Terminal on Mac or Command Line in windows, run the following command first.

pip install openpyxl

And In this post, we will work with this a sample file which you can download from here – data download link

Here, we have some sales data in multiple sheets and we want to read these data using pandas. let’s see how to do that.

First to get the name of all the sheets in the excel file, we can use the pd.ExcelFile()

sheets_names = pd.ExcelFile('reading_excel_file.xlsx').sheet_names
sheets_names

let’s say that we want to read the Purchase orders 1 data. To do that we need to pass the sheet name to the sheet_name parameter in pandas.

df = pd.read_excel('reading_excel_file.xlsx', sheet_name='Purchase Orders 1')

2 . How to skip rows when reading an excel file in pandas

If you want to skip the number of n rows from the top, you need to use the skiprows parameter.

Let’s say we want to skip the first 2 rows when reading the file.

df = pd.read_excel('reading_excel_file.xlsx', 
                   sheet_name='Purchase Orders 1',
                  skiprows=2)

We can see that top two rows has been skipped from the result.

And if you want to skip n rows from the end you can use the skipfooter parameter.

Let’s say we want to skip 3 rows from the end.

df = pd.read_excel('reading_excel_file.xlsx', 
                   sheet_name='Purchase Orders 1',
                  skipfooter=3)

3 . How to skip columns when reading an excel file in pandas.

To skip columns when reading an excel file, you can use the usecols parameter of read_excel.

let’s say I am only interested in reading the columns from Product ID to Ordered Quantity.

df = pd.read_excel('reading_excel_file.xlsx', 
                   sheet_name='Purchase Orders 1',
                  usecols='C:F')

Now, Suppose you want to select the first two columns and the last two columns. To do that you will write

df = pd.read_excel('reading_excel_file.xlsx', 
                   sheet_name='Purchase Orders 1',
                  usecols='A:B, H:I')

4 . How to read an excel file in pandas with header

Sometimes it could happen that the data that you want to work with does not starts with the first row. let’s copy the sales data to a new excel file and add some blank lines before the data to demonstrate that.

Now, if you look at the pic below, you can see that the data does not start from the first row. There are lots of blank rows which pandas fills with NaN (Not a number), and also the column names are be named as Unnamed.

If you want to avoid this then you can use the header parameter of the read_excel file.

You can tell pandas from where the header starts.

df = pd.read_excel("sales_excel.xlsx", 
                   sheet_name='Sheet1',
                  header=5)

1. Pandas read_csv() – read a csv file in Python.

2 . Pandas to_csv – write a dataframe to a csv file.

Rating: 1 out of 5.

Leave a Reply