How to Export Pandas DataFrame to an Excel File

Spread the love

Introduction

One of the many powerful features of the Python Pandas library is the ability to write data structures into a variety of file formats, including Excel files. Excel, being one of the most popular spreadsheet software, is widely used in many fields, including business, education, data analysis, and more. Hence, the ability to handle Excel files and to export data to Excel is a highly useful feature for data analysts and scientists. This article will explore how to export a Pandas DataFrame to an Excel file.

Installing Necessary Libraries

To work with Excel files, apart from Pandas, we need the openpyxl and xlsxwriter libraries. If they are not installed in your environment, you can use pip to install them:

pip install pandas openpyxl xlsxwriter

Creating a Pandas DataFrame

Before exporting to an Excel file, let’s first create a DataFrame. Here’s a simple example:

# import pandas
import pandas as pd

# create a simple dataset of people
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Country': ['USA', 'Canada', 'Germany', 'Australia'],
        'Age': [24, 36, 29, 50]}

df = pd.DataFrame(data)

# print the dataframe
print(df)

In the script above, we first import the pandas library. Then, we create a dictionary with some data and pass this dictionary to the pd.DataFrame() function to create a DataFrame. Finally, we print the DataFrame.

Exporting DataFrame to an Excel File

The Pandas library provides a function called to_excel() to save a DataFrame to an Excel file. Here’s a basic example of its usage:

df.to_excel('people.xlsx')

This will write the DataFrame df to an Excel file named people.xlsx. By default, this file will be saved in the same directory as your Python script or Jupyter notebook.

Advanced Usage and Customization

The to_excel() function provides a range of options for more advanced usage.

1. Specifying the Sheet Name

By default, the name of the sheet in the Excel file will be ‘Sheet1’. If you want to specify a different sheet name, you can do so with the sheet_name parameter:

df.to_excel('people.xlsx', sheet_name='Data')

2. Writing to Multiple Sheets

If you want to write different DataFrames to different sheets in the same Excel file, you can do so by creating an ExcelWriter object and specifying the sheet_name for each DataFrame:

with pd.ExcelWriter('people.xlsx') as writer:
    df.to_excel(writer, sheet_name='Data')
    df2.to_excel(writer, sheet_name='MoreData')

3. Excluding the Index

By default, the DataFrame’s index is saved to the Excel file. If you want to exclude it, use the index parameter:

df.to_excel('people.xlsx', index=False)

4. Excluding the Header

If you don’t want to write the DataFrame’s column names to the Excel file, use the header parameter:

df.to_excel('people.xlsx', header=False)

5. Specifying the Startrow and Startcol

By default, your DataFrame will start to write from the first cell of the Excel file. But you can specify a different starting row or column using the startrow and startcol parameters:

df.to_excel('people.xlsx', startrow=2, startcol=2)

6. Specifying the Excel Engine

The engine parameter allows you to specify the Excel writing engine. The default is automatically chosen based on the file extension, but it can be useful to specify it manually in some cases:

df.to_excel('people.xlsx', engine='openpyxl')

7. Formatting the Excel Output

Pandas integrates with the xlsxwriter library to provide a range of Excel-specific formatting options. For example, you can add conditional formatting, define custom formats, or even add charts to the Excel output.

# Create a Pandas Excel writer using XlsxWriter as the engine.
with pd.ExcelWriter('people.xlsx', engine='xlsxwriter') as writer:
    # Write the DataFrame to an Excel file.
    df.to_excel(writer, sheet_name='Data', index=False)

    # Get the xlsxwriter workbook and worksheet objects in order to set the column formats.
    workbook = writer.book
    worksheet = writer.sheets['Data']

    # Add some cell formats.
    format1 = workbook.add_format({'num_format': '#,##0.00'})
    format2 = workbook.add_format({'num_format': '0%'})

    # Set the column width and format.
    worksheet.set_column('B:B', 18, format1)

    # Close the Pandas Excel writer and output the Excel file.
    writer.save()

In this example, we’re first creating a Pandas Excel writer using the XlsxWriter engine. We then write our DataFrame to an Excel file and get the xlsxwriter objects from the dataframe writer object. After that, we define some cell formats and set them for our columns. Finally, we save and close the writer to output the Excel file.

Conclusion

This article has shown how to export a Pandas DataFrame to an Excel file, which is a common task in data analysis. The process is straightforward using the to_excel() function provided by Pandas, but the function also provides a wide range of options for customization and advanced usage.

Leave a Reply