# How to Write a Pandas DataFrame to Parquet File?

To write a pandas DataFrame to a Parquet file, we use the to_parquet() method in pandas.

Apache Parquet is a free and open-source column-oriented data storage format in the Apache Hadoop ecosystem. It is similar to RCFile and ORC, the other columnar-storage file formats in Hadoop, and is compatible with most of the data processing frameworks around Hadoop.

### Syntax –

```python
DataFrame.to_parquet(path=None, engine='auto', compression='snappy', index=None, partition_cols=None, storage_options=None, **kwargs)
```

### Parameters –

• path: The file path (or file-like object) to write to. If None, the result is returned as bytes instead of being written to disk.
• engine: Which Parquet library to use. The available options are auto, pyarrow, and fastparquet; auto (the default) tries pyarrow first and falls back to fastparquet.
• compression: The type of compression to use. The available options are snappy (the default), gzip, brotli, and None for no compression.
• index: If True, the DataFrame’s index is written to the file; if False, it is omitted. With the default None, the index is saved, and a RangeIndex is stored compactly in the file metadata rather than as a column.
• partition_cols: The names of the columns by which to partition the dataset. The order in which the columns are given determines the partitioning hierarchy.
• storage_options: Extra options for a particular storage connection, such as a host, port, username, or password.

## Examples –

Let’s first create a dataframe in pandas. The original dataset isn’t shown here, so we use a small hypothetical set of clothing store sales.

```python
import pandas as pd

# hypothetical clothing store sales data
df = pd.DataFrame({
    "product": ["shirt", "jeans", "jacket"],
    "quantity": [3, 1, 2],
    "price": [19.99, 49.99, 89.99],
})

df.head()
```

Before we can write a dataframe to a Parquet file, we need to install either pyarrow or fastparquet. Let’s install pyarrow using pip.

```shell
pip install pyarrow
```

Now, we can write a pandas dataframe to a parquet file using the to_parquet() method.

```python
# write to parquet file
df.to_parquet("clothing_store_sales.parquet")
```

### Related Posts –

1. How to Read a Parquet File in Pandas?
