How to Write a Pandas DataFrame to Parquet File?


To write a pandas DataFrame to a Parquet file, we use the to_parquet() method in pandas.

Apache Parquet is a free and open-source column-oriented data storage format in the Apache Hadoop ecosystem. It is similar to RCFile and ORC, the other columnar-storage file formats in Hadoop, and is compatible with most of the data processing frameworks around Hadoop.

Syntax –

DataFrame.to_parquet(path=None, engine='auto', compression='snappy', index=None, partition_cols=None, storage_options=None, **kwargs)

Parameters –

  • path: The path to the output Parquet file. If None, the result is returned as bytes instead of being written to disk.
  • engine: The Parquet library to use. The available options are auto, pyarrow, and fastparquet. The default, auto, tries pyarrow first and falls back to fastparquet.
  • compression: The type of compression to use. The available options are snappy, gzip, and brotli; passing None disables compression. The default compression is snappy.
  • index: A boolean parameter. If True, the DataFrame’s index is written to the file. If False, it is omitted.
  • partition_cols: The names of the columns by which to partition the output. The order in which the columns are given determines the order in which they are partitioned.
  • storage_options: Extra options for a particular storage connection, such as a host, port, username, or password.

Examples –

First, let’s read a dataset into pandas.

import pandas as pd

url = ''  # path or URL of a CSV dataset
df = pd.read_csv(url)

Before we can write a DataFrame to a Parquet file, we need to install either pyarrow or fastparquet. Let’s install pyarrow using pip.

pip install pyarrow

Now, we can write a pandas dataframe to a parquet file using the to_parquet() method.

# write the dataframe to a parquet file (the file name is just an example)
df.to_parquet('data.parquet')

Related Posts –

  1. How to Read a Parquet File in Pandas?
