How to Write a PySpark DataFrame to a CSV File?


In our previous post, we learned how to read a CSV file in PySpark. In this post, we will learn how to write a PySpark DataFrame to a CSV file.

Write PySpark DataFrame to a CSV file –

Let’s first read a CSV file. We will use the Titanic dataset.

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

df ='csv').option('header', 'true').load('../data/titanic.csv')

Now, to write this DataFrame to a CSV file, we can use the DataFrameWriter API.


Or, equivalently, we can use the csv shortcut method.


Write a PySpark DataFrame to a CSV file with Header –

By default, PySpark does not include the header (the column names) when saving a DataFrame to a CSV file. To include it, we have to use the header option.

To include the header, we write the following.


Options when writing to a csv file –

We already saw the header option, but there are many other options available when writing a CSV file in PySpark. A few of them are covered below.

Let’s say you want to save the DataFrame as a TSV (tab-separated) file. We can easily do this with the sep option.


Save Modes –

Save modes specify what happens if Spark finds existing data at the specified location.

append – Appends the output files to the list of files that already exist at that location.

overwrite – Will completely overwrite any data that already exists there.

errorIfExists – Throws an error and fails the write if data or files already exist at the specified location.

ignore – If data or files already exist at the location, do nothing with the current DataFrame.

Let’s say you want to overwrite the data if it already exists at the output path.


CSV Options –

As I said before, there are many options when reading or writing a CSV file in PySpark, such as header, sep, quote, escape, nullValue, and compression. The full list is available in the PySpark DataFrameWriter documentation.

Related Posts –

  1. How to Read a CSV File into a DataFrame in PySpark?

