
In our previous post we learned how to read a CSV file in PySpark. In this post we will learn how to write a PySpark DataFrame to a CSV file.
Write a PySpark DataFrame to a CSV file –
Let's first read a CSV file. We will use the Titanic dataset.
from pyspark.sql import SparkSession

# Create (or reuse) a Spark session
spark = SparkSession.builder.getOrCreate()

# Read the Titanic CSV, treating the first row as column names
df = spark.read.format('csv').option('header', 'true').load('../data/titanic.csv')
df.show(5)

Now, to write this DataFrame to a CSV file, we can write:
df.write.csv('../data/titanic1.csv')
or equivalently:
df.write.format('csv').save('../data/titanic2.csv')
Write a PySpark DataFrame to a CSV file with Header –
By default, PySpark doesn't include the headers (column names) when saving a DataFrame to a CSV file. To include them, we have to use the header option:
df.write.format('csv').option('header','true').save('../data/titanic3.csv')
Options when writing to a CSV file –
We already saw the header option, but there are many other options when writing to a CSV file in PySpark; they are listed at the end of the post.
Let's say you want to save the DataFrame as a TSV file. We can easily do this with the sep option.
df.write.format('csv').option('header','true').option('sep','\t').save('../data/titanic.tsv')
Save Modes –
Save modes specify what happens if Spark finds data at the specified location.
append – Appends the output files to the files that already exist at that location.
overwrite – Completely overwrites any data that already exists there.
errorIfExists – Throws an error and fails the write if data or files already exist at the specified location. This is the default.
ignore – If data or files exist at the location, does nothing with the current DataFrame.
Let’s say you want to overwrite if a file already exists.
df.write.format('csv').option('header','true').mode('overwrite').save('../data/titanic3.csv')
CSV Options –
As mentioned before, there are many options when reading or writing a CSV file in PySpark. Some of the most commonly used write options are:
sep – the single-character field separator (default ,).
header – whether to write the column names as the first row (default false).
quote – the character used to quote values that contain the separator (default ").
escape – the character used to escape quotes inside quoted values (default \).
nullValue – the string written for null values (default is the empty string).
compression – the compression codec for the output files, e.g. gzip, bzip2, snappy (default none).
dateFormat / timestampFormat – the formats used when writing date and timestamp columns.
encoding – the character encoding of the output files (default UTF-8).