
In our previous post we learned how to read a JSON file into a PySpark DataFrame. In this post we will learn how to write a PySpark DataFrame to a JSON file.
Write a PySpark DataFrame to a JSON File –
Writing JSON files is just as simple as reading them. Let’s first read a dataset to work with. We will use the flights data file.
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.read.format('json').load('../data/flight-data.json')
df.show(5)

Now, to write this DataFrame to a JSON file, we can write:
df.write.format('json').save('../data/flights1.json')
Save Modes –
Save modes specify what happens if Spark finds existing data at the specified location.
- append – Appends the output files to the list of files that already exist at that location.
- overwrite – Completely overwrites any data that already exists there.
- errorIfExists – Throws an error and fails the write if data or files already exist at the specified location. This is the default.
- ignore – If data or files already exist at the location, does nothing with the current DataFrame.
Let’s say you want to overwrite the data if it already exists at that location.
df.write.format('json').mode('overwrite').save('../data/flights1.json')
JSON Options –
There are various options you can set when reading or writing JSON files in PySpark. You pass them with option(). For example, for JSON data Spark infers the schema automatically, but if each record spans multiple lines you need the multiLine option:
df = spark.read.format('json').option('multiLine','true').load('../data/flight-data.json')
The complete list of available options is given in the Spark documentation. Some commonly used ones are:
- multiLine – parse records that span multiple lines within a file (read).
- dateFormat / timestampFormat – the format string used for date and timestamp columns (read/write).
- compression – the compression codec for the output files, e.g. gzip, bzip2, snappy (write).
- lineSep – the line separator between records (read/write).
- encoding – the character encoding of the JSON files (read/write).
- primitivesAsString – infer all primitive values as strings (read).
Related Posts –
- How to Read a JSON File into a DataFrame in PySpark?
- How to Read a CSV File into a DataFrame in PySpark?
- How to Write a PySpark DataFrame to a CSV File?