How to Write a PySpark DataFrame to a JSON File?


In our previous post, we learned how to read a JSON file in PySpark. In this post, we will learn how to write a PySpark DataFrame to a JSON file.

Write a PySpark DataFrame to a JSON File –

Writing JSON files is just as simple as reading them. Let’s first read a dataset to work with. We will use the flights data file.

from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession and read the flights JSON data into a DataFrame
spark = SparkSession.builder.getOrCreate()
df = spark.read.format('json').load('../data/flight-data.json')
df.show(5)

Now, to write this DataFrame to a JSON file, we write:

df.write.format('json').save('../data/flights1.json')
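Keep in mind that Spark writes the output as a directory at ../data/flights1.json containing one part file per partition, not a single JSON file. As a quick check (a sketch reusing the same path), we can read the data back:

# Read the written data back to confirm the round trip
df_check = spark.read.format('json').load('../data/flights1.json')
df_check.show(5)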

Save Modes –

Save modes specify what will happen if Spark finds existing data at the specified location.

append – Appends the output files to the list of files that already exist at that location.

overwrite – Will completely overwrite any data that already exists there.

errorIfExists – Throws an error and fails the write if data or files already exist at the specified location.

ignore – If data or files already exist at the location, do nothing with the current DataFrame.

Let’s say you want to overwrite the data if it already exists at the location.

df.write.format('json').mode('overwrite').save('../data/flights1.json')
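Similarly, you can pass any of the other save modes to mode(). As a quick sketch (reusing the same flights1.json path from above), to skip the write silently when output already exists, or to fail explicitly:

# Do nothing if data already exists at the target path
df.write.format('json').mode('ignore').save('../data/flights1.json')

# Fail the write if data already exists at the target path
df.write.format('json').mode('errorIfExists').save('../data/flights1.json')

If you do not specify a mode, errorIfExists is the default behavior.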

JSON Options –

There are various options available when reading or writing JSON files in PySpark. You can set an option like this.

df = spark.read.format('json').option('allowSingleQuotes','true').load('../data/flight-data.json')
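Options work on the write side as well. For example, here is a sketch that writes gzip-compressed JSON output using the compression option (the codec choice is just for illustration):

# Write gzip-compressed JSON, overwriting any existing output
df.write.format('json').mode('overwrite').option('compression', 'gzip').save('../data/flights1.json')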

The complete list of available JSON options can be found in the Spark data source documentation.

Related Posts –

  1. How to Read a JSON File into a DataFrame in PySpark?
  2. How to Read a CSV File into a DataFrame in PySpark?
  3. How to Write a PySpark DataFrame to a CSV File?

