How to Read a JSON File into a DataFrame in PySpark?

In our previous post, we learned how to read a CSV file in PySpark. In this post, we will learn how to read a JSON file in PySpark.

Read a JSON file in PySpark –

Reading a JSON file in PySpark is very similar to reading a CSV file: you pass 'json' instead of 'csv' to format(). Let’s say we want to read the flights data file.

To do that, we can write:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

# Load the JSON file into a DataFrame and print its first 5 rows
df = spark.read.format('json').load('../data/flight-data.json')
df.show(5)

JSON options –

PySpark supports many options for reading and writing JSON files. You pass them to the reader with option(), like this:

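df = spark.read.format('json').option('mode', 'FAILFAST').load('../data/flight-data.json')

Here mode is set to FAILFAST, so the read stops with an error at the first malformed record instead of using the default PERMISSIVE mode, which loads such records with null fields. Note that inferSchema is a CSV option; the JSON reader always infers the schema unless you supply one with schema().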

The complete list of JSON options is documented on the JSON Files page of the Spark SQL data sources guide.