
In our previous post we learned how to read a CSV file in PySpark. In this post we will learn how to read a JSON file.
Read a JSON file in PySpark –
Reading a JSON file in PySpark is very similar to reading a CSV file. Let’s say we want to read the flights data file.
To do that, we write the following.
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.read.format('json').load('../data/flight-data.json')
df.show(5)
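To try the same call without the flights file, here is a minimal sketch that writes a tiny sample file and loads it. The records, the field names (origin, destination, count), the file name and the helper functions are all made up for illustration. It assumes the default layout that Spark's JSON reader expects, which is one JSON object per line.

```python
import json
import os
import tempfile

# Hypothetical sample rows; the field names are assumptions, not the real flights schema.
SAMPLE_FLIGHTS = [
    {"origin": "Romania", "destination": "United States", "count": 15},
    {"origin": "Croatia", "destination": "United States", "count": 1},
    {"origin": "India", "destination": "Egypt", "count": 24},
]


def write_json_lines(path, records=SAMPLE_FLIGHTS):
    """Write one JSON object per line, the default layout for Spark's JSON reader."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path


def read_json_with_spark(path):
    """Load the file into a DataFrame; this part needs pyspark and a Java runtime."""
    # Imported here so the file-writing helper above works without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    return spark.read.format("json").load(path)


def demo():
    """Write the sample file to a temporary directory, then load and display it."""
    path = os.path.join(tempfile.mkdtemp(), "sample-flights.json")
    write_json_lines(path)
    df = read_json_with_spark(path)
    df.printSchema()  # Spark infers the column types from the data
    df.show()
```

Calling demo() prints the inferred schema and the rows. Spark works out the column names and types from the records themselves, so no schema has to be supplied for a quick look at the data.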

JSON options –
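There are many options available when reading or writing a JSON file in PySpark. You set them with option() before calling load(). Spark infers the schema of a JSON file automatically, so the CSV-style inferSchema option is not needed here. The example below uses the mode option to fail on the first malformed record.
df = spark.read.format('json').option('mode','FAILFAST').load('../data/flight-data.json')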
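As one more illustration of how options are chained, the sketch below reads a file whose records span several lines, which the default reader does not handle. It combines two documented JSON options, multiLine and mode. The sample data, file name and helper functions are assumptions made for this example, not part of the flights dataset.

```python
import json
import os
import tempfile

# Hypothetical sample data used only for this illustration.
SAMPLE_AIRPORTS = [
    {"code": "JFK", "city": "New York"},
    {"code": "LHR", "city": "London"},
]


def write_pretty_json(path, records=SAMPLE_AIRPORTS):
    """Write the records as one indented JSON array that spans several lines."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)
    return path


def read_multiline_json(path):
    """Read a multi-line JSON file; this part needs pyspark and a Java runtime."""
    # Imported here so the file-writing helper above works without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    return (
        spark.read.format("json")
        .option("multiLine", "true")  # allow a record to span more than one line
        .option("mode", "FAILFAST")   # raise an error instead of keeping bad records
        .load(path)
    )


def demo_options():
    """Write the pretty-printed sample file, then load it with both options set."""
    path = os.path.join(tempfile.mkdtemp(), "sample-airports.json")
    write_pretty_json(path)
    read_multiline_json(path).show()
```

Each option() call returns the reader, so any number of options can be chained before load(). With multiLine enabled, a top-level array of objects is read as one row per object.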
The complete list of options is given below.


