In our previous post we learned how to read a CSV file in PySpark. In this post we will learn how to read a JSON file.
Read a JSON file in PySpark –
Reading a JSON file in PySpark is very similar to reading a CSV file. Let's say we want to read the flights data file.
To do that, we write the following.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.format('json').load('../data/flight-data.json')
df.show(5)
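If you do not have the flights file at hand, the same read can be tried on a tiny file you create yourself. The sketch below is only an illustration: the records, field names, and helper names (`write_sample_json_lines`, `read_json`, `demo`) are made up for this example and are not part of the flights data. It relies on the fact that, by default, Spark expects one JSON object per line.

```python
import json
import os
import tempfile


def write_sample_json_lines(path):
    """Write two made-up records, one JSON object per line (JSON Lines)."""
    records = [
        {"origin": "United States", "dest": "Romania", "count": 15},
        {"origin": "United States", "dest": "Croatia", "count": 1},
    ]
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return records


def read_json(path):
    """Read a line-delimited JSON file into a DataFrame."""
    # Imported here so the helper above also works without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # spark.read.json(path) is shorthand for spark.read.format('json').load(path)
    return spark.read.json(path)


def demo():
    """Create the sample file, read it back, and print schema and rows."""
    sample_path = os.path.join(tempfile.mkdtemp(), "sample.json")
    write_sample_json_lines(sample_path)
    df = read_json(sample_path)
    df.printSchema()  # column types are inferred from the data
    df.show()
```

Calling `demo()` in a session where PySpark is installed prints the inferred schema followed by the two rows.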
JSON options –
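There are many JSON options you can set when reading or writing a JSON file in PySpark. Each one is passed with .option(key, value) before calling load. Note that inferSchema is a CSV option. The JSON reader always infers the schema unless you supply one, so it is not needed here. For example, to make the read fail on a malformed record instead of setting its fields to null, you can set the mode option like this.
df = spark.read.format('json').option('mode', 'FAILFAST').load('../data/flight-data.json')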
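Several options can also be set in one call by passing a dictionary to `.options(**...)`. The following is a minimal sketch, assuming you want to control two commonly used JSON options, `multiLine` and `mode`. The helper names `build_json_options` and `read_json_with_options` are hypothetical and only serve the illustration.

```python
# Parse modes accepted by the JSON reader's 'mode' option.
VALID_MODES = {"PERMISSIVE", "DROPMALFORMED", "FAILFAST"}


def build_json_options(multi_line=False, mode="PERMISSIVE"):
    """Return a dict of JSON reader options (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    return {
        # 'true' when a single JSON record spans several lines in the file
        "multiLine": "true" if multi_line else "false",
        # how malformed records are handled
        "mode": mode,
    }


def read_json_with_options(spark, path, **kwargs):
    """Read a JSON file with the options built above."""
    options = build_json_options(**kwargs)
    return spark.read.format("json").options(**options).load(path)
```

With an existing `spark` session, `read_json_with_options(spark, '../data/flight-data.json', mode='DROPMALFORMED')` would skip malformed records. Leave `multi_line` at its default for line-delimited files, since turning it on makes Spark treat each file as a single JSON document.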
The complete list of options is given below.