How to Convert a PySpark DataFrame to Pandas?

In this post, you will learn how to convert a PySpark DataFrame to a pandas DataFrame.

1. Create a PySpark DataFrame –

Let’s first create a PySpark DataFrame by reading a CSV file.

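from pyspark.sql import SparkSession

# Reuse the active Spark session, or start a new one if none exists
spark = SparkSession.builder.getOrCreate()

# Read the CSV file, treating its first line as the column names
pysparkDf = spark.read.format('csv').option('header','true').load('../data/clothing_store_sales.csv')
# Print the first 5 rows
pysparkDf.show(5)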
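
One detail worth knowing before converting: when a CSV file is read without schema inference, Spark loads every column as a string, and those columns then arrive in pandas with the object dtype. The sketch below is one possible way to ask Spark to infer column types while reading. The helper name read_csv_typed is hypothetical and only used for illustration. It relies on the header and inferSchema options of spark.read.csv.

```python
def read_csv_typed(spark, path):
    """Read a CSV file with a header row and let Spark infer column types.

    `spark` is assumed to be an existing SparkSession and `path` a CSV
    file that Spark can reach. The helper name is hypothetical.
    """
    # inferSchema=True makes Spark scan the data to guess numeric and
    # other types instead of loading every column as a string.
    return spark.read.csv(path, header=True, inferSchema=True)
```

With the session created above, it could be called as read_csv_typed(spark, '../data/clothing_store_sales.csv'), and printSchema() on the result shows the inferred types. Inference costs an extra pass over the file, so an explicit schema may be a better fit for large inputs.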

2. Convert PySpark DataFrame to Pandas DataFrame –

To convert a PySpark DataFrame to pandas, call its toPandas() method. It collects every row onto the driver, so the whole DataFrame must fit in the driver's memory.

pandasDf = pysparkDf.toPandas()
pandasDf.head(5)
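
Because toPandas() brings all rows to the driver, it can help to cap the row count before converting a large DataFrame. The following is a minimal sketch under a few assumptions: the helper name to_pandas_capped and its parameters are hypothetical, the Arrow setting shown is the Spark 3.x configuration key, and Arrow only speeds up the transfer when the pyarrow package is installed.

```python
def to_pandas_capped(spark, spark_df, max_rows=10000, use_arrow=True):
    """Convert at most `max_rows` rows of a PySpark DataFrame to pandas.

    `spark` is assumed to be the SparkSession that owns `spark_df`.
    The helper name and its parameters are hypothetical.
    """
    if max_rows <= 0:
        raise ValueError("max_rows must be a positive integer")
    if use_arrow:
        # Spark 3.x setting that enables Arrow-based transfer to pandas;
        # assumes pyarrow is installed on the driver.
        spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
    # limit() is lazy; the Spark job runs only when toPandas() is called.
    return spark_df.limit(max_rows).toPandas()
```

For the DataFrame above, this could be used as pandasDf = to_pandas_capped(spark, pysparkDf, max_rows=1000). Note that limit() without a preceding sort does not guarantee which rows are returned.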