How to Create a PySpark DataFrame from Pandas DataFrame?

In this post, you will learn how to convert a pandas DataFrame to a PySpark DataFrame.

Create PySpark DataFrame from Pandas –

1. Create a Pandas DataFrame –

Let’s first read a dataset into a pandas dataframe.

import pandas as pd

url = ''
pandasDf = pd.read_csv(url)
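
If no CSV is at hand, a small in-memory DataFrame can stand in for the dataset. This is only a sketch: the column names and values below are made up for illustration and are not part of the original dataset.

```python
import pandas as pd

# Hypothetical sample data, used only as a stand-in for the CSV file
pandasDf = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 41],
    "salary": [5000.0, 4200.5, 6100.0],
})

# Show the pandas data types that PySpark will later use to infer a schema
print(pandasDf.dtypes)
```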

2. Convert Pandas DataFrame to PySpark DataFrame –

To convert this pandas DataFrame to a PySpark DataFrame, use the createDataFrame(pandas_dataframe) method of the SparkSession. By default, PySpark infers the schema from the pandas data types.

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

pysparkDf = spark.createDataFrame(pandasDf)
