
In this post, you will learn how to convert a pandas DataFrame to a PySpark DataFrame.
Create PySpark DataFrame from Pandas –
1. Create a Pandas DataFrame –
Let’s first read a dataset into a pandas DataFrame.
import pandas as pd
url = 'https://raw.githubusercontent.com/bprasad26/lwd/master/data/clothing_store_sales.csv'
pandasDf = pd.read_csv(url)
pandasDf.head()
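If the URL above is ever unreachable, a small in-memory frame works just as well for following along. A minimal sketch (the column names below are made up for illustration and are not the actual columns of clothing_store_sales.csv):

```python
import pandas as pd

# A tiny stand-in DataFrame; the column names are illustrative only,
# not the real columns of clothing_store_sales.csv.
pandasDf = pd.DataFrame({
    "customer": ["Alice", "Bob", "Carol"],
    "items": [3, 1, 2],
    "total": [59.99, 19.99, 35.50],
})

# PySpark will infer its schema from these pandas dtypes,
# so it is worth inspecting them before converting.
print(pandasDf.dtypes)
```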

2. Convert Pandas DataFrame to PySpark DataFrame –
Now, to convert this pandas DataFrame to a PySpark DataFrame, we can use the createDataFrame(pandas_dataframe) method in PySpark. By default, PySpark infers the schema from the pandas data types.
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
pysparkDf = spark.createDataFrame(pandasDf)
pysparkDf.show(5)
