orderBy() and sort() – How to Sort a DataFrame in PySpark?

orderBy() and sort() –

To sort a DataFrame in PySpark, you can use either the orderBy() or the sort() method. You can sort in ascending or descending order by one column or by several columns. By default, both methods sort in ascending order.

Let’s read a dataset to illustrate it. We will use the clothing store sales data.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, LongType, FloatType

spark = SparkSession.builder.appName('LifeWithData.com').getOrCreate()

manualSchema = StructType([
    StructField("Customer", LongType(), True),
    StructField("Type of Customer", StringType(), True),
    StructField("Items", LongType(), True),
    StructField("Net Sales", FloatType(), True),
    StructField("Method of Payment", StringType(), True),
    StructField("Gender", StringType(), True),
    StructField("Marital Status", StringType(), True),
    StructField("Age", LongType(), True)
])

df = spark.read.format('csv').option('header','true') \

Sort a DataFrame Using sort() method –

Let’s say you want to sort the DataFrame by Net Sales in ascending order. To do that, write:

df.sort("Net Sales").show(5)

To sort a DataFrame by multiple columns, pass the column names to the sort() method. Rows are ordered by the first column, and each later column only breaks ties in the columns before it.

df.sort("Net Sales", "Age").show(5)

This can also be written as

from pyspark.sql.functions import col

df.sort(col("Net Sales"), col("Age")).show(5)

Sort a DataFrame using orderBy() method –

You can also use the orderBy() method to sort a DataFrame in ascending or descending order. In PySpark, orderBy() is an alias for sort(), so it accepts the same arguments.

df.orderBy("Net Sales").show(5)

You can also sort by multiple columns.

df.orderBy("Net Sales", "Age").show(5)

Again, this can also be written as

df.orderBy(col("Net Sales"), col("Age")).show(5)

Sort a DataFrame in ascending order –

You can explicitly specify that you want to sort a dataframe in ascending order.

df.sort(df['Net Sales'].asc()).show(5)
df.sort(col("Net Sales").asc()).show(5)
df.orderBy(col("Net Sales").asc()).show(5)

All of the above examples return the same result.

Sort a DataFrame in Descending Order –

You can also sort a dataframe in descending order.

df.sort(col("Net Sales").desc()).show(5)
df.sort(df['Net Sales'].desc()).show(5)
df.orderBy(col("Net Sales").desc()).show(5)

All of these examples return the same result.

Sort a DataFrame in Ascending and Descending Order –

You can also mix directions in a single call, sorting one column in descending order and another in ascending order.

df.sort(col("Net Sales").desc(), col("Age").asc()).show(5)
df.sort(df['Net Sales'].desc(), df['Age'].asc()).show(5)
df.orderBy(df['Net Sales'].desc(), df['Age'].asc()).show(5)
