Compute Minimum and Maximum value of a Column in PySpark

To compute the minimum and maximum values of a column in PySpark, we use the min and max functions from the pyspark.sql.functions module, respectively.
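
Before reading a real dataset, here is a minimal self-contained sketch of the pattern (the tiny in-memory DataFrame and its values are made up purely for illustration):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# a tiny made-up DataFrame, just to illustrate the pattern
ages = spark.createDataFrame([(20,), (35,), (78,)], ['Age'])

# min and max are aggregate functions; select() returns a one-row DataFrame
ages.select(F.min('Age'), F.max('Age')).show()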

Read a Dataset –

Let’s read a dataset to work with. We will use the clothing store sales data.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# read the CSV with a header row, letting Spark infer the column types
df = spark.read.format('csv') \
    .options(header='true', inferSchema='true') \
    .load('../data/clothing_store_sales.csv')
df.show(5)
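
Since inferSchema is enabled, it is worth checking that the Age column was inferred as a numeric type; min and max on a string column would compare values lexicographically rather than numerically:

df.printSchema()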

Compute Minimum Value of a Column in PySpark –

Let’s find the minimum value of the Age column.

from pyspark.sql.functions import min  # note: this shadows Python's built-in min

df.select(min('Age')).show()

The minimum age is 20.
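
Note that select() returns a one-row DataFrame rather than a plain number. If you need the value itself in your Python code, one way is to pull out the first row; a small sketch, where the alias min_age is my own choice of name:

from pyspark.sql.functions import min

# first() returns a Row object, which can be indexed by column name
min_age = df.select(min('Age').alias('min_age')).first()['min_age']
print(min_age)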

Compute Maximum Value of a Column in PySpark –

Let’s also compute the maximum value of the Age column.

from pyspark.sql.functions import max  # note: this shadows Python's built-in max

df.select(max('Age')).show()

The maximum age is 78.
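
Rather than running two separate queries, both aggregates can be computed in a single pass with agg(). A sketch using the functions module under an alias, which also avoids shadowing Python's built-in min and max (the column aliases are my own naming):

from pyspark.sql import functions as F

# one pass over the data computes both aggregates
df.agg(F.min('Age').alias('min_age'),
       F.max('Age').alias('max_age')).show()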
