Raise to the Power of a Column in PySpark


To raise a column to a power in PySpark, we can use the pow() function. It helps us compute the square of a column, the cube of a column, and the square root and cube root of a column in PySpark.

Syntax –

pow(col, n)

col – Name of the column

n – Power (exponent) to raise the column to

Read a Dataset –

Let’s read a dataset to illustrate it. We will use the clothing store sales data.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.format('csv') \
    .options(header='true', inferSchema='true') \
    .load('../data/clothing_store_sales.csv')
df.show(5)

Square of a column in PySpark –

To calculate the square of a column, we will pass the name of the column and 2 as the argument to the pow() function. Let’s take the square of the Age column.

from pyspark.sql.functions import pow, col

df.select("*", pow(col("Age"), 2).alias('Age_Square')).show(5)

Cube of a column in PySpark –

To calculate the cube of a column, we will pass the name of the column and 3 as the argument to the pow() function.

df.select("*", pow(col("Age"), 3).alias('Age_Cube')).show(5)

Square Root of a column in PySpark –

To calculate the square root of a column, we will pass the name of the column and 1/2 as the argument to the pow() function.

df.select("*", pow(col("Age"), 1/2).alias('Age_Square_Root')).show(5)

Cube Root of a Column in PySpark –

To calculate the cube root of a column, we will pass the name of the column and 1/3 as the argument to the pow() function.

df.select("*", pow(col("Age"), 1/3).alias('Age_Cube_Root')).show(5)
