# How to Compute Standard Deviation in PySpark?

To compute the population standard deviation in PySpark, we use the stddev_pop function. To compute the sample standard deviation, we use the stddev_samp function.
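
For reference, the two quantities differ only in the divisor. The notation below is introduced here for illustration and is not from the original post: $x_1, \dots, x_n$ are the column values and $\bar{x}$ is their mean. The population form divides by $n$, while the sample form divides by $n - 1$.

$$
\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}
\qquad
s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}
$$

For the same data with at least two distinct values, $s$ is always slightly larger than $\sigma$, and the gap shrinks as $n$ grows.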

Let’s read a dataset to illustrate it. We will use the clothing store sales data.

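    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
    # Load the clothing store sales data. The file name is a placeholder (an assumption);
    # point it at your own copy of the dataset.
    df = spark.read.csv("clothing_store_sales.csv", header=True, inferSchema=True)
    df.show(5)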

### Population Standard Deviation –

Let’s use the stddev_pop function to compute the population standard deviation of the Age column.

    from pyspark.sql.functions import stddev_pop
    df.select(stddev_pop('Age')).show()
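
To make the arithmetic behind the population standard deviation concrete, here is a minimal plain-Python sketch of the same calculation. The `ages` list is made up for illustration and is not taken from the clothing store dataset, and `population_stddev` is a hypothetical helper, not a PySpark function.

```python
import math

# Made-up ages for illustration only; not from the clothing store dataset.
ages = [23, 35, 41, 29, 52]


def population_stddev(values):
    """Square root of the mean squared deviation, dividing by n."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / n)


print(population_stddev(ages))  # 10.0
```

Here the mean is 36, the squared deviations sum to 500, and dividing by n = 5 gives a variance of 100, so the result is 10.0.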

### Sample Standard Deviation –

To compute the sample standard deviation, we will use the stddev_samp function.

    from pyspark.sql.functions import stddev_samp
    df.select(stddev_samp('Age')).show()
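
The sample version changes only the divisor, from n to n - 1. The plain-Python sketch below shows this on a made-up list of ages (not taken from the clothing store dataset). `sample_stddev` is a hypothetical helper, and raising an error for fewer than two values is a choice made for this sketch, not a description of how Spark handles that case.

```python
import math

# Made-up ages for illustration only; not from the clothing store dataset.
ages = [23, 35, 41, 29, 52]


def sample_stddev(values):
    """Square root of the summed squared deviations divided by n - 1."""
    n = len(values)
    if n < 2:
        # Choice made for this sketch: the formula is undefined for n < 2.
        raise ValueError("sample standard deviation needs at least two values")
    mean = sum(values) / n
    squared_deviations = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))


print(round(sample_stddev(ages), 4))  # 11.1803
```

With the same five values, the squared deviations still sum to 500, but dividing by n - 1 = 4 gives a variance of 125, so the sample standard deviation (about 11.18) is larger than the population value of 10.0.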
