# How to Compute Pearson Correlation Coefficient in PySpark?

To compute the Pearson correlation coefficient in PySpark, we use the corr() function from pyspark.sql.functions.

### Syntax –

corr(column1, column2)
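For intuition about the number corr() returns: the Pearson coefficient is the covariance of the two columns divided by the product of their standard deviations, so it always falls between -1 and 1. Below is a minimal plain-Python sketch of that formula. The helper name pearson_r and the sample values are made up for illustration, and this is not how PySpark implements it internally.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (illustrative helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)

# Made-up sample values, not taken from the clothing store dataset
ages = [20, 30, 40, 50, 60]
net_sales = [20.0, 40.0, 50.0, 40.0, 50.0]

r = pearson_r(ages, net_sales)
print(round(r, 4))  # 0.7746
```

A value near 1 or -1 means a strong linear relationship, while a value near 0 means little or no linear relationship.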

### Read a Dataset –

Let’s read a dataset to work with. We will use the clothing store sales data.

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
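df = (
    spark.read.format('csv')
    .option('header', True)       # assumed: the first row holds the column names
    .option('inferSchema', True)  # assumed: numeric columns are parsed as numbers
    .load('<path to the sales CSV>')  # placeholder: substitute the path to your copy of the data
)
df.show(5)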

### Compute Pearson Correlation Coefficient in PySpark –

Let’s compute the Pearson correlation coefficient between the Net Sales and Age columns.

from pyspark.sql.functions import corr
df.select(corr("Net Sales", "Age")).show()

You can also compute it with the DataFrame’s stat.corr() method, which returns the coefficient directly as a Python float –

df.stat.corr("Net Sales", "Age")
#output
-0.010635891709415892

A value this close to 0 indicates almost no linear relationship between Net Sales and Age.
