How to Calculate Partial Correlation in Python?

What is Partial Correlation?

Partial correlation measures the degree of association between two random variables after the effect of a set of controlling variables has been removed. A simple correlation between two variables of interest can be misleading, because the observed value may be driven partly or wholly by one or more additional variables that were not taken into account. Partial correlation addresses this by measuring the relationship that remains once those variables are held constant.
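To make the idea concrete, here is a minimal sketch using simulated data. Everything in it is an assumption made for illustration: the variables x, y and z, the coefficients, and the helper residuals() are hypothetical. It builds a confounder z that drives both x and y, then compares the simple correlation with the correlation of the residuals left after regressing each variable on z, which is one common way to obtain a partial correlation.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 2000

# z is a hypothetical confounder that drives both x and y
z = rng.normal(size=n)
x = 2.0 * z + rng.normal(size=n)
y = -1.5 * z + rng.normal(size=n)

def residuals(target, control):
    """Return what is left of `target` after a least-squares fit on `control`."""
    design = np.column_stack([np.ones_like(control), control])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef

# Simple correlation: strongly negative, but only because of z
simple_r = np.corrcoef(x, y)[0, 1]

# Partial correlation: correlate the parts of x and y not explained by z
partial_r = np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]

print(f"simple correlation:  {simple_r:.3f}")
print(f"partial correlation: {partial_r:.3f}")
```

In this setup x and y have no direct link, so the simple correlation should be clearly negative while the partial correlation should be close to zero.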

How to Calculate Partial Correlation in Python?

Neither pandas nor the Python standard library has a built-in function for computing partial correlation on a DataFrame, but the pingouin package provides one: partial_corr(). Here is how you can use it:

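# Notebook-only install command; in a regular script, install from the command line instead
!pip install pingouin
import pandas as pd
import pingouin as pg

# Create a simple dataframe
# (B is not an exact linear function of A; a perfect linear relationship
# would give a degenerate partial correlation of exactly 1)
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [2, 4, 3, 6, 5],
    'C': [3, 2, 4, 2, 1]
})

# Compute partial correlation of A and B controlling for C
partial_corr = pg.partial_corr(data=df, x='A', y='B', covar='C')

print(partial_corr)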

In this example, pg.partial_corr(data=df, x='A', y='B', covar='C') calculates the partial correlation between columns ‘A’ and ‘B’ controlling for ‘C’.

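The partial_corr() function returns a one-row DataFrame that contains the sample size (n), the partial correlation coefficient (r), a 95% confidence interval and a p-value. The exact set and naming of the columns may differ between pingouin versions.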
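As a cross-check that does not depend on pingouin, the coefficient for a single control variable can also be computed by hand from the three pairwise Pearson correlations, using the standard first-order formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). The sketch below is illustrative only: the function name partial_corr_first_order and the demo values are hypothetical, and it covers just the one-covariate Pearson case.

```python
import numpy as np
import pandas as pd

def partial_corr_first_order(df, x, y, covar):
    """Partial correlation of x and y given ONE covariate, from pairwise Pearson r."""
    r = df[[x, y, covar]].corr()  # DataFrame.corr() uses Pearson by default
    r_xy = r.loc[x, y]
    r_xz = r.loc[x, covar]
    r_yz = r.loc[y, covar]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Small illustrative dataset (values chosen for this sketch)
demo = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [2, 4, 3, 6, 5],
    'C': [3, 2, 4, 2, 1]
})

manual_r = partial_corr_first_order(demo, 'A', 'B', 'C')
print(round(manual_r, 4))
```

For the same data, a single covariate and the Pearson method, this value should agree with the r column that pg.partial_corr() reports, up to rounding.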

If pingouin is not already installed, install it with pip. The line !pip install pingouin does this when the code runs in a Jupyter notebook. If you are running the code elsewhere, remove that line from the script and run pip install pingouin on the command line instead.
