How to Create a Correlation Matrix in Pandas (Python)


In this post, you will learn how to create a correlation matrix in pandas.

pandas.DataFrame.corr() –

What is Correlation?

The correlation coefficient, also known as the Pearson product-moment correlation coefficient, is a numerical index that measures the strength and direction of the linear relationship between two variables, such as X and Y. It ranges in value from -1 to +1. If the two variables tend to move in the same direction, they have a positive correlation. If they tend to move in opposite directions, they have a negative correlation. A value near 0 means there is little or no linear relationship.
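To make the definition concrete, here is a minimal sketch that computes the Pearson coefficient by hand and compares it with what pandas returns. The arrays x, y and z and the helper pearson_r are made up for illustration and are not part of the housing dataset used later.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: y tends to rise with x, z falls exactly linearly with x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
z = np.array([10.0, 8.0, 6.0, 4.0, 2.0])


def pearson_r(a, b):
    """Pearson r: co-variation of a and b, scaled by their individual spreads."""
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    numerator = (a_centered * b_centered).sum()
    denominator = np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum())
    return numerator / denominator


r_xy = pearson_r(x, y)  # positive: the two move in the same direction
r_xz = pearson_r(x, z)  # negative: the two move in opposite directions

# pandas computes the same quantity (Pearson is its default method)
r_xy_pandas = pd.Series(x).corr(pd.Series(y))

print(r_xy, r_xz, r_xy_pandas)
```

The hand-written function and pandas should agree up to floating-point rounding, which is a quick way to check your understanding of the formula.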

How to create a correlation matrix in Pandas?

Let’s load a dataset. Here we use the California housing data available through scikit-learn.

import pandas as pd
from sklearn import datasets
housing = datasets.fetch_california_housing()
df = pd.DataFrame(housing.data, columns=housing.feature_names)
df.head()

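To create a correlation matrix in pandas, use the df.corr() method. By default it computes the Pearson correlation between every pair of columns and returns the result as a DataFrame.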

corr_matrix = df.corr()
corr_matrix
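The corr() method also accepts a method argument ("pearson", "spearman" or "kendall"), and because the result is an ordinary DataFrame you can slice and sort it. The sketch below uses a small synthetic DataFrame named demo (a hypothetical stand-in, not the housing data) so that it runs without downloading anything.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: "price" grows with "income", but not linearly.
rng = np.random.default_rng(0)
income = rng.uniform(1, 10, size=200)
demo = pd.DataFrame({
    "income": income,
    "price": income ** 3 + rng.normal(0, 1, size=200),
    "noise": rng.normal(0, 1, size=200),
})

pearson = demo.corr()                    # default: method="pearson"
spearman = demo.corr(method="spearman")  # rank-based, suits monotonic relationships

# Correlation of every other column with "price", strongest first
price_corr = pearson["price"].drop("price").sort_values(key=abs, ascending=False)

print(pearson.round(2))
print(spearman.round(2))
print(price_corr)
```

Sorting one column of the matrix by absolute value is a convenient way to see which features are most strongly related to a variable of interest.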

How to visualize a correlation matrix in Python –

To visualize a correlation matrix in Python, we can use matplotlib, seaborn or plotly.
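If you only have matplotlib installed, a heatmap can be drawn with imshow. This is a minimal sketch: the helper plot_corr_matrix and the toy DataFrame are hypothetical names chosen for illustration, and you would pass your own corr_matrix instead.

```python
import pandas as pd
import matplotlib.pyplot as plt


def plot_corr_matrix(corr):
    """Draw a correlation matrix (a square DataFrame) as a heatmap."""
    fig, ax = plt.subplots(figsize=(6, 5))
    # Fix the colour range to [-1, 1] so colours are comparable between plots
    image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    fig.colorbar(image, ax=ax, label="correlation")
    fig.tight_layout()
    return fig, ax


# Hypothetical toy data, used only to make the example self-contained
toy = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 5, 9], "c": [9, 7, 4, 1]})
fig, ax = plot_corr_matrix(toy.corr())
# plt.show()  # uncomment to display the figure
```

Seaborn and plotly wrap this kind of plot in a single call and add conveniences such as cell annotations.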

Seaborn –

import seaborn as sns
import matplotlib.pyplot as plt
# Jupyter notebook magic; remove this line when running as a plain Python script
%matplotlib inline

plt.figure(figsize=(10,8))
sns.heatmap(corr_matrix)
plt.show()

You can also annotate each cell with its value using the annot parameter.

plt.figure(figsize=(10,8))
sns.heatmap(corr_matrix, annot=True)
plt.show()

You can also change the colormap with the cmap parameter.

plt.figure(figsize=(10,8))
sns.heatmap(corr_matrix, annot=True, cmap='YlGnBu')
plt.show()
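Because correlations lie between -1 and +1, a diverging colormap centered at 0 often reads better than a sequential one, and since the matrix is symmetric you can hide the upper triangle. The following is one possible sketch using heatmap's mask, vmin, vmax and center parameters; the toy DataFrame is hypothetical and stands in for the housing data.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical toy data, used only to make the example self-contained
toy = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 5, 9], "c": [9, 7, 4, 1]})
corr = toy.corr()

# True strictly above the diagonal: seaborn leaves those cells blank
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(
    corr,
    mask=mask,
    annot=True,
    fmt=".2f",        # show two decimals in each cell
    cmap="coolwarm",  # diverging colormap
    vmin=-1,
    vmax=1,
    center=0,         # 0 correlation maps to the neutral colour
    ax=ax,
)
# plt.show()  # uncomment to display the figure
```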

Plotly Python –

import plotly.express as px
fig = px.imshow(corr_matrix)
fig.show()

To annotate the cells, use the text_auto parameter.

fig = px.imshow(corr_matrix, text_auto=True)
fig.update_layout(height=600, width=700)
fig.show()
