
In this post, you will learn how to create a correlation matrix in pandas.
pandas.DataFrame.corr() –
What is Correlation?
The correlation coefficient, also known as the Pearson product-moment correlation coefficient, is a numerical index that measures the strength and direction of the linear relationship between two variables, such as X and Y. It ranges in value from -1 to +1. If the two variables tend to move in the same direction, they have a positive correlation. If they tend to move in opposite directions, they have a negative correlation. A value near 0 means there is little or no linear relationship.
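To make the definition concrete, here is a minimal sketch that computes Pearson's r by hand and compares it with pandas. The values in x and y are made-up toy data, and the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: y tends to rise with x, but not perfectly.
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.0, 4.0, 5.0, 4.0, 5.0])

# Pearson's r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)),
# where dx and dy are deviations from the respective means.
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Series.corr uses the Pearson method by default.
r_pandas = x.corr(y)

print(round(r_manual, 4), round(r_pandas, 4))  # both are about 0.7746
```

The result is positive and fairly close to +1, which matches the intuition that the two series mostly move in the same direction.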
How to create a correlation matrix in Pandas?
Let’s read a dataset first. Here we load the California housing data from scikit-learn into a DataFrame.
import pandas as pd
from sklearn import datasets
housing = datasets.fetch_california_housing()
df = pd.DataFrame(housing.data, columns=housing.feature_names)
df.head()

To create a correlation matrix in pandas, we can use the df.corr() method. By default, it computes the Pearson correlation between every pair of columns.
corr_matrix = df.corr()
corr_matrix
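As a small follow-up to df.corr(), the sketch below shows the method parameter and one possible way to list column pairs from strongest to weakest correlation. The toy DataFrame and the names toy, upper and pairs are assumptions made for this example, so that it runs without downloading any data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only so the example is self-contained.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],   # rises exactly with a
    "c": [5, 4, 3, 2, 1],    # falls exactly as a rises
    "d": [1, 3, 2, 5, 4],
})

pearson = toy.corr()                     # default: method="pearson"
spearman = toy.corr(method="spearman")   # rank-based alternative

# Keep each pair once: True only for cells above the diagonal.
upper = np.triu(np.ones(pearson.shape, dtype=bool), k=1)

pairs = (
    pearson.where(upper)   # cells outside the upper triangle become NaN
    .stack()               # long format, indexed by (row, column)
    .dropna()              # drop the masked cells
    .sort_values(key=abs, ascending=False)
)
print(pairs)
```

Sorting by absolute value keeps strong negative correlations near the top, next to the strong positive ones.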

How to visualize a correlation matrix in Python –
To visualize a correlation matrix in Python, we can use matplotlib, seaborn, or plotly.
Seaborn –
import seaborn as sns
import matplotlib.pyplot as plt
# In a Jupyter notebook, you can also run the magic: %matplotlib inline
plt.figure(figsize=(10,8))
sns.heatmap(corr_matrix)
plt.show()

You can also annotate each cell using the annot parameter.
plt.figure(figsize=(10,8))
sns.heatmap(corr_matrix, annot=True)
plt.show()

You can also change the colormap with the cmap parameter.
plt.figure(figsize=(10,8))
sns.heatmap(corr_matrix, annot=True, cmap='YlGnBu')
plt.show()
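Since correlations always lie between -1 and +1, it can help to fix the color scale to that range and use a diverging colormap, so that zero sits at the neutral color. The sketch below also hides the upper triangle, which repeats the lower one. The toy DataFrame is made-up data standing in for the housing dataset, and the styling choices are one reasonable option, not the only one.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical toy data standing in for the housing DataFrame.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 1, 4, 3, 5],
    "c": [5, 3, 4, 1, 2],
})
corr = toy.corr()

# True for cells above the diagonal; seaborn leaves masked cells blank.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(
    corr,
    mask=mask,
    annot=True,
    fmt=".2f",          # two decimals in each annotated cell
    cmap="coolwarm",    # diverging colormap
    vmin=-1, vmax=1,    # pin the color scale to the full range of r
    center=0,           # zero correlation maps to the neutral color
    square=True,
    ax=ax,
)
plt.show()
```

With a fixed scale, the same color means the same correlation in every heatmap you draw, which makes plots from different datasets easier to compare.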

Plotly Python –
import plotly.express as px
fig = px.imshow(corr_matrix)
fig.show()

To annotate the cells, use the text_auto parameter.
fig = px.imshow(corr_matrix, text_auto=True)
fig.update_layout(height=600, width=700)
fig.show()
