
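To calculate variance in pandas, use the var() method. In statistics, variance is the average of the squared deviations from the mean. You subtract the mean from each observation, square each difference, sum the squares, and divide by the number of observations. By default, pandas' var() divides by the number of observations minus one (the sample variance), as described in section 3 below. Variance measures dispersion: the larger it is, the more widely the values are spread around their mean.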
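To make the definition concrete, here is a minimal sketch that computes variance from scratch and compares it with pandas. The helper `variance` and the small `sample` list are hypothetical and only for illustration. The `ddof` argument copies the meaning of the pandas parameter of the same name: the divisor is N - ddof.

```python
import pandas as pd

def variance(values, ddof=1):
    """Hypothetical helper: mean of squared deviations, divided by N - ddof."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - ddof)

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data, mean is 5

print(variance(sample, ddof=0))    # divide by N     -> 4.0
print(variance(sample))            # divide by N - 1 (sample variance)
print(pd.Series(sample).var())     # pandas default, same as the line above
```

The from-scratch version and `Series.var()` should agree up to floating-point rounding.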
Examples –
Let’s create a dataset to work with.
import pandas as pd
data = {'Apple':  [89, 89, 90, 110, 125, 84, 131, 123, 123, 140, 145, 145],
        'Orange': [46, 46, 50, 65, 63, 48, 110, 120, 60, 42, 47, 62],
        'Banana': [26, 30, 30, 25, 38, 22, 22, 36, 20, 27, 23, 34],
        'Mango':  [80, 80, 90, 125, 130, 150, 140, 140, 135, 135, 80, 90]}
index = ['Jan','Feb','Mar','Apr','May','June','Jul','Aug','Sep','Oct','Nov','Dec']
df = pd.DataFrame(data, index=index)
df

1. Calculate the variance of a column –
You can calculate the variance of a single column like this:
df['Apple'].var()
#output
532.3333333333335
Or you can calculate the variance of every column at once:
df.var()
#output
Apple 532.333333
Orange 649.113636
Banana 34.750000
Mango 774.810606
dtype: float64
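df.var() reduces each column to a single number and returns a Series indexed by column name. As a sketch of that equivalence, the block below rebuilds the example DataFrame, computes each column's variance separately with Series.var(), and checks that the results match. The dictionary comprehension and the idxmax() lookup are just one illustrative way to inspect the result.

```python
import numpy as np
import pandas as pd

# Rebuild the example DataFrame from above
data = {'Apple':  [89, 89, 90, 110, 125, 84, 131, 123, 123, 140, 145, 145],
        'Orange': [46, 46, 50, 65, 63, 48, 110, 120, 60, 42, 47, 62],
        'Banana': [26, 30, 30, 25, 38, 22, 22, 36, 20, 27, 23, 34],
        'Mango':  [80, 80, 90, 125, 130, 150, 140, 140, 135, 135, 80, 90]}
index = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'June',
         'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df = pd.DataFrame(data, index=index)

# One call: a Series with one variance per column
column_var = df.var()

# Same thing, one column at a time
per_column = pd.Series({col: df[col].var() for col in df.columns})

print(np.allclose(column_var.to_numpy(), per_column.to_numpy()))  # True
print(column_var.idxmax())  # column with the largest spread -> Mango
```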
2. Calculate the variance of the rows –
To calculate the variance across each row, set the axis parameter to axis=1 (or, equivalently, axis='columns').
df.var(axis=1)
#output
Jan 864.250000
Feb 776.916667
Mar 900.000000
Apr 2056.250000
May 2084.666667
June 3080.000000
Jul 2914.250000
Aug 2178.250000
Sep 2931.000000
Oct 3578.000000
Nov 2802.250000
Dec 2244.916667
dtype: float64
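Setting axis=1 tells pandas to reduce across the columns, so each month gets one variance computed from its four fruit values. The sketch below, which rebuilds the same example DataFrame, checks that axis=1, axis='columns', and transposing the frame with df.T before calling var() all give the same result. The transpose is only an illustrative cross-check, not something the method requires.

```python
import numpy as np
import pandas as pd

# Rebuild the example DataFrame from above
data = {'Apple':  [89, 89, 90, 110, 125, 84, 131, 123, 123, 140, 145, 145],
        'Orange': [46, 46, 50, 65, 63, 48, 110, 120, 60, 42, 47, 62],
        'Banana': [26, 30, 30, 25, 38, 22, 22, 36, 20, 27, 23, 34],
        'Mango':  [80, 80, 90, 125, 130, 150, 140, 140, 135, 135, 80, 90]}
index = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'June',
         'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df = pd.DataFrame(data, index=index)

row_var = df.var(axis=1)

# axis='columns' is the named alias for axis=1
same_by_name = df.var(axis='columns')

# Transposing turns months into columns, so a column-wise var() matches
same_by_transpose = df.T.var()

print(np.allclose(row_var.to_numpy(), same_by_name.to_numpy()))       # True
print(np.allclose(row_var.to_numpy(), same_by_transpose.to_numpy()))  # True
print(row_var.idxmax())  # month with the largest spread -> Oct
```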
3. Degrees of freedom –
By default, pandas normalizes the variance by N-1, where N is the number of observations (ddof=1, the sample variance). To normalize by N instead (the population variance), set ddof=0.
df.var(ddof=0)
#output
Apple 487.972222
Orange 595.020833
Banana 31.854167
Mango 710.243056
dtype: float64
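Because the two settings differ only in the divisor, you can convert one into the other by multiplying by (N - 1) / N. The sketch below checks this on the Apple column. It also shows one common pitfall: NumPy's np.var defaults to ddof=0, the opposite of pandas. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

apple = pd.Series([89, 89, 90, 110, 125, 84, 131, 123, 123, 140, 145, 145])
n = len(apple)

sample_var = apple.var()            # ddof=1: divides by n - 1
population_var = apple.var(ddof=0)  # ddof=0: divides by n

# Rescaling by (n - 1) / n turns the sample variance into the population variance
print(np.isclose(population_var, sample_var * (n - 1) / n))  # True

# NumPy's np.var defaults to ddof=0, unlike pandas
values = apple.to_numpy()
print(np.isclose(np.var(values), population_var))        # True
print(np.isclose(np.var(values, ddof=1), sample_var))    # True
```

If results from pandas and NumPy disagree slightly in size, a different ddof default is the usual first thing to check.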