How to Calculate Variance in Python Pandas?


To calculate variance in pandas, use the var() method. In statistics, variance measures how far a set of values is spread out around its mean: it is the sum of the squared deviations from the mean divided by the number of observations N (the population variance) or by N-1 (the sample variance).
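For reference, here are the two versions written out as formulas, for N observations x_1 to x_N with mean \bar{x}. The only difference is the denominator.

```latex
s^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2
\qquad
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2
\qquad
\text{where } \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i
```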

Examples –

Let’s create a dataset to work with.

import pandas as pd

data = {'Apple': [89, 89, 90, 110, 125, 84, 131, 123, 123, 140, 145, 145],
        'Orange': [46, 46, 50, 65, 63, 48, 110, 120, 60, 42, 47, 62],
        'Banana': [26, 30, 30, 25, 38, 22, 22, 36, 20, 27, 23, 34],
        'Mango': [80, 80, 90, 125, 130, 150, 140, 140, 135, 135, 80, 90]}

index = ['Jan','Feb','Mar','Apr','May','June','Jul','Aug','Sep','Oct','Nov','Dec']
df = pd.DataFrame(data, index=index)
df

1. Calculate the variance of a column –

You can calculate the variance of a single column like this:

df['Apple'].var()
#output
532.3333333333335
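To see where this number comes from, the sketch below recomputes the Apple variance by hand from the sample-variance formula (squared deviations from the mean, divided by N-1) and compares it with var(). The variable names are just for illustration.

```python
import pandas as pd

# Apple column from the dataset above
apple = pd.Series([89, 89, 90, 110, 125, 84, 131, 123, 123, 140, 145, 145])

# Sample variance by hand: squared deviations from the mean, divided by N - 1
squared_deviations = (apple - apple.mean()) ** 2
manual_var = squared_deviations.sum() / (len(apple) - 1)

print(manual_var)   # ≈ 532.333
print(apple.var())  # same value, because var() divides by N - 1 by default
```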

Or you can calculate the variance of every column at once:

df.var()
#output
Apple     532.333333
Orange    649.113636
Banana     34.750000
Mango     774.810606
dtype: float64
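The fruit dataset is all numeric, so df.var() works directly. If a DataFrame also holds a text column, recent pandas versions (2.x) typically raise a TypeError instead of silently skipping it; passing numeric_only=True restricts the calculation to numeric columns. The frame below, including its 'Season' column, is hypothetical and only meant to illustrate this.

```python
import pandas as pd

# Hypothetical frame: two numeric columns plus a text column (not from the dataset above)
df = pd.DataFrame({
    'Apple': [89, 89, 90, 110],
    'Mango': [80, 80, 90, 125],
    'Season': ['winter', 'winter', 'spring', 'spring'],
})

# Only the numeric columns are used; 'Season' is left out of the result
col_var = df.var(numeric_only=True)
print(col_var)
```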

2. Calculate the variance of the rows –

To calculate the variance of each row, set the axis parameter to axis=1 (or, equivalently, axis='columns').

df.var(axis=1)
#output
Jan      864.250000
Feb      776.916667
Mar      900.000000
Apr     2056.250000
May     2084.666667
June    3080.000000
Jul     2914.250000
Aug     2178.250000
Sep     2931.000000
Oct     3578.000000
Nov     2802.250000
Dec     2244.916667
dtype: float64
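Each value in this output is simply the variance of that row taken on its own. The sketch below checks this on a two-month subset of the dataset (trimmed only to keep it short) and also confirms that axis='columns' gives the same result as axis=1.

```python
import pandas as pd

# First two months of the fruit dataset above (a subset, to keep the sketch short)
df = pd.DataFrame(
    {'Apple': [89, 89], 'Orange': [46, 46], 'Banana': [26, 30], 'Mango': [80, 80]},
    index=['Jan', 'Feb'],
)

row_var = df.var(axis=1)

# axis='columns' is an alias for axis=1
assert row_var.equals(df.var(axis='columns'))

# A row's variance equals the variance of that row as a Series
print(row_var['Jan'], df.loc['Jan'].var())  # both 864.25
```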

3. Degrees of freedom –

By default, var() normalizes by N-1 (ddof=1), which gives the sample variance. To normalize by N instead and get the population variance, set ddof=0.

df.var(ddof=0)
#output
Apple     487.972222
Orange    595.020833
Banana     31.854167
Mango     710.243056
dtype: float64
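Since only the denominator changes, the two results differ by a factor of (N-1)/N. One practical consequence: NumPy's np.var defaults to ddof=0, so it agrees with pandas only when the ddof values match. The sketch below checks both points on the Apple column.

```python
import numpy as np
import pandas as pd

apple = pd.Series([89, 89, 90, 110, 125, 84, 131, 123, 123, 140, 145, 145])
n = len(apple)

sample_var = apple.var()            # pandas default ddof=1: divide by N - 1
population_var = apple.var(ddof=0)  # divide by N

# The two differ only by the factor (N - 1) / N
assert np.isclose(population_var, sample_var * (n - 1) / n)

# np.var defaults to ddof=0, so it matches pandas only when ddof agrees
values = apple.to_numpy()
assert np.isclose(np.var(values), population_var)
assert np.isclose(np.var(values, ddof=1), sample_var)

print(sample_var, population_var)  # ≈ 532.333 and ≈ 487.972
```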
