### Introduction

Variance is a statistical concept that quantifies the amount of dispersion in a set of data points. More precisely, it is the average of the squared deviations of each value from the mean, so a larger variance means the values are spread further from the mean and from one another. Variance is often denoted by the symbols σ² (for population variance) and s² (for sample variance).

There are two types of variance: population variance and sample variance. Population variance refers to the variance within an entire population. Sample variance, on the other hand, refers to the variance within a sample of the population.

In this article, we will learn how to calculate both sample and population variance in Python. We will use plain Python with built-in functions, the standard-library `statistics` module, and the third-party libraries `numpy` and `pandas`. We will also discuss Bessel’s correction and why it matters when calculating sample variance.

### Calculating Variance

Before diving into Python, let’s quickly discuss how variance is calculated.

The formula for population variance is:

`σ² = Σ ( xi - μ )² / N`

And for sample variance, it’s:

`s² = Σ ( xi - x̄ )² / (n - 1)`

Where:

- xi represents each value from the dataset,
- μ is the population mean,
- x̄ is the sample mean,
- N is the size of the population,
- n is the size of the sample.

The key difference between the two formulas is the denominator. For population variance, we divide by the size of the population (N), whereas for sample variance, we divide by the size of the sample minus one (n - 1). This adjustment is known as Bessel’s correction. Because the deviations are measured from the sample mean rather than the true population mean, dividing by n tends to underestimate the population variance; dividing by n - 1 removes that bias.
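To see the effect of the denominator, a small simulation can help. The sketch below is illustrative only: it assumes a normally distributed population with a known variance of 4, draws many samples of size 5, and averages the two estimators. The names `TRUE_VARIANCE`, `SAMPLE_SIZE`, `TRIALS`, and `average_estimates` are hypothetical and chosen for this example.

```python
import random

TRUE_VARIANCE = 4.0   # assumed population variance (standard deviation 2)
SAMPLE_SIZE = 5       # size of each sample, n
TRIALS = 20_000       # number of samples to draw


def average_estimates(seed=0):
    """Return the average of the n and n - 1 estimators over many samples."""
    rng = random.Random(seed)
    sigma = TRUE_VARIANCE ** 0.5
    total_biased = 0.0
    total_unbiased = 0.0
    for _ in range(TRIALS):
        sample = [rng.gauss(0.0, sigma) for _ in range(SAMPLE_SIZE)]
        mean = sum(sample) / SAMPLE_SIZE
        squared_deviations = sum((x - mean) ** 2 for x in sample)
        total_biased += squared_deviations / SAMPLE_SIZE          # divide by n
        total_unbiased += squared_deviations / (SAMPLE_SIZE - 1)  # divide by n - 1
    return total_biased / TRIALS, total_unbiased / TRIALS


biased, unbiased = average_estimates()
print("Average when dividing by n:    ", biased)
print("Average when dividing by n - 1:", unbiased)
```

Under these assumptions, the average obtained by dividing by n should settle near (n - 1)/n × 4 = 3.2, while the average obtained by dividing by n - 1 should land close to the true value of 4.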

## Calculating Variance in Python

Python, along with its libraries, provides several ways to calculate variance.

### Using Built-in Python Functions

We can compute variance using plain Python code and built-in functions. Below is an example of how to do this.

```
# Population variance
def population_variance(data):
    # Number of observations
    n = len(data)
    # Mean of the data
    mean = sum(data) / n
    # Square deviations
    deviations = [(x - mean) ** 2 for x in data]
    # Variance
    variance = sum(deviations) / n
    return variance


# Sample variance
def sample_variance(data):
    # Number of observations
    n = len(data)
    # Mean of the data
    mean = sum(data) / n
    # Square deviations
    deviations = [(x - mean) ** 2 for x in data]
    # Variance (Bessel's correction: divide by n - 1)
    variance = sum(deviations) / (n - 1)
    return variance


data = [2, 4, 6, 8, 10]
print("Population Variance:", population_variance(data))
print("Sample Variance:", sample_variance(data))
```
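The two functions above differ only in the denominator, so they can be folded into one. The following is a minimal sketch, not a replacement for the library functions: the `variance` helper and its `ddof` parameter (delta degrees of freedom, 0 for population and 1 for sample) are names assumed for this example. It also guards against inputs that are too short, which would otherwise cause a division by zero.

```python
def variance(data, ddof=0):
    """Variance of `data`, dividing by len(data) - ddof.

    ddof=0 gives the population variance, ddof=1 the sample variance.
    """
    values = list(data)
    n = len(values)
    if n - ddof <= 0:
        raise ValueError(f"need more than {ddof} data point(s), got {n}")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


data = [2, 4, 6, 8, 10]
print("Population Variance:", variance(data))       # 40 / 5 = 8.0
print("Sample Variance:", variance(data, ddof=1))   # 40 / 4 = 10.0
```

For this dataset the mean is 6 and the squared deviations sum to 40, which is where the 8.0 and 10.0 come from.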

### Using the `statistics` Library

Python’s `statistics` module, part of the standard library since Python 3.4, provides functions to calculate mathematical statistics of numeric data. The functions `pvariance` and `variance` can be used to calculate population variance and sample variance, respectively.

```
import statistics as stats
data = [2, 4, 6, 8, 10]
print("Population Variance:", stats.pvariance(data))
print("Sample Variance:", stats.variance(data))
```
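Both functions take an optional second argument for a mean that has already been computed (`mu` for `pvariance`, `xbar` for `variance`), and `variance` raises `StatisticsError` when given fewer than two values. The sketch below shows these features along with exact arithmetic on `Fraction` inputs; the variable names are arbitrary.

```python
import statistics as stats
from fractions import Fraction

data = [2, 4, 6, 8, 10]

# Reuse a mean that has already been computed.
mean = stats.mean(data)
pop_var = stats.pvariance(data, mu=mean)
sample_var = stats.variance(data, xbar=mean)
print("Population Variance:", pop_var)
print("Sample Variance:", sample_var)

# Fraction inputs are kept exact instead of being rounded to floats.
exact_var = stats.variance([Fraction(1, 3), Fraction(2, 3), Fraction(4, 3)])
print("Exact sample variance:", exact_var)

# Sample variance is undefined for a single observation.
error_raised = False
try:
    stats.variance([42])
except stats.StatisticsError as error:
    error_raised = True
    print("StatisticsError:", error)
```

Passing a precomputed mean only saves work; if the value passed is not the actual mean of the data, the result will be wrong.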

### Using `numpy`

`numpy` is a fundamental package for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with arrays. The `var` function can be used to compute variance. By default, this function calculates the population variance. To calculate sample variance, we need to set the `ddof` (Delta Degrees of Freedom) parameter to 1.

```
import numpy as np
data = np.array([2, 4, 6, 8, 10])
print("Population Variance:", np.var(data))
print("Sample Variance:", np.var(data, ddof=1))
```
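For multidimensional arrays, `np.var` also accepts an `axis` argument. The sketch below uses a small made-up 2 × 3 array to show the difference between the variance of all values and the variance along each axis; the data and variable names are assumptions for illustration.

```python
import numpy as np

# Hypothetical data: 2 rows (observations) x 3 columns (features).
data = np.array([[2, 4, 6],
                 [8, 10, 12]])

overall = np.var(data, ddof=1)             # all six values treated as one sample
per_column = np.var(data, axis=0, ddof=1)  # one variance per column
per_row = np.var(data, axis=1, ddof=1)     # one variance per row

print("Overall sample variance:", overall)
print("Per-column sample variance:", per_column)
print("Per-row sample variance:", per_row)
```

With `axis=0` the result has one entry per column, and with `axis=1` one entry per row.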

### Using `pandas`

`pandas` is a powerful data manipulation library in Python. It provides data structures and functions needed to manipulate structured data. The `var` method of a `pandas` Series computes variance. Note that, unlike `numpy`, it calculates the sample variance by default. To compute the population variance, set `ddof` to 0.

```
import pandas as pd
data = pd.Series([2, 4, 6, 8, 10])
print("Population Variance:", data.var(ddof=0))
print("Sample Variance:", data.var())
```
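Because `numpy` and `pandas` use different defaults, the same data can yield different numbers when `ddof` is left out. The sketch below makes the mismatch visible and also shows `DataFrame.var`, which returns one variance per column; the column names `a` and `b` are invented for this example.

```python
import numpy as np
import pandas as pd

values = [2, 4, 6, 8, 10]

# Same data, different defaults: numpy divides by n, pandas by n - 1.
numpy_default = np.var(np.array(values))   # ddof=0
pandas_default = pd.Series(values).var()   # ddof=1
print("numpy default: ", numpy_default)
print("pandas default:", pandas_default)

# DataFrame.var returns one variance per numeric column.
frame = pd.DataFrame({"a": values, "b": [1, 1, 1, 1, 1]})
column_variances = frame.var()             # sample variance of each column
population_variances = frame.var(ddof=0)   # population variance of each column
print(column_variances)
print(population_variances)
```

Passing `ddof` explicitly in both libraries is a simple way to avoid depending on either default.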

### Conclusion

In this article, we have learned how to calculate both sample and population variance in Python using built-in functions as well as the `numpy`, `pandas`, and `statistics` libraries. Understanding variance is crucial for data analysis and machine learning tasks, as it provides insights into the data’s dispersion. The different libraries in Python provide flexible and efficient ways to calculate variance, making Python a powerful tool for statistical computing.