The standard deviation is a measure of the amount of variation or dispersion in a set of values. A low standard deviation indicates that the values tend to be close to the mean (average) of the set, while a high standard deviation indicates that the values are spread out over a broader range.
In statistics, two types of standard deviations are commonly used – population standard deviation and sample standard deviation. The population standard deviation is used when an entire population is available, and the sample standard deviation is used when only a sample is available.
This article shows how to calculate the standard deviation in Python. We will start with pure Python and then explore different Python libraries, namely statistics, numpy, pandas, and scipy, which provide functions for calculating the standard deviation efficiently.
Standard Deviation Formula
The formula for calculating the population standard deviation is:
σ = sqrt[ Σ ( xi - μ )² / N ]
And for the sample standard deviation:
s = sqrt[ Σ ( xi - x̄ )² / (n - 1) ]
Where:
- xi represents each value in the dataset,
- μ is the population mean,
- x̄ is the sample mean,
- N is the size of the population,
- n is the size of the sample,
- Σ denotes summation over all values in the dataset.
The square root is used to bring the units of variance, which are squared, back to the original units of measurement.
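As a quick check of both formulas, the following sketch works through them step by step on a small dataset. The data values and variable names are assumptions chosen for illustration; the only difference between the two results is the divisor (N versus n - 1).

```python
import math

# Illustrative data (assumed for this example)
data = [4, 2, 5, 8, 6]

n = len(data)
mean = sum(data) / n                              # 25 / 5 = 5.0
squared_diffs = [(x - mean) ** 2 for x in data]   # [1.0, 9.0, 0.0, 9.0, 1.0]
total = sum(squared_diffs)                        # 20.0

# Population formula: divide by N
population_sd = math.sqrt(total / n)              # sqrt(20 / 5) = 2.0

# Sample formula: divide by n - 1
sample_sd = math.sqrt(total / (n - 1))            # sqrt(20 / 4) = sqrt(5)

print("Population:", population_sd)
print("Sample:", sample_sd)
```

Because the sample formula divides by a smaller number, the sample standard deviation is always at least as large as the population figure computed from the same values.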
Calculating Standard Deviation in Python
Using Built-in Python Functions
The standard deviation can be calculated in pure Python by applying the formula directly. The example below divides by len(data), so it computes the population standard deviation:

    import math

    # Sample data
    data = [4, 2, 5, 8, 6]

    # Calculate mean
    mean = sum(data) / len(data)

    # Calculate variance (average of squared differences from the mean)
    variance = sum((xi - mean) ** 2 for xi in data) / len(data)

    # Calculate standard deviation (square root of variance)
    std_dev = math.sqrt(variance)

    print("Standard Deviation:", std_dev)
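This method works, but it is verbose compared with the library functions shown next, and switching to the sample standard deviation means changing the divisor to len(data) - 1 by hand.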
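One way to cover both formulas in pure Python is to wrap the calculation in a small function. The sketch below is a hypothetical helper, not part of any library: the name std_dev and its ddof parameter are assumptions, with ddof borrowed from the convention that ddof=0 divides by n and ddof=1 divides by n - 1.

```python
import math


def std_dev(values, ddof=0):
    """Hypothetical helper: ddof=0 gives the population SD, ddof=1 the sample SD."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more data points than ddof")
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_sq / (n - ddof))


data = [4, 2, 5, 8, 6]
pop_sd = std_dev(data)           # divides by n
samp_sd = std_dev(data, ddof=1)  # divides by n - 1

print("Population Standard Deviation:", pop_sd)
print("Sample Standard Deviation:", samp_sd)
```

The explicit check on n - ddof avoids a division by zero when, for example, the sample standard deviation is requested for a single value.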
Using the statistics Library
The statistics library, which was introduced in Python 3.4, is part of the standard library and provides functions to calculate mathematical statistics of numeric data. It offers the pstdev function to calculate the population standard deviation, and the stdev function to calculate the sample standard deviation.
    import statistics as stats

    # Sample data
    data = [4, 2, 5, 8, 6]

    print("Population Standard Deviation:", stats.pstdev(data))
    print("Sample Standard Deviation:", stats.stdev(data))
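The same library also exposes the matching variance functions, pvariance and variance, and it raises StatisticsError when stdev is given fewer than two data points. The following sketch illustrates both behaviors; the variable names are only for illustration.

```python
import math
import statistics as stats

data = [4, 2, 5, 8, 6]

# Each standard deviation is the square root of the matching variance.
pop_var = stats.pvariance(data)   # population variance
samp_var = stats.variance(data)   # sample variance

print("Population variance:", pop_var)
print("Sample variance:", samp_var)

# stdev needs at least two data points, because the divisor is n - 1.
try:
    stats.stdev([4])
    single_point_error = False
except stats.StatisticsError:
    single_point_error = True

print("stdev on one value raises StatisticsError:", single_point_error)
```

Catching StatisticsError is a reasonable guard when the input size is not known in advance, for instance when computing statistics per group.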
Using numpy
numpy is a powerful library in Python for mathematical and scientific computing. It provides the std function to calculate the standard deviation. By default, std calculates the population standard deviation. For the sample standard deviation, we need to set the ddof (Delta Degrees of Freedom) parameter to 1, which changes the divisor from N to N - ddof.
    import numpy as np

    # Sample data
    data = np.array([4, 2, 5, 8, 6])

    print("Population Standard Deviation:", np.std(data))
    print("Sample Standard Deviation:", np.std(data, ddof=1))
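For multi-dimensional arrays, np.std also accepts an axis argument. The sketch below assumes a hypothetical 2-D array in which rows are observations and columns are variables, and computes one standard deviation per column.

```python
import numpy as np

# Hypothetical 2-D data: each row is an observation, each column a variable.
scores = np.array([[4, 10],
                   [2, 20],
                   [5, 30],
                   [8, 40],
                   [6, 50]])

col_pop_sd = np.std(scores, axis=0)            # population SD of each column
col_samp_sd = np.std(scores, axis=0, ddof=1)   # sample SD of each column
overall_sd = np.std(scores)                    # SD over all ten values

print("Per-column population SD:", col_pop_sd)
print("Per-column sample SD:", col_samp_sd)
print("Overall population SD:", overall_sd)
```

Without axis, the array is flattened first, so the result is a single number for all values rather than one per column.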
Using pandas
pandas is a data manipulation and analysis library in Python. It provides data structures and functions needed to manipulate structured data. The std method of a pandas Series or DataFrame computes the standard deviation. By default, this method computes the sample standard deviation, which is the opposite of the numpy default. To compute the population standard deviation, we need to set ddof to 0.
    import pandas as pd

    # Sample data
    data = pd.Series([4, 2, 5, 8, 6])

    print("Population Standard Deviation:", data.std(ddof=0))
    print("Sample Standard Deviation:", data.std())
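Because numpy and pandas use different defaults for ddof, the same data can give different results depending on which library is called. The following sketch uses a hypothetical two-column DataFrame (the column names "a" and "b" are assumptions) to show the per-column results and the mismatch between the two defaults.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame for illustration
df = pd.DataFrame({"a": [4, 2, 5, 8, 6], "b": [10, 20, 30, 40, 50]})

sample_sd = df.std()              # pandas default: ddof=1, one value per column
population_sd = df.std(ddof=0)    # population SD per column

# numpy on the same column uses ddof=0 unless told otherwise
numpy_default = np.std(df["a"].to_numpy())

print("pandas default (sample):")
print(sample_sd)
print("pandas with ddof=0 (population):")
print(population_sd)
print("numpy default on column a:", numpy_default)
```

Passing ddof explicitly in both libraries is a simple way to make the intended formula visible in the code and to keep results consistent when switching between them.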
Conclusion
In this tutorial, we have learned how to calculate the standard deviation in Python using pure Python and the statistics, numpy, and pandas libraries. The standard deviation is a key statistical measure that shows the amount of variation in a dataset, and calculating it correctly is an essential skill for anyone working in data analysis or statistics. The main point to remember is which version each tool computes by default: numpy's std returns the population standard deviation, pandas's std returns the sample standard deviation, and the statistics library provides separate pstdev and stdev functions.