
Introduction
Skewness and kurtosis are two statistics that describe the shape of a data distribution. Both are widely used in statistical data analysis, inferential statistics, and predictive modeling.
- Skewness measures the asymmetry of a probability distribution about its mean. Positive skewness indicates that the tail on the right side of the distribution is longer or fatter than the left side. Negative skewness indicates the opposite.
- Kurtosis measures the “tailedness” of a probability distribution, that is, how much weight sits in the tails relative to a standard bell curve (Gaussian distribution). It is often described in terms of the height and sharpness of the central peak, but it is driven mainly by the tails. The normal distribution has a kurtosis of 3: values above 3 indicate fat tails and more extreme values, whereas values below 3 indicate thin tails.
In this article, we will look at several ways to calculate skewness and kurtosis in Python using the scipy, pandas, and numpy libraries.
Skewness Formula
The formula for calculating skewness is:
skewness = E[(X - μ)³] / σ³
Where:
- E is the expectation operator.
- X is a random variable.
- μ is the mean.
- σ is the standard deviation.
Kurtosis Formula
The formula for calculating kurtosis is:
kurtosis = E[(X - μ)⁴] / σ⁴
The same variables apply as in the skewness formula. The resulting kurtosis is often compared with 3, which is the kurtosis of the normal distribution. Therefore, the formula for excess kurtosis (where the kurtosis of the normal distribution is subtracted) is often used:
excess kurtosis = E[(X - μ)⁴] / σ⁴ - 3
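The formulas above can be sketched directly in NumPy by replacing each expectation with a sample average. This is a minimal illustration that assumes the population standard deviation (ddof=0); the variable names are chosen for this example only.

```python
import numpy as np

# Sample data used for illustration
data = np.array([2, 8, 0, 4, 1, 9, 9, 0], dtype=float)

mu = data.mean()      # sample estimate of the mean
sigma = data.std()    # population standard deviation (ddof=0)

# Standardize, then average the third and fourth powers
z = (data - mu) / sigma
skewness = np.mean(z**3)           # E[(X - mu)^3] / sigma^3
kurt = np.mean(z**4)               # E[(X - mu)^4] / sigma^4
excess_kurt = kurt - 3             # 0 for a normal distribution

print("Skewness:", skewness)
print("Kurtosis:", kurt)
print("Excess kurtosis:", excess_kurt)
```

Because the averages divide by n, these are the biased (population-style) estimates; libraries that apply small-sample corrections will report slightly different numbers for the same data.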
Calculating Skewness and Kurtosis in Python
Using SciPy
The scipy library provides the skew and kurtosis functions through its scipy.stats module to compute skewness and kurtosis, respectively.
from scipy.stats import skew, kurtosis
# Sample data
data = [2, 8, 0, 4, 1, 9, 9, 0]
print("Skewness:", skew(data))
print("Kurtosis:", kurtosis(data))
The kurtosis function in scipy calculates Fisher’s kurtosis by default, which is the excess kurtosis and is 0 for a normal distribution. If you want the regular (Pearson) kurtosis, which is 3 for a normal distribution, add 3 to the result.
Using Pandas
pandas is a data manipulation and analysis library for Python. It provides the skew and kurt methods to compute skewness and kurtosis on pandas Series and DataFrames.
import pandas as pd
# Sample data
data = pd.Series([2, 8, 0, 4, 1, 9, 9, 0])
print("Skewness:", data.skew())
print("Kurtosis:", data.kurt())
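Running the pandas and scipy examples on the same data gives different numbers. The reason is that the pandas methods apply small-sample bias corrections, while the scipy.stats functions default to bias=True (no correction). The sketch below, which reuses the sample data and introduces its own variable names, shows one way to reconcile the two by passing bias=False to scipy.

```python
import pandas as pd
from scipy.stats import skew, kurtosis

values = [2, 8, 0, 4, 1, 9, 9, 0]
series = pd.Series(values)

# SciPy defaults (bias=True): no small-sample correction
scipy_skew_default = skew(values)
scipy_kurt_default = kurtosis(values)

# bias=False applies the sample-size corrections
scipy_skew_corrected = skew(values, bias=False)
scipy_kurt_corrected = kurtosis(values, bias=False)

print("pandas skew:          ", series.skew())
print("scipy skew (default): ", scipy_skew_default)
print("scipy skew (bias=False):", scipy_skew_corrected)
print("pandas kurt:          ", series.kurt())
print("scipy kurt (default): ", scipy_kurt_default)
print("scipy kurt (bias=False):", scipy_kurt_corrected)
```

For large samples the corrected and uncorrected values converge, but for small samples like this one the gap is noticeable, so it helps to state which estimator was used.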
Using NumPy and SciPy
While numpy does not provide direct functions to compute skewness and kurtosis, we can combine it with scipy to apply the formulas above. Here, numpy computes the standard deviation and scipy.stats.moment computes the central moments.
import numpy as np
from scipy.stats import moment
# Sample data
data = np.array([2, 8, 0, 4, 1, 9, 9, 0])
# Population standard deviation (ddof=0), matching the formulas
std_dev = np.std(data)
# Calculate skewness and excess kurtosis using the formulas.
# The moment order is passed positionally because its keyword name differs between SciPy versions.
skewness = moment(data, 3) / std_dev**3
kurtosis = moment(data, 4) / std_dev**4 - 3
print("Skewness:", skewness)
print("Kurtosis:", kurtosis)
Here, moment(data, 3) and moment(data, 4) compute the third and fourth central moments of the data (moments about the mean), respectively, so the mean does not need to be subtracted separately.
Conclusion
In this article, we have discussed how to calculate skewness and kurtosis in Python using several libraries. Understanding the skewness and kurtosis of your data is essential for proper data analysis and statistical modeling. Python’s extensive library support, including scipy, pandas, and numpy, makes these calculations straightforward and efficient.