Skewness and kurtosis are two statistical measures that describe the shape of a data distribution. Both are widely used in statistical data analysis, inferential statistics, and predictive modeling.
- Skewness measures the asymmetry of a probability distribution about its mean. Positive skewness indicates that the tail on the right side of the distribution is longer or fatter than the left side. Negative skewness indicates the opposite.
- Kurtosis measures the “tailedness” of a probability distribution, that is, how much of its spread comes from extreme values. It is usually judged against the normal (Gaussian) distribution, whose kurtosis is 3. High kurtosis (>3) indicates heavier tails and more outliers than a normal distribution, often accompanied by a sharp central peak, whereas low kurtosis (<3) indicates lighter tails, often with a flatter peak.
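To make the two definitions above concrete, here is a minimal sketch using small, made-up samples (the arrays are illustrative assumptions, not data from a real study). It checks the sign of the skewness for a right-tailed sample and its mirror image, and compares the kurtosis of a sample with one extreme value against an evenly spread one.

```python
import numpy as np
from scipy.stats import skew, kurtosis

# Hypothetical sample with one large value: long right tail.
right_tailed = np.array([1, 1, 2, 2, 3, 3, 4, 20])
# Mirror image of the same sample: long left tail.
left_tailed = -right_tailed
# Evenly spread, symmetric sample with no extreme values.
evenly_spread = np.array([-3, -2, -1, 0, 1, 2, 3])

skew_right = skew(right_tailed)    # expected to be positive
skew_left = skew(left_tailed)      # expected to be negative
skew_even = skew(evenly_spread)    # expected to be zero

# fisher=False returns the "regular" kurtosis, which is 3 for a normal distribution.
kurt_heavy = kurtosis(right_tailed, fisher=False)   # above 3: heavy tail
kurt_light = kurtosis(evenly_spread, fisher=False)  # below 3: light tails

print("Skewness (right, left, even):", skew_right, skew_left, skew_even)
print("Kurtosis (heavy, light):", kurt_heavy, kurt_light)
```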
In this article, we will look at various ways to calculate skewness and kurtosis in Python using different libraries, such as scipy, pandas, and numpy.
The formula for calculating skewness is:
skewness = E[(X - μ)³] / σ³
where:
- E is the expectation operator.
- X is a random variable.
- μ is the mean.
- σ is the standard deviation.
The formula for calculating kurtosis is:
kurtosis = E[(X - μ)⁴] / σ⁴
The same variables apply as in the skewness formula. The resulting kurtosis is often compared with 3, which is the kurtosis of the normal distribution. Therefore, the formula for excess kurtosis (where the kurtosis of the normal distribution is subtracted) is often used:
excess kurtosis = E[(X - μ)⁴] / σ⁴ - 3
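The formulas above can be translated almost literally into NumPy by replacing the expectation with the sample average. The sketch below is one possible reading of them, not a library API: the function names `population_skewness` and `population_kurtosis` are hypothetical, and the population standard deviation (`ddof=0`) is assumed for σ.

```python
import numpy as np

def population_skewness(values):
    """Third standardized moment: E[(X - mu)^3] / sigma^3."""
    x = np.asarray(values, dtype=float)
    mu = x.mean()
    sigma = x.std()  # population standard deviation (ddof=0)
    return np.mean((x - mu) ** 3) / sigma ** 3

def population_kurtosis(values, excess=True):
    """Fourth standardized moment, optionally minus 3 (excess kurtosis)."""
    x = np.asarray(values, dtype=float)
    mu = x.mean()
    sigma = x.std()  # population standard deviation (ddof=0)
    k = np.mean((x - mu) ** 4) / sigma ** 4
    return k - 3.0 if excess else k

sample = [2, 8, 0, 4, 1, 9, 9, 0]
sample_skewness = population_skewness(sample)
sample_kurtosis = population_kurtosis(sample, excess=False)
sample_excess = population_kurtosis(sample)

print("Skewness:", sample_skewness)
print("Kurtosis:", sample_kurtosis)
print("Excess kurtosis:", sample_excess)
```

Because the sample average stands in for the expectation, these are the uncorrected (biased) sample estimates.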
Calculating Skewness and Kurtosis in Python
Using SciPy
The scipy library provides the skew and kurtosis functions through its scipy.stats module to compute skewness and kurtosis, respectively.
```python
from scipy.stats import skew, kurtosis

# Sample data
data = [2, 8, 0, 4, 1, 9, 9, 0]

print("Skewness:", skew(data))
print("Kurtosis:", kurtosis(data))
```
Note that the kurtosis function in scipy calculates Fisher’s kurtosis by default. This is the excess kurtosis, which is 0 for a normal distribution. If you want the regular kurtosis (which is 3 for a normal distribution), add 3 to the result.
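As an alternative to adding 3 by hand, `scipy.stats.kurtosis` accepts a `fisher` argument; passing `fisher=False` returns the regular (Pearson) kurtosis directly. A short sketch on the same sample data:

```python
from scipy.stats import kurtosis

data = [2, 8, 0, 4, 1, 9, 9, 0]

# Default (fisher=True): excess kurtosis, 0 for a normal distribution.
excess = kurtosis(data)
# fisher=False: regular kurtosis, 3 for a normal distribution.
pearson = kurtosis(data, fisher=False)

print("Excess kurtosis:", excess)
print("Regular kurtosis:", pearson)
print("Difference:", pearson - excess)  # 3, up to floating-point rounding
```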
Using pandas
pandas is a data manipulation and analysis library in Python. It provides the skew and kurt methods to compute skewness and kurtosis on pandas Series and DataFrames.
```python
import pandas as pd

# Sample data
data = pd.Series([2, 8, 0, 4, 1, 9, 9, 0])

print("Skewness:", data.skew())
print("Kurtosis:", data.kurt())
```
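The numbers printed by pandas will not match the scipy defaults for the same data. pandas applies a sample-size bias correction, while scipy's `skew` and `kurtosis` default to `bias=True` (no correction). The sketch below shows that passing `bias=False` to scipy brings the two into agreement; the DataFrame column names `a` and `b` are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis

values = [2, 8, 0, 4, 1, 9, 9, 0]
series = pd.Series(values)

# pandas: bias-corrected skewness and excess kurtosis.
pandas_skew = series.skew()
pandas_kurt = series.kurt()

# scipy: uncorrected by default, corrected with bias=False.
scipy_skew_biased = skew(values)
scipy_skew_corrected = skew(values, bias=False)
scipy_kurt_corrected = kurtosis(values, bias=False)

print("pandas skew:", pandas_skew, "| scipy (bias=False):", scipy_skew_corrected)
print("pandas kurt:", pandas_kurt, "| scipy (bias=False):", scipy_kurt_corrected)
print("scipy skew with default bias=True:", scipy_skew_biased)

# On a DataFrame, skew() and kurt() are computed per column.
frame = pd.DataFrame({"a": values, "b": values[::-1]})
column_skew = frame.skew()
print(column_skew)
```

For small samples such as this one the correction is noticeable, so it is worth stating which convention a reported value uses.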
Using NumPy and SciPy
Although numpy does not provide direct functions to compute skewness and kurtosis, we can use it in combination with scipy to compute these values. Here, numpy is used to compute the necessary statistical properties, such as the mean and standard deviation, while scipy.stats.moment supplies the higher-order moments.
```python
import numpy as np
from scipy.stats import moment

# Sample data
data = np.array([2, 8, 0, 4, 1, 9, 9, 0])

# mean is shown for reference; moment() centers the data about the mean itself
mean = np.mean(data)
# Population standard deviation (ddof=0)
std_dev = np.std(data)

# Calculate skewness and kurtosis using the formulas
skewness = moment(data, moment=3) / std_dev**3
kurtosis = moment(data, moment=4) / std_dev**4 - 3

print("Skewness:", skewness)
print("Kurtosis:", kurtosis)
```
Here, moment(data, moment=3) and moment(data, moment=4) compute the third and fourth central moments of the data (that is, moments about the mean), respectively.
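As a sanity check, the moment-based values can be compared with scipy's built-in `skew` and `kurtosis`, which should agree under their default settings. This is a minimal sketch; the variable names are illustrative, and the moment order is passed positionally as an editorial choice so the example does not depend on the keyword's name.

```python
import numpy as np
from scipy.stats import moment, skew, kurtosis

data = np.array([2, 8, 0, 4, 1, 9, 9, 0])
std_dev = np.std(data)  # population standard deviation (ddof=0)

# Skewness and excess kurtosis from central moments.
manual_skewness = moment(data, 3) / std_dev**3
manual_kurtosis = moment(data, 4) / std_dev**4 - 3

# Built-in equivalents (defaults: bias=True, fisher=True).
builtin_skewness = skew(data)
builtin_kurtosis = kurtosis(data)

# The second central moment is the population variance.
second_moment = moment(data, 2)

print("Skewness matches:", np.isclose(manual_skewness, builtin_skewness))
print("Kurtosis matches:", np.isclose(manual_kurtosis, builtin_kurtosis))
print("Second moment equals variance:", np.isclose(second_moment, np.var(data)))
```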
In this article, we have discussed how to calculate skewness and kurtosis in Python using several libraries. Understanding the skewness and kurtosis of your data is essential for proper data analysis and statistical modeling. Python’s extensive library support, including scipy, pandas, and numpy, makes these calculations straightforward and efficient.