How to Calculate Skewness & Kurtosis in Python


Introduction

Skewness and kurtosis are two statistical measures that describe the shape of a data distribution. Both are widely used in exploratory data analysis, in checking the assumptions behind inferential statistics, and in predictive modeling.

  • Skewness measures the asymmetry of a probability distribution about its mean. Positive skewness indicates that the tail on the right side of the distribution is longer or fatter than the left side. Negative skewness indicates a longer or fatter left tail.
  • Kurtosis measures the “tailedness” of a probability distribution, that is, how much of its variability comes from extreme values far from the mean, compared with a normal (Gaussian) distribution, whose kurtosis is 3. Kurtosis above 3 (leptokurtic) indicates heavier tails and more outliers than a normal distribution; kurtosis below 3 (platykurtic) indicates lighter tails. Such distributions often also look more sharply peaked or flatter, respectively, but the measure is driven mainly by the tails.
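To make the two definitions concrete, SciPy's distribution objects can report the theoretical skewness and kurtosis of well-known distributions. This is a small illustration using scipy.stats.norm, expon, laplace, and uniform with their default parameters. Note that the "k" value SciPy reports is excess kurtosis, so the sketch adds 3 to show the regular kurtosis as well.

```python
from scipy.stats import norm, expon, laplace, uniform

# moments="sk" asks for skewness and excess (Fisher) kurtosis only
for name, dist in [("normal", norm), ("exponential", expon),
                   ("laplace", laplace), ("uniform", uniform)]:
    skewness, excess_kurt = dist.stats(moments="sk")
    print(f"{name:12s} skewness={float(skewness):5.2f}  "
          f"excess kurtosis={float(excess_kurt):5.2f}  "
          f"kurtosis={float(excess_kurt) + 3:5.2f}")
```

The exponential distribution has a long right tail (positive skewness) and heavy tails (kurtosis above 3), the Laplace distribution is symmetric but heavier-tailed than the normal, and the uniform distribution has no tails at all, giving a kurtosis below 3.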

In this article, we will look at various ways to calculate skewness and kurtosis in Python using different libraries, such as scipy, pandas, and numpy.

Skewness Formula

The formula for calculating skewness is:

skewness = E[(X - μ)³] / σ³

Where:

  • E is the expectation operator.
  • X is a random variable.
  • μ is the mean.
  • σ is the standard deviation.
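As a worked check of the formula, replacing the expectation E with an ordinary average over the observed values gives the plug-in (population) estimate. For the small right-skewed sample [0, 0, 3], the mean is 1 and the deviations are -1, -1 and 2, so E[(X - μ)²] = 6/3 = 2 and E[(X - μ)³] = 6/3 = 2, giving a skewness of 2 / 2^1.5 = 1/√2 ≈ 0.707. The helper name skewness_from_formula is hypothetical, chosen only for this sketch.

```python
import numpy as np

def skewness_from_formula(values):
    """Plug-in skewness: E[(X - mu)^3] / sigma^3, with E taken as the sample average."""
    x = np.asarray(values, dtype=float)
    mu = x.mean()                               # μ
    sigma = np.sqrt(np.mean((x - mu) ** 2))     # σ (population, ddof=0)
    return np.mean((x - mu) ** 3) / sigma ** 3  # E[(X - μ)³] / σ³

# Positive result: the single large value makes the right tail longer
print(skewness_from_formula([0, 0, 3]))
```

Mirroring the sample, as in [0, 0, -3], flips the sign of the result, which matches the description of negative skewness above.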

Kurtosis Formula

The formula for calculating kurtosis is:

kurtosis = E[(X - μ)⁴] / σ⁴

The same variables apply as in the skewness formula. Because a normal distribution has a kurtosis of exactly 3, kurtosis values are often compared against 3, or reported as excess kurtosis, which subtracts 3 so that the normal distribution scores 0:

excess kurtosis = E[(X - μ)⁴] / σ⁴ - 3
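The kurtosis formula can be checked by hand in the same way. For the symmetric sample [1, 2, 3, 4, 5], the mean is 3 and the deviations are -2, -1, 0, 1, 2, so σ² = 10/5 = 2 and E[(X - μ)⁴] = 34/5 = 6.8. The kurtosis is therefore 6.8 / 2² = 1.7, and the excess kurtosis is 1.7 - 3 = -1.3: with no values far from the centre, this sample is lighter-tailed than a normal distribution. The function name below is hypothetical, used only for this sketch, and again takes E as a plain average.

```python
import numpy as np

def kurtosis_from_formula(values, excess=True):
    """Plug-in kurtosis: E[(X - mu)^4] / sigma^4, optionally minus 3 (excess)."""
    x = np.asarray(values, dtype=float)
    mu = x.mean()                              # μ
    var = np.mean((x - mu) ** 2)               # σ² (population, ddof=0)
    kurt = np.mean((x - mu) ** 4) / var ** 2   # E[(X - μ)⁴] / σ⁴
    return kurt - 3 if excess else kurt

data = [1, 2, 3, 4, 5]
print("Kurtosis:", kurtosis_from_formula(data, excess=False))
print("Excess kurtosis:", kurtosis_from_formula(data))
```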

Calculating Skewness and Kurtosis in Python

Using SciPy

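The scipy.stats module provides the skew and kurtosis functions for computing skewness and kurtosis, respectively. By default, both return the biased (plug-in) estimates, i.e. the formulas above with the expectation replaced by the sample average.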

from scipy.stats import skew, kurtosis

# Sample data
data = [2, 8, 0, 4, 1, 9, 9, 0]

print("Skewness:", skew(data))
print("Kurtosis:", kurtosis(data))

By default, the kurtosis function in scipy returns Fisher's kurtosis, which is the excess kurtosis and is 0 for a normal distribution. If you want Pearson's (regular) kurtosis, which is 3 for a normal distribution, either add 3 to the result or pass fisher=False.
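Both functions also accept keyword options that select a different variant. This sketch uses fisher=False to get Pearson's kurtosis directly, and bias=False to apply the standard small-sample corrections; on a sample as small as this one, the corrected and uncorrected estimates can differ noticeably.

```python
import numpy as np
from scipy.stats import skew, kurtosis

data = [2, 8, 0, 4, 1, 9, 9, 0]

# Default: Fisher (excess) kurtosis, 0 for a normal distribution
excess = kurtosis(data)
# fisher=False: Pearson kurtosis, 3 for a normal distribution
pearson = kurtosis(data, fisher=False)
print("Pearson = excess + 3:", np.isclose(pearson, excess + 3))

# bias=False: bias-corrected sample estimators instead of the plug-in formulas
print("Skewness (bias-corrected):", skew(data, bias=False))
print("Kurtosis (bias-corrected):", kurtosis(data, bias=False))
```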

Using Pandas

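pandas is a data manipulation and analysis library in Python. Its Series and DataFrame objects provide skew and kurt methods (kurt is also available under the name kurtosis). Unlike SciPy's defaults, these methods use bias-corrected sample estimators, and kurt returns excess kurtosis, so on small samples their results differ from the scipy output above.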

import pandas as pd

# Sample data
data = pd.Series([2, 8, 0, 4, 1, 9, 9, 0])

print("Skewness:", data.skew())
print("Kurtosis:", data.kurt())
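Because pandas applies bias corrections, its results should line up with SciPy's bias=False option rather than with SciPy's defaults. The sketch below checks this and also shows that calling the methods on a DataFrame returns one value per column. The column names "a" and "b" and the values in column "b" are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis

# Hypothetical two-column DataFrame; column "a" reuses the sample data
df = pd.DataFrame({
    "a": [2, 8, 0, 4, 1, 9, 9, 0],
    "b": [1, 1, 2, 2, 3, 3, 4, 20],
})

# One skewness / excess-kurtosis value per column, returned as a Series
print(df.skew())
print(df.kurt())

# pandas should agree with SciPy using bias=False (and the default fisher=True)
for col in df.columns:
    values = df[col].to_numpy()
    print(col,
          np.isclose(df[col].skew(), skew(values, bias=False)),
          np.isclose(df[col].kurt(), kurtosis(values, bias=False)))
```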

Using NumPy and SciPy

NumPy does not provide built-in functions for skewness or kurtosis, but it can compute the standard deviation, and scipy.stats.moment can compute the central moments. Combining the two reproduces the formulas above directly.

import numpy as np
from scipy.stats import moment

# Sample data
data = np.array([2, 8, 0, 4, 1, 9, 9, 0])

# Population standard deviation (np.std uses ddof=0 by default)
std_dev = np.std(data)

# Third and fourth central moments divided by powers of σ, as in the formulas;
# the order is passed positionally for compatibility across SciPy versions
skewness = moment(data, 3) / std_dev**3
excess_kurtosis = moment(data, 4) / std_dev**4 - 3

print("Skewness:", skewness)
print("Kurtosis:", excess_kurtosis)

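Here, moment(data, 3) and moment(data, 4) compute the third and fourth central moments of the data (moments about the mean), and dividing them by σ³ and σ⁴ standardises them exactly as in the skewness and kurtosis formulas.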
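Because np.std uses ddof=0 by default, this manual calculation is the plug-in version of the formulas, so it should agree with scipy.stats.skew and scipy.stats.kurtosis at their default settings (bias=True, fisher=True). A quick consistency check, written as a sketch:

```python
import numpy as np
from scipy.stats import moment, skew, kurtosis

data = np.array([2, 8, 0, 4, 1, 9, 9, 0])
std_dev = np.std(data)  # population standard deviation (ddof=0)

manual_skew = moment(data, 3) / std_dev**3
manual_excess_kurt = moment(data, 4) / std_dev**4 - 3

# Compare against SciPy's built-in functions at their default settings
print("Skewness matches:", np.isclose(manual_skew, skew(data)))
print("Kurtosis matches:", np.isclose(manual_excess_kurt, kurtosis(data)))
```

If the standard deviation were computed with ddof=1 instead, the manual results would no longer match these defaults, which is a common source of small discrepancies between hand-rolled and library calculations.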

Conclusion

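In this article, we have discussed how to calculate skewness and kurtosis in Python with scipy, pandas, and numpy. When comparing results across libraries, check which conventions each one uses: scipy.stats returns biased estimates and excess kurtosis by default, while pandas returns bias-corrected estimates and excess kurtosis. Understanding the skewness and kurtosis of your data is essential for proper data analysis and statistical modeling, and Python's library support makes these calculations straightforward and efficient.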
