How to Perform a Jarque-Bera Test in Python


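The Jarque-Bera (JB) test is a goodness-of-fit test of whether sample data are consistent with a normal distribution. Specifically, it compares the sample skewness and kurtosis with the values a normal distribution has (skewness 0 and kurtosis 3). Because its p-value relies on a large-sample approximation, it is typically applied to large datasets. In finance, economics, and other fields, the JB test is used to check the normality assumption that many statistical tests and models rely on, for example the assumption that regression residuals are normally distributed.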
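For reference, the standard form of the statistic combines the sample skewness S and sample kurtosis K of n observations x_1, ..., x_n. Here m_k is the k-th central sample moment. Under the null hypothesis of normality, JB approximately follows a chi-squared distribution with 2 degrees of freedom when n is large:

```latex
JB = \frac{n}{6}\left(S^2 + \frac{(K - 3)^2}{4}\right),
\qquad
S = \frac{m_3}{m_2^{3/2}},
\qquad
K = \frac{m_4}{m_2^{2}},
\qquad
m_k = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^k,
\qquad
JB \;\overset{a}{\sim}\; \chi^2_2 \text{ under } H_0
```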

This article walks through performing a Jarque-Bera test in Python with SciPy and statsmodels. It uses NumPy to generate example data and Matplotlib to visualize it.

Importing Necessary Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.api as sm

Creating or Importing a Dataset

Before we can perform the Jarque-Bera test, we need a dataset. Here, we generate a normally distributed dataset with the numpy.random.normal() function and a non-normal dataset with the numpy.random.exponential() function:

# Generate a normally distributed dataset
normal_data = np.random.normal(loc=0, scale=1, size=1000)

# Generate a non-normally distributed (exponential) dataset
non_normal_data = np.random.exponential(scale=1, size=1000)
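The datasets above are random, so the statistics and p-values you see will change on every run. The following is a minimal sketch of a reproducible alternative. It assumes NumPy 1.17 or later for the `np.random.default_rng` Generator API, and the seed 42 is an arbitrary choice:

```python
import numpy as np

# A seeded Generator draws the same samples on every run.
# The seed value (42) is arbitrary.
rng = np.random.default_rng(42)

# Same distributions and sizes as above, drawn from the seeded generator
normal_data = rng.normal(loc=0, scale=1, size=1000)
non_normal_data = rng.exponential(scale=1, size=1000)
```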

Visualizing the Dataset

It’s often helpful to visualize your dataset with a histogram to see its distribution. Here’s how you can do it with Matplotlib:

# Plot histogram for the normal dataset
plt.hist(normal_data, bins=30, color='c', edgecolor='black', alpha=0.7)
plt.title('Histogram of Normally Distributed Data')
plt.show()

# Plot histogram for the non-normal dataset
plt.hist(non_normal_data, bins=30, color='c', edgecolor='black', alpha=0.7)
plt.title('Histogram of Non-Normally Distributed Data')
plt.show()
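A histogram is easier to judge against a reference curve. The sketch below overlays a normal density that has the sample's own mean and standard deviation, so departures such as the exponential data's long right tail stand out. The helper name `hist_with_normal_fit` is hypothetical:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def hist_with_normal_fit(data, title):
    """Plot a density histogram of `data` with a normal curve of matching mean and std."""
    fig, ax = plt.subplots()
    # density=True scales the bars so they are comparable with a probability density
    ax.hist(data, bins=30, density=True, color='c', edgecolor='black', alpha=0.7)
    # Normal curve using the sample mean and sample standard deviation
    mu, sigma = np.mean(data), np.std(data, ddof=1)
    xs = np.linspace(np.min(data), np.max(data), 200)
    ax.plot(xs, stats.norm.pdf(xs, loc=mu, scale=sigma), color='black', label='Normal fit')
    ax.set_title(title)
    ax.legend()
    return fig

rng = np.random.default_rng(0)
fig = hist_with_normal_fit(rng.exponential(scale=1, size=1000), 'Exponential data vs. normal fit')
```

Call plt.show() afterwards to display the figure. For the exponential sample, the bell curve visibly misses the sharp peak near zero and the long right tail.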

Performing the Jarque-Bera Test

Now, we’re ready to perform the Jarque-Bera test. The scipy.stats.jarque_bera() function can be used to conduct this test.

# Perform Jarque-Bera test on the normal dataset
jb_stat, jb_p_value = stats.jarque_bera(normal_data)
print(f'Jarque-Bera statistic (normal data): {jb_stat}')
print(f'Jarque-Bera p-value (normal data): {jb_p_value}')

# Perform Jarque-Bera test on the non-normal dataset
jb_stat, jb_p_value = stats.jarque_bera(non_normal_data)
print(f'Jarque-Bera statistic (non-normal data): {jb_stat}')
print(f'Jarque-Bera p-value (non-normal data): {jb_p_value}')

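scipy.stats.jarque_bera() returns two values:

  • The JB statistic: This value is always non-negative. It equals zero only when the sample skewness is exactly 0 and the sample kurtosis is exactly 3. Under the null hypothesis of normality, it approximately follows a chi-squared distribution with 2 degrees of freedom, so the larger it is, the stronger the evidence against normality.
  • The p-value: This is the probability, if the data really were normal, of obtaining a JB statistic at least as large as the one observed. If it is less than the chosen significance level (often 0.05), the null hypothesis that the data are normally distributed is rejected.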
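To see where the two values come from, here is a hand-rolled version of the standard formula, written as an illustrative sketch. The function name `jarque_bera_manual` is hypothetical and not part of any library. Its results should agree with `stats.jarque_bera()` up to floating-point rounding:

```python
import numpy as np
from scipy import stats

def jarque_bera_manual(x):
    """Compute the JB statistic and its asymptotic chi-squared(2) p-value by hand."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dev = x - x.mean()
    # Central sample moments with a 1/n divisor, as used in the JB formula
    m2 = np.mean(dev**2)
    m3 = np.mean(dev**3)
    m4 = np.mean(dev**4)
    skewness = m3 / m2**1.5
    kurtosis = m4 / m2**2  # ordinary kurtosis: 3 for a normal distribution
    jb = n / 6 * (skewness**2 + (kurtosis - 3)**2 / 4)
    # Upper-tail probability of a chi-squared distribution with 2 degrees of freedom
    p_value = stats.chi2.sf(jb, df=2)
    return jb, p_value

rng = np.random.default_rng(1)
data = rng.exponential(scale=1, size=1000)

jb_man, p_man = jarque_bera_manual(data)
jb_lib, p_lib = stats.jarque_bera(data)
print(f'manual: JB = {jb_man:.4f}, p = {p_man:.4g}')
print(f'SciPy:  JB = {jb_lib:.4f}, p = {p_lib:.4g}')
```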

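The statsmodels library also provides a jarque_bera() function (in statsmodels.stats.stattools). Besides the JB statistic and p-value, it returns the sample skewness and kurtosis. The kurtosis it reports is the ordinary kurtosis, which is about 3 for normal data, not the excess kurtosis (about 0):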

# Perform Jarque-Bera test on the normal dataset
jb_stat, jb_p_value, skew, kurtosis = sm.stats.stattools.jarque_bera(normal_data)
print(f'Jarque-Bera statistic (normal data): {jb_stat}')
print(f'Jarque-Bera p-value (normal data): {jb_p_value}')
print(f'Skewness (normal data): {skew}')
print(f'Kurtosis (normal data): {kurtosis}')

# Perform Jarque-Bera test on the non-normal dataset
jb_stat, jb_p_value, skew, kurtosis = sm.stats.stattools.jarque_bera(non_normal_data)
print(f'Jarque-Bera statistic (non-normal data): {jb_stat}')
print(f'Jarque-Bera p-value (non-normal data): {jb_p_value}')
print(f'Skewness (non-normal data): {skew}')
print(f'Kurtosis (non-normal data): {kurtosis}')
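One quick way to check which kurtosis convention a function follows is to compare its output with SciPy's estimators. The sketch below assumes statsmodels and SciPy are installed. For normal data, the statsmodels value should sit near 3 and match `scipy.stats.kurtosis(..., fisher=False)`:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(2)
data = rng.normal(loc=0, scale=1, size=1000)

jb_stat, jb_p_value, skew, kurtosis = jarque_bera(data)

# scipy.stats.kurtosis returns excess kurtosis by default (fisher=True);
# fisher=False gives the ordinary kurtosis, the convention statsmodels reports.
print(f'statsmodels kurtosis:   {kurtosis:.3f}')
print(f'scipy excess kurtosis:  {stats.kurtosis(data):.3f}')
print(f'scipy ordinary kurtosis: {stats.kurtosis(data, fisher=False):.3f}')
print(f'statsmodels vs scipy skewness: {skew:.3f} vs {stats.skew(data):.3f}')
```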

Interpreting the Results

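To interpret the results, compare the p-value with a significance level chosen in advance (commonly 0.05). If the p-value is below it, we reject the null hypothesis of the JB test and conclude that the data show significant evidence of non-normality. If it is above, we fail to reject the null hypothesis. This does not prove that the data are normal; it only means the test found no strong evidence against normality.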
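The decision rule can be wrapped in a small helper. This is a sketch: the name `jb_decision` and the default `alpha=0.05` are illustrative choices. The verdict is worded so that it does not claim a non-rejection proves normality:

```python
import numpy as np
from scipy import stats

def jb_decision(data, alpha=0.05):
    """Run scipy's Jarque-Bera test and return (statistic, p-value, verdict)."""
    jb_stat, p_value = stats.jarque_bera(data)
    if p_value < alpha:
        verdict = 'reject H0: evidence against normality'
    else:
        verdict = 'fail to reject H0: no significant evidence against normality'
    return jb_stat, p_value, verdict

rng = np.random.default_rng(3)
samples = [('normal', rng.normal(size=1000)), ('exponential', rng.exponential(size=1000))]
for name, sample in samples:
    jb_stat, p_value, verdict = jb_decision(sample)
    print(f'{name}: JB = {jb_stat:.2f}, p = {p_value:.4f} -> {verdict}')
```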

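For the normal data, we would usually expect a small JB statistic, typically below about 6 (the 5% critical value of a chi-squared distribution with 2 degrees of freedom). The p-value would then be above 0.05, so we fail to reject the null hypothesis that the data are normally distributed. Because the data are random, this outcome is not guaranteed: at a 0.05 significance level, roughly 1 in 20 truly normal samples will be rejected by chance.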
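The false-rejection rate can be checked directly by simulation. The sketch below repeatedly tests samples that really are normal and counts how often the test rejects anyway. The seed, trial count, and sample size are arbitrary choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)  # arbitrary seed
n_trials, n_obs, alpha = 2000, 1000, 0.05

# Every sample here is truly normal, so each rejection is a false positive.
rejections = 0
for _ in range(n_trials):
    _, p_value = stats.jarque_bera(rng.normal(size=n_obs))
    if p_value < alpha:
        rejections += 1

rejection_rate = rejections / n_trials
print(f'Share of normal samples rejected at alpha = {alpha}: {rejection_rate:.3f}')
```

The printed share should land in the neighborhood of alpha, not at zero. It may differ slightly from 0.05 because the chi-squared distribution only approximates the JB statistic's behavior in finite samples.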

For the exponential data, the JB statistic will be very large, far beyond that critical value, and the p-value will be effectively zero. We therefore reject the null hypothesis and conclude that the data are not normally distributed.
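We can estimate in advance how large the statistic is likely to be. Every exponential distribution has skewness 2 and ordinary kurtosis 9. Plugging these population values into the JB formula with this article's n = 1000 gives a rough reference point. The value computed from an actual sample will differ, sometimes noticeably, because sample skewness and kurtosis of exponential data are noisy:

```python
from scipy import stats

# Population skewness and ordinary kurtosis shared by every exponential distribution
skewness, kurtosis = 2.0, 9.0
n = 1000  # sample size used in this article

# JB = n/6 * (S^2 + (K - 3)^2 / 4)
jb_population = n / 6 * (skewness**2 + (kurtosis - 3)**2 / 4)

# 5% critical value of a chi-squared distribution with 2 degrees of freedom
critical = stats.chi2.ppf(0.95, df=2)

print(round(jb_population, 1))  # 2166.7
print(round(critical, 2))       # 5.99
```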

Conclusion

This article has shown how to perform a Jarque-Bera test in Python to check whether a dataset is consistent with a normal distribution. As with any statistical test, it is important to understand its assumptions and limitations. The JB test's p-value relies on an asymptotic chi-squared approximation, so the test is best suited to large samples of continuous data; in small samples the approximation can be poor. Additionally, make sure your dataset is appropriately pre-processed before running the test, for example by handling missing values and outliers (outliers can inflate skewness and kurtosis). Understanding your data and its distribution is a vital part of any data analysis process.
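As a concrete example of the pre-processing step, missing values must be removed or imputed before testing. The sketch below assumes the data are a NumPy array and that simply dropping NaNs is acceptable for your analysis:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
data = rng.normal(size=1000)
data[[10, 200, 750]] = np.nan  # simulate a few missing observations

# With NaNs present, the statistic and p-value come out as NaN.
print(stats.jarque_bera(data))

# Dropping the missing values first gives a usable result.
clean = data[~np.isnan(data)]
jb_stat, p_value = stats.jarque_bera(clean)
print(f'n = {clean.size}, JB = {jb_stat:.3f}, p = {p_value:.3f}')
```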
