How to Perform a Ljung-Box Test in Python

The Ljung-Box test is a statistical test used in time series analysis to check whether a series is free of autocorrelation up to a chosen number of lags. Autocorrelation is the correlation of a time series with a lagged version of itself. If a series is autocorrelated, its past values carry information that can help predict its future values.
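To make the idea of autocorrelation concrete, here is a minimal sketch that computes the sample autocorrelation at a single lag with NumPy. The helper name sample_autocorr and the two example series are illustrative assumptions, not part of statsmodels.

```python
import numpy as np

def sample_autocorr(x, k):
    """Sample autocorrelation of x at lag k (k >= 1). Hypothetical helper."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()  # deviations from the series mean
    # Cross-product of the series with its k-step-lagged copy,
    # scaled by the total sum of squared deviations
    return float(np.dot(d[k:], d[:-k]) / np.dot(d, d))

rng = np.random.default_rng(0)
noise = rng.normal(size=1000)  # independent draws
walk = noise.cumsum()          # each value builds on the previous one

print("white noise, lag 1:", round(sample_autocorr(noise, 1), 3))
print("random walk, lag 1:", round(sample_autocorr(walk, 1), 3))
```

The independent draws should give a value close to zero, while the accumulated series should give a value close to one.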

In Python, the statsmodels library provides a function to perform the Ljung-Box test. In this guide, we’ll discuss how to carry out the Ljung-Box test on a time series dataset.

Importing the Libraries

Import the required libraries:

import numpy as np
import pandas as pd
from statsmodels.stats.diagnostic import acorr_ljungbox
import matplotlib.pyplot as plt

Generating a Sample Dataset

For this guide, we’ll generate a dataset from a normal distribution, and accumulate these values over time to create a time series dataset:

# Set the seed for reproducibility
np.random.seed(0)

# Generate a dataset from a normal distribution and accumulate it over time
data = np.random.normal(loc=0, scale=1, size=1000).cumsum()

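This draws 1000 values from a standard normal distribution and applies cumsum to take their running total, which produces a simple random walk. Because each value of a random walk equals the previous value plus a new random step, the series is strongly autocorrelated, so we should expect the Ljung-Box test to reject its null hypothesis here.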

Ljung-Box Test

The acorr_ljungbox function from the statsmodels library is used to perform the Ljung-Box test:

ljungbox_result = acorr_ljungbox(data, lags=10)

# Print the p-value for each lag (the DataFrame index holds the lag number)
for lag, p_value in ljungbox_result["lb_pvalue"].items():
    print(f"Lag: {lag} - P-Value: {p_value}")

In recent versions of statsmodels, the acorr_ljungbox function returns a pandas DataFrame with one row per lag tested and two columns: the test statistic (lb_stat) and its p-value (lb_pvalue). Passing lags=10 runs the test at lags 1 through 10. The result at lag h is a joint test of whether the autocorrelations at lags 1 through h are all zero, not a test of lag h alone. The loop then prints the p-value for each lag.

The null hypothesis of the Ljung-Box test is that the data are independently distributed, meaning that the autocorrelations up to the tested lag are all zero. If the p-value is below the chosen significance level (typically 0.05), the null hypothesis is rejected and there is evidence that the data exhibit autocorrelation. A p-value above that level means the test found no evidence of autocorrelation, not that autocorrelation has been ruled out.

Plotting the Results

To better visualize the results, you can plot the p-values for each lag:

# Use the DataFrame index (the lags) for x and the named p-value column for y
plt.plot(ljungbox_result.index, ljungbox_result["lb_pvalue"], marker='o')
plt.axhline(0.05, color='r', linestyle='--')
plt.title('Ljung-Box Test P-Values')
plt.xlabel('Lag')
plt.ylabel('P-Value')
plt.show()

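In the plot, the horizontal red dashed line represents the 0.05 significance level. Any point below this line marks a lag h at which the p-value is less than 0.05, meaning the autocorrelations up to lag h are jointly significant. For the random walk generated here, every p-value is effectively zero, so all the points sit on the horizontal axis, well below the line.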

Conclusion

In this article, we’ve discussed how to perform a Ljung-Box test in Python using the statsmodels library. The Ljung-Box test is a useful tool in time series analysis for checking whether autocorrelation is present in the data. Keep in mind, however, that it checks for any form of autocorrelation up to a given number of lags, not whether the autocorrelation structure matches a theoretical form (such as the geometric decay seen in autoregressive processes). For that reason, the Ljung-Box test should not be used alone for model diagnostics, but in combination with other diagnostic checks and plots, such as the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots.
