
The Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test is a statistical tool used to test the stationarity of a time series. Stationarity is a crucial concept in time series analysis. When a time series is stationary, its statistical properties, such as mean and variance, remain constant over time. This characteristic is often a prerequisite for time series forecasting models.
This article provides a detailed guide on how to perform a KPSS test in Python.
Background
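The KPSS test takes a different approach to testing stationarity compared to other tests such as the Dickey-Fuller test. The null hypothesis (H0) of the KPSS test is that the time series is stationary, while the alternative hypothesis (H1) is that it is not. This is the reverse of the Dickey-Fuller test, whose null hypothesis is that the series has a unit root (that is, it is non-stationary). A small p-value therefore means opposite things in the two tests.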
Step 1: Import Libraries
Start by importing the necessary libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import kpss
Step 2: Load and Preprocess the Data
For this tutorial, we’ll use a synthetic dataset, a sine wave with a linearly increasing trend:
# Generate synthetic data
x = np.linspace(0, 50, num=500)
y = np.sin(x) + x / 50
# Create a DataFrame
df = pd.DataFrame(y, columns=['Value'], index=pd.date_range(start='1/1/2020', periods=len(y)))
This data represents a non-stationary time series: a linear trend is added to the sine wave, so the mean of the series rises over time.
Step 3: Plot the Data
Before performing the KPSS test, let’s plot our data to visually check for stationarity. A stationary time series will have constant mean and variance over time.
# Plot the data
plt.figure(figsize=(10,4))
plt.plot(df.Value)
plt.title('Synthetic Time Series')
plt.show()

From the plot, we can see that the mean is increasing over time, implying the time series is non-stationary. But visual inspection is not enough; we need a statistical test to confirm this.
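As a quick numeric companion to the plot, the sketch below compares the mean of the first half of the series with the mean of the second half. Splitting into halves is an assumption made here for simplicity; it is an informal check, not a statistical test.

```python
import numpy as np
import pandas as pd

# Rebuild the synthetic series from Step 2 so this snippet runs on its own
x = np.linspace(0, 50, num=500)
y = np.sin(x) + x / 50
df = pd.DataFrame(y, columns=['Value'], index=pd.date_range(start='1/1/2020', periods=len(y)))

# Compare the average level of the two halves
half = len(df) // 2
first_mean = df['Value'].iloc[:half].mean()
second_mean = df['Value'].iloc[half:].mean()

print(f'Mean of first half:  {first_mean:.3f}')
print(f'Mean of second half: {second_mean:.3f}')
```

A clearly higher mean in the second half is consistent with the upward drift seen in the plot.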
Step 4: Perform the KPSS Test
We perform the KPSS test using the kpss function provided by the statsmodels library. This function returns the test statistic, the p-value, the number of lags used, and the critical values of the test.
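# Perform the KPSS test
# regression='c' tests stationarity around a constant level; nlags='auto' picks the lag length from the data
result = kpss(df['Value'], regression='c', nlags='auto')
# Unpack the statistic, p-value, number of lags used (ignored here), and critical values
kpss_stat, p_value, _, crit_values = result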
print(f'KPSS Statistic: {kpss_stat}')
print(f'p-value: {p_value}')
print('Critical Values:', crit_values)
Step 5: Interpret the Results
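There are two equivalent ways to read the result. If the p-value is below the chosen significance level (0.05 here), or equivalently if the KPSS statistic is greater than the corresponding critical value, we reject the null hypothesis that the time series is stationary. Otherwise we fail to reject it, which means the test found no evidence against stationarity.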
# Interpret the results
print('\nResult:')
if p_value < 0.05:
    print('The series is not stationary')
else:
    print('The series is stationary')
Alternatively, we can compare the KPSS statistic with the critical value at a chosen significance level.
# Interpret the results based on critical values
print('\nResult:')
if kpss_stat > crit_values['5%']:
    print('The series is not stationary')
else:
    print('The series is stationary')
Conclusion
The KPSS test is a widely used method to check the stationarity of a time series. It complements unit-root tests like the Dickey-Fuller test because its null hypothesis is that the series is stationary, rather than that it has a unit root.
Understanding the stationarity of a time series is crucial before applying forecasting models because many of these models assume the series is stationary. Python libraries such as statsmodels make these statistical tests straightforward to run.
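While the KPSS test is a powerful tool, its result depends on which null hypothesis you choose. In statsmodels, this is set by the regression argument of kpss: with 'c' (the default) the null hypothesis is that the series is stationary around a constant level, and with 'ct' it is that the series is stationary around a deterministic linear trend. If the chosen form does not match the data, the result can be misleading. For example, a trend-stationary series tested with 'c' can be flagged as non-stationary.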
Keep in mind that the KPSS test only diagnoses non-stationarity; it does not remove it. To make a non-stationary series stationary, use techniques such as differencing, trend removal, or transformations like taking the log or square root.
By conducting the KPSS test and interpreting its results, you can make an informed decision about the stationarity of your time series data and proceed accordingly with your analysis.