
The Goldfeld-Quandt (GQ) test is a statistical test used to check the homoscedasticity of error terms in a regression model. Homoscedasticity, or constant variance, is one of the important assumptions of linear regression models. If the error terms don’t have constant variance (a condition known as heteroscedasticity), the OLS coefficient estimates remain unbiased but are no longer efficient, and the usual standard errors are biased. This makes hypothesis tests and confidence intervals based on the model unreliable.
In this article, we will discuss how to implement the Goldfeld-Quandt test in Python. We will start with the basics of the test and then apply it using Python libraries.
Understanding the Goldfeld-Quandt Test
The GQ test is applied when you have reason to believe that the variance of the error terms in your regression model is not constant, and that it instead increases or decreases with the values of an independent variable.
The test orders the observations by that variable and splits them into two groups, optionally excluding a fraction of observations in the middle. It then fits a separate regression to each group and compares the ratio of their residual variances (the residual sum of squares, RSS, divided by the degrees of freedom). Under homoscedasticity this ratio follows an F distribution. If the ratio is significantly different from one, it is evidence that the variance of the error terms is not constant, and you may reject the null hypothesis of homoscedasticity.
The null and alternative hypotheses for the GQ test are as follows:
- Null Hypothesis (H0): The error terms have constant variance (homoscedasticity).
- Alternative Hypothesis (H1): The error terms don’t have constant variance (heteroscedasticity).
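To make the procedure concrete, here is a minimal sketch of the test written by hand with NumPy and SciPy. The function name `goldfeld_quandt_manual`, the `drop_frac` parameter, and the synthetic data are assumptions made for illustration; this is not how statsmodels implements the test internally, and it only covers the one-sided "variance increases" case.

```python
import numpy as np
from scipy import stats

def goldfeld_quandt_manual(y, X, sort_col, drop_frac=0.2):
    """Illustrative GQ test; X must already contain a constant column."""
    # Order the observations by the variable suspected of driving the variance
    order = np.argsort(X[:, sort_col])
    y, X = y[order], X[order]
    n, k = X.shape

    # Leave out a block of observations in the middle
    n_drop = int(n * drop_frac)
    n_low = (n - n_drop) // 2
    low = slice(0, n_low)
    high = slice(n_low + n_drop, n)

    def rss_and_df(y_part, X_part):
        # Least-squares fit on one group; return RSS and residual degrees of freedom
        beta, *_ = np.linalg.lstsq(X_part, y_part, rcond=None)
        resid = y_part - X_part @ beta
        return resid @ resid, len(y_part) - k

    rss_low, df_low = rss_and_df(y[low], X[low])
    rss_high, df_high = rss_and_df(y[high], X[high])

    # Ratio of residual variances; large values suggest increasing variance
    f_stat = (rss_high / df_high) / (rss_low / df_low)
    p_value = stats.f.sf(f_stat, df_high, df_low)
    return f_stat, p_value

# Synthetic data (an assumption): the error spread grows with x
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 200)
y_demo = 2.0 + 3.0 * x + rng.normal(scale=0.5 * x)
X_demo = np.column_stack([np.ones_like(x), x])

f_stat, p_value = goldfeld_quandt_manual(y_demo, X_demo, sort_col=1)
print(f"F statistic: {f_stat:.3f}, p-value: {p_value:.4g}")
```

Because the noise in this synthetic example is built to grow with x, the F statistic should come out well above one and the p-value should be small.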
Loading and Preparing the Data
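Let’s first load the Boston Housing dataset and prepare it for analysis. Note that load_boston was deprecated in scikit-learn 1.0 and removed in version 1.2, so the code below requires an older scikit-learn release.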
from sklearn.datasets import load_boston
import pandas as pd
# Load the dataset
boston = load_boston()
# Create a DataFrame
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df['MEDV'] = boston.target # Target variable
print(df.head())
Fitting a Regression Model
Before we can perform the GQ test, we need to fit a regression model. We’ll use the Ordinary Least Squares (OLS) model from the statsmodels library.
import statsmodels.api as sm
# Define the dependent variable
y = df['MEDV']
# Define the independent variables
X = df.drop('MEDV', axis=1)
# Add a constant to the independent variables matrix
X = sm.add_constant(X)
# Fit the OLS model
model = sm.OLS(y, X).fit()
Performing the Goldfeld-Quandt Test
Now that we have our regression model, we can perform the GQ test using the het_goldfeldquandt function from the statsmodels library.
from statsmodels.stats.diagnostic import het_goldfeldquandt
# Perform the Goldfeld-Quandt test
gq_test = het_goldfeldquandt(y, X)
print(f"F statistic: {gq_test[0]}")
print(f"p-value: {gq_test[1]}")
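The het_goldfeldquandt function returns the F statistic, the associated p-value, and a string naming the alternative hypothesis that was tested ("increasing" by default). If the p-value is less than your chosen significance level (e.g., 0.05), you reject the null hypothesis and conclude that there is evidence of heteroscedasticity. Keep in mind that the result depends on how the observations are ordered: by default the function uses the rows in the order given, and its idx parameter lets you sort them by a column of X first.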
Visualizing Heteroscedasticity
Another way to check for heteroscedasticity is by visualizing the residuals. If the spread of the residuals changes with the predicted values or with an independent variable, this can be a sign of heteroscedasticity.
Let’s create a residuals plot:
import matplotlib.pyplot as plt
import numpy as np
# Store the fitted values and the residuals
df['PREDICTED'] = model.predict(X)
df['RESIDUALS'] = model.resid  # already observed minus predicted
# Create a scatter plot
plt.scatter(df['PREDICTED'], df['RESIDUALS'])
plt.xlabel('Predicted')
plt.ylabel('Residuals')
plt.axhline(y=0, color='r', linestyle='-')
plt.title('Residuals vs. Predicted')
plt.show()

If you observe a funnel-like shape in the plot (residuals expanding as the predicted value increases), this is a sign of heteroscedasticity.
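A funnel shape can also be checked numerically. The following is a rough sketch, on synthetic data (an assumption), that sorts residuals by fitted value, splits them into four equal-sized bins, and compares the standard deviation in each bin. The choice of four bins is arbitrary and only meant as a quick companion to the plot, not a formal test.

```python
import numpy as np

# Synthetic data (an assumption): the noise spread grows with x
rng = np.random.default_rng(1)
x = np.linspace(1, 10, 400)
y_syn = 5.0 + 2.0 * x + rng.normal(scale=0.3 * x)

# Fit a straight line and compute fitted values and residuals
slope, intercept = np.polyfit(x, y_syn, deg=1)
fitted = intercept + slope * x
residuals = y_syn - fitted

# Order residuals by fitted value and split them into four equal-sized bins
order = np.argsort(fitted)
bins = np.array_split(residuals[order], 4)
spread = [b.std(ddof=1) for b in bins]

for i, s in enumerate(spread, start=1):
    print(f"Bin {i}: residual std = {s:.3f}")
```

If the standard deviations rise steadily from the first bin to the last, that matches the funnel pattern described above.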
Dealing with Heteroscedasticity
If you find evidence of heteroscedasticity, there are a few strategies you can apply:
- Transforming the dependent variable: Logarithmic or square root transformations can sometimes help stabilize the variance.
- Using a weighted regression model: This gives less weight to observations with higher variance.
- Changing the model specification: Sometimes adding a nonlinear term or interaction effects that the model was missing can solve the problem.
- Using robust standard errors: They are designed to be valid even when heteroscedasticity is present.
Conclusion
In this article, we’ve explained what the Goldfeld-Quandt test is, how it works, and how to implement it in Python. We’ve also shown how to visualize heteroscedasticity in your data and some strategies to deal with it.
Understanding and checking the assumptions of your regression model is a crucial part of any data analysis. It’s important to remember, though, that real-world data often violate these assumptions to some degree, and the key is understanding when these violations are severe enough to impact your results. Always complement statistical tests with visual checks and, when possible, repeat your analysis using different models or transformations to see if your results hold.