How to Perform White’s Test in Python


White’s test is a statistical test used to examine whether the residuals from a regression analysis are homoscedastic (i.e., have constant variance), as is assumed in ordinary least squares regression. This test is crucial because if the errors are heteroscedastic (the variance of the errors differs across observations), the usual standard errors of the regression coefficients are incorrect, which can lead to invalid inferences.

In this article, we’ll provide an in-depth guide to White’s test: what it is, why it matters, and how to perform it in Python.

Understanding White’s Test

White’s test, named after Halbert White, is a general test for heteroscedasticity in a regression model. Unlike some other tests (like the Breusch-Pagan test), it does not require the errors to be normally distributed or the form of heteroscedasticity to be specified.

White’s test is conducted in three steps:

  1. Estimate the regression model and obtain the squared residuals.
  2. Regress the squared residuals on the original explanatory variables, their squares, and their cross-products (the auxiliary regression).
  3. Use the results of this auxiliary regression to test the null hypothesis of homoscedasticity: the test statistic is n·R², where R² comes from the auxiliary regression, and it follows a chi-squared distribution under the null. (A minimal sketch of these steps appears after the hypotheses below.)

The null and alternative hypotheses for White’s test are:

  • Null Hypothesis (H0): The error variances are all equal (homoscedasticity).
  • Alternative Hypothesis (H1): The error variances are not equal (heteroscedasticity).
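
To make these steps concrete, here is a minimal sketch of the mechanics behind White’s test on a small synthetic dataset. The simulated variables x1 and x2 and all of the numbers below are illustrative assumptions, not part of the tutorial’s example; in practice you would use the het_white function shown later.

import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)

# Simulate a toy regression with heteroscedastic errors
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y_sim = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=np.abs(x1) + 0.5, size=n)

# Step 1: fit the original model and obtain the squared residuals
X_sim = sm.add_constant(np.column_stack([x1, x2]))
resid_sq = sm.OLS(y_sim, X_sim).fit().resid ** 2

# Step 2: auxiliary regression of the squared residuals on the regressors,
# their squares, and their cross-product
Z = sm.add_constant(np.column_stack([x1, x2, x1**2, x2**2, x1 * x2]))
aux = sm.OLS(resid_sq, Z).fit()

# Step 3: the LM statistic is n * R-squared of the auxiliary regression;
# under H0 it is chi-squared with df equal to the number of auxiliary
# regressors (excluding the constant)
lm_stat = n * aux.rsquared
p_value = stats.chi2.sf(lm_stat, df=Z.shape[1] - 1)
print(lm_stat, p_value)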

Loading and Preparing the Data

In this tutorial, we’ll use the ‘mtcars’ dataset, a well-known dataset that can be fetched through the statsmodels library (it is downloaded from the online Rdatasets repository, so an internet connection is needed on the first call). This dataset comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).

import statsmodels.api as sm

# Fetch the mtcars dataset from the Rdatasets repository
data = sm.datasets.get_rdataset('mtcars').data

# Inspect the first few rows
print(data.head())

Fitting a Regression Model

We’ll fit an ordinary least squares (OLS) regression model to our data, with miles per gallon (mpg) as the dependent variable and displacement (disp), horsepower (hp), and weight (wt) as the independent variables.

# Define the dependent variable
y = data['mpg']

# Define the independent variables
X = data[['disp', 'hp', 'wt']]

# Add a constant to the independent variables matrix
X = sm.add_constant(X)

# Fit the OLS model
model = sm.OLS(y, X).fit()
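
Before running the test, it can help to glance at the fitted model. As a quick sketch (nothing here is specific to White’s test):

# Inspect the estimated coefficients and the overall fit
print(model.params)     # intercept and slopes for disp, hp, and wt
print(model.rsquared)   # coefficient of determination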

Performing White’s Test

After fitting our regression model, we can perform White’s test for heteroscedasticity. The statsmodels library provides the het_white function for this purpose; it takes the residuals and the design matrix (exogenous variables) of the fitted model.

Here’s how we can apply it:

from statsmodels.stats.diagnostic import het_white

# Get the residuals
residuals = model.resid

# Perform White's Test
white_test = het_white(residuals, model.model.exog)

# Unpack the results
labels = ['LM Statistic', 'LM-Test p-value', 'F-Statistic', 'F-Test p-value']
print(dict(zip(labels, white_test)))

The het_white function returns four values: the Lagrange multiplier (LM) test statistic, the p-value of the LM test, the F-statistic of the same hypothesis test, and the p-value for the F-test. If the p-values are less than your chosen significance level (typically 0.05), you reject the null hypothesis and conclude that the error variances are not constant (i.e., heteroscedasticity is present).
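
For example, a minimal interpretation step could look like this (the 0.05 threshold is the conventional choice, not a requirement):

# Unpack the four return values and compare the LM p-value to the chosen level
lm_stat, lm_pvalue, f_stat, f_pvalue = white_test

alpha = 0.05
if lm_pvalue < alpha:
    print('Reject H0: evidence of heteroscedasticity.')
else:
    print('Fail to reject H0: no evidence of heteroscedasticity.')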

Handling Heteroscedasticity

If your data show signs of heteroscedasticity, you can use several strategies to handle it:

  1. Transforming the dependent variable: Applying a transformation like the natural logarithm can sometimes stabilize the variance.
  2. Using weighted least squares: If you know how the variance changes with your predictors, you can use this information to weight your observations when you perform the regression.
  3. Using robust standard errors: these are a computationally simple way to get valid inference when heteroscedasticity may be a problem (see the sketch after this list).
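
As one illustration, here is a hedged sketch of options 1 and 3, reusing the y and X defined earlier. HC3 is one of several heteroscedasticity-consistent covariance estimators (HC0 through HC3) that statsmodels supports:

import numpy as np

# Option 3: refit the same model with heteroscedasticity-consistent (HC3)
# standard errors; the coefficients are unchanged, only the inference differs
robust_model = sm.OLS(y, X).fit(cov_type='HC3')
print(robust_model.summary())

# Option 1: model log(mpg) instead of mpg to try to stabilize the variance
log_model = sm.OLS(np.log(y), X).fit()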

Conclusion

In this article, we took a deep dive into White’s test, its importance, and its implementation in Python using the het_white() function from the statsmodels library. This test is an essential tool for diagnosing heteroscedasticity in the residuals of a regression model. Recognizing and dealing with heteroscedasticity matters because it can invalidate the usual OLS inference. By conducting this test and interpreting its results correctly, you can make your regression models more reliable and your conclusions sound.
