How to Perform Grubbs’ Test in Python

Grubbs’ Test, also known as the maximum normed residual test or extreme studentized deviate test, is a statistical test used to detect outliers in a univariate data set assumed to come from a normally distributed population. In this article, we will explain how to perform a Grubbs’ Test in Python using the outlier_utils library.

Setting Up the Environment

Before we start, make sure you have the necessary libraries installed. For this task, we’ll use the outlier_utils and scipy libraries. You can install these using pip:

pip install outlier-utils scipy

Conducting Grubbs’ Test

Let’s say we have a set of measurements, and we want to know if any of these measurements are outliers. Here’s how to conduct Grubbs’ test in Python:

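from scipy.stats import zscore
from outliers import smirnov_grubbs

# Define your data
data = [8.14, 6.6, 7.2, 6.8, 7.5, 6.33, 7.7, 6.41, 6.66, 6.44, 6.71, 6.91, 7.02]

# Optional: z-scores based on the sample standard deviation (ddof=1).
# The largest absolute z-score is the Grubbs' test statistic G.
zscores = zscore(data, ddof=1)
print(f"Grubbs' statistic G: {abs(zscores).max():.3f}")

# Conduct the two-sided Grubbs' test and collect the values flagged as outliers
outliers = smirnov_grubbs.two_sided_test_outliers(data, alpha=0.05)

print(f"Outliers: {outliers}")

This will print any outliers in your data. If no outliers are detected, the function returns an empty list.

In this code, the smirnov_grubbs.two_sided_test_outliers function performs the two-sided Grubbs’ test and returns the flagged values (smirnov_grubbs.two_sided_test returns the data with those values removed instead). The alpha=0.05 parameter sets the significance level of the test. The test statistic G is the largest absolute deviation from the sample mean divided by the sample standard deviation. If G exceeds the critical value for the chosen significance level (equivalently, if the p-value of the test is less than alpha), the value farthest from the mean, whether it is the maximum or the minimum, is considered an outlier.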
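To make the decision rule concrete, here is a minimal sketch of a single two-sided Grubbs’ test computed by hand. It assumes the standard critical-value formula based on the t-distribution; the helper names grubbs_statistic and grubbs_critical_value are illustrative and are not part of outlier_utils.

```python
import math
from statistics import mean, stdev

from scipy.stats import t


def grubbs_statistic(values):
    """Return G: the largest absolute deviation from the mean, in sample standard deviations."""
    center = mean(values)
    return max(abs(v - center) for v in values) / stdev(values)


def grubbs_critical_value(n, alpha=0.05):
    """Two-sided critical value for a sample of size n (hypothetical helper)."""
    # Upper critical value of the t-distribution with n - 2 degrees of freedom
    t_crit = t.ppf(1 - alpha / (2 * n), n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t_crit**2 / (n - 2 + t_crit**2))


data = [8.14, 6.6, 7.2, 6.8, 7.5, 6.33, 7.7, 6.41, 6.66, 6.44, 6.71, 6.91, 7.02]

g = grubbs_statistic(data)
g_crit = grubbs_critical_value(len(data), alpha=0.05)
has_outlier = g > g_crit

print(f"G = {g:.3f}, critical value = {g_crit:.3f}, outlier detected: {has_outlier}")
```

For the sample data, G is about 2.17 and the critical value is about 2.46, so this sketch does not flag any value at the 5% level.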

Grubbs’ Test for Small Datasets

Grubbs’ Test is known to be overly sensitive when applied to small datasets (e.g., n < 10). The result is a higher Type I error rate (rejecting the null hypothesis when it is true) than the chosen significance level. This is particularly pronounced when searching for two outliers.

For small datasets, therefore, interpret Grubbs’ Test results with caution and consider cross-checking them with other outlier detection methods or tests.
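As one possible cross-check, the sketch below flags values using the modified z-score, which is based on the median and the median absolute deviation (MAD) and is therefore less affected by the suspect value itself. This is a different technique from Grubbs’ test, not part of outlier_utils; the function name modified_z_outliers, the 0.6745 scaling constant, and the 3.5 threshold are conventional choices assumed here for illustration.

```python
from statistics import median


def modified_z_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    center = median(values)
    # Median absolute deviation: a robust measure of spread
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # No spread around the median, so the score is undefined
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]


data = [8.14, 6.6, 7.2, 6.8, 7.5, 6.33, 7.7, 6.41, 6.66, 6.44, 6.71, 6.91, 7.02]
print(f"Modified z-score outliers: {modified_z_outliers(data)}")
```

If both methods agree on a point, that is somewhat stronger evidence than either result alone, although neither replaces judgment about how the data were collected.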

Grubbs’ Test Assumptions

Grubbs’ Test makes two key assumptions:

  1. Normality: The data should be approximately normally distributed. If the data is not normal, then the results of Grubbs’ Test may not be valid.
  2. Independence: The observations should be independent of each other.

If these assumptions are violated, then Grubbs’ Test may not be appropriate.
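The normality assumption can be checked informally before running the test. The sketch below uses the Shapiro-Wilk test from scipy.stats as one option; the 0.05 cutoff is an assumption chosen to match the significance level used earlier, and on small samples such a check has limited power, so treat the result as a hint rather than proof.

```python
from scipy.stats import shapiro

data = [8.14, 6.6, 7.2, 6.8, 7.5, 6.33, 7.7, 6.41, 6.66, 6.44, 6.71, 6.91, 7.02]

# Null hypothesis of Shapiro-Wilk: the sample comes from a normal distribution
statistic, p_value = shapiro(data)
looks_normal = p_value > 0.05  # Assumed cutoff, adjust to your needs

print(f"W = {statistic:.3f}, p-value = {p_value:.3f}, consistent with normality: {looks_normal}")
```

Keep in mind that a genuine outlier can itself cause a normality test to reject, so a histogram or Q-Q plot of the data is a useful companion to this check.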

Extending Grubbs’ Test

While Grubbs’ Test is traditionally used to detect a single outlier, it can be extended to detect more than one by removing the detected outlier and repeating the test on the remaining data. This should be done cautiously: each round is a separate hypothesis test, so the chance of at least one Type I error increases with every repetition.
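The remove-and-repeat procedure can be sketched as follows. This is a minimal illustration under the same critical-value formula assumed for the two-sided test, not the outlier_utils implementation; the function names grubbs_critical_value and iterative_grubbs and the toy data are hypothetical.

```python
import math
from statistics import mean, stdev

from scipy.stats import t


def grubbs_critical_value(n, alpha=0.05):
    """Two-sided Grubbs' critical value for a sample of size n."""
    t_crit = t.ppf(1 - alpha / (2 * n), n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t_crit**2 / (n - 2 + t_crit**2))


def iterative_grubbs(values, alpha=0.05):
    """Repeat the test, removing one flagged value per round."""
    remaining = list(values)
    outliers = []
    # At least three points are needed for n - 2 degrees of freedom
    while len(remaining) >= 3:
        center = mean(remaining)
        spread = stdev(remaining)
        if spread == 0:
            break  # All remaining values are identical
        # The suspect is the value farthest from the current mean
        suspect = max(remaining, key=lambda v: abs(v - center))
        g = abs(suspect - center) / spread
        if g <= grubbs_critical_value(len(remaining), alpha):
            break  # No further outlier at this significance level
        outliers.append(suspect)
        remaining.remove(suspect)
    return outliers, remaining


outliers, cleaned = iterative_grubbs([8, 9, 10, 1, 9, 50])
print(f"Outliers: {outliers}, remaining data: {cleaned}")
```

In this toy example both 50 and 1 end up flagged out of only six points, which illustrates why results from repeated testing deserve extra scrutiny.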

Conclusion

In this article, we have explained how to perform Grubbs’ Test in Python using the outlier_utils library. The test is useful for detecting outliers in a normally distributed univariate data set. As with all statistical tests, it is important to check that its assumptions are met and to interpret the result in the context of the specific situation; if in doubt, consult a statistician or data scientist. An outlier does not necessarily indicate a problem and can sometimes provide valuable insight.
