How to Perform a Repeated Measures ANOVA in Python

Introduction

Repeated Measures ANOVA (Analysis of Variance) is a statistical technique for testing whether the mean of an outcome changes when the same subjects are measured under different conditions or at different points in time. This article walks through a Repeated Measures ANOVA in Python, step by step, using an example dataset.

Example Dataset

For this tutorial, let’s consider an example where a researcher is studying the effect of three different diets on weight loss. Ten participants follow each of the three diets in turn, one month per diet, and their weight is recorded at the end of each month.

Step 1: Import Libraries

First, import the necessary libraries:

import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.anova import AnovaRM
from scipy import stats

Step 2: Create the Dataset

Let’s create a DataFrame to represent our dataset. It holds the weights of 10 participants after each of the three diets, in long format: one row per participant per diet.

data = pd.DataFrame({
    'subject_id': np.arange(1, 11).tolist() * 3,
    'diet': ['Diet1']*10 + ['Diet2']*10 + ['Diet3']*10,
    'weight': [
        220, 235, 210, 228, 240,
        229, 217, 234, 243, 221,
        215, 231, 209, 220, 235,
        226, 214, 225, 237, 218,
        212, 228, 207, 218, 232,
        224, 210, 221, 235, 216
    ]
})
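Repeated measures designs are usually analysed on balanced data, meaning every subject has exactly one observation per condition. The following is a minimal sketch of how that could be verified for the dataset above; the names `cell_counts`, `is_balanced`, and `has_missing` are illustrative and not part of the original tutorial.

```python
import numpy as np
import pandas as pd

# Rebuild the tutorial's long-format dataset so this snippet runs on its own.
data = pd.DataFrame({
    'subject_id': np.arange(1, 11).tolist() * 3,
    'diet': ['Diet1'] * 10 + ['Diet2'] * 10 + ['Diet3'] * 10,
    'weight': [
        220, 235, 210, 228, 240, 229, 217, 234, 243, 221,  # Diet1
        215, 231, 209, 220, 235, 226, 214, 225, 237, 218,  # Diet2
        212, 228, 207, 218, 232, 224, 210, 221, 235, 216,  # Diet3
    ],
})

# Count the observations in every subject-by-diet cell.
cell_counts = data.groupby(['subject_id', 'diet']).size().unstack(fill_value=0)

# Balanced means each cell holds exactly one measurement.
is_balanced = bool((cell_counts == 1).all().all())
has_missing = bool(data['weight'].isna().any())

print(f'subjects x diets: {cell_counts.shape}')
print(f'balanced: {is_balanced}, missing weights: {has_missing}')
```

If `is_balanced` comes back false on your own data, look for duplicated rows or subjects who skipped a condition before moving on.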

Step 3: Data Exploration

Let’s take a look at the dataset.

print(data.head())
print(data.describe())
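Because `describe()` pools all three diets, a per-diet summary is often more informative at this stage. Here is a short sketch, assuming the same dataset as above; `summary` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Rebuild the tutorial's long-format dataset so this snippet runs on its own.
data = pd.DataFrame({
    'subject_id': np.arange(1, 11).tolist() * 3,
    'diet': ['Diet1'] * 10 + ['Diet2'] * 10 + ['Diet3'] * 10,
    'weight': [
        220, 235, 210, 228, 240, 229, 217, 234, 243, 221,  # Diet1
        215, 231, 209, 220, 235, 226, 214, 225, 237, 218,  # Diet2
        212, 228, 207, 218, 232, 224, 210, 221, 235, 216,  # Diet3
    ],
})

# Summary statistics of weight within each diet.
summary = data.groupby('diet')['weight'].agg(['mean', 'std', 'min', 'max'])
print(summary)
```

The three means are the quantities the ANOVA compares, so it helps to have them in view before running the test.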

Step 4: Checking Assumptions

We need to check whether the data meet the assumption of normality. In a real-world scenario, you should also check for sphericity, that is, whether the differences between every pair of conditions have equal variances. Neither SciPy nor statsmodels provides Mauchly’s test of sphericity; the third-party pingouin package does, or you could use R for that.

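# Checking normality within each diet (pooling all weights would mix the three conditions)
for diet, group in data.groupby('diet'):
    _, p_normality = stats.shapiro(group['weight'])
    print(f'{diet}: p-value for normality: {p_normality:.3f}')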
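For readers who want to stay in Python, Mauchly’s test can also be computed by hand with NumPy and SciPy. The sketch below uses the textbook formula (Mauchly’s W with the usual chi-square approximation); the function name `mauchly_test` is hypothetical, and the result should be cross-checked against a dedicated implementation before you rely on it.

```python
import numpy as np
from scipy import stats
from scipy.linalg import helmert

def mauchly_test(wide):
    """Mauchly's test of sphericity for an (n_subjects, k_conditions) array."""
    wide = np.asarray(wide, dtype=float)
    n, k = wide.shape
    p = k - 1
    contrasts = helmert(k)                       # (k-1, k) orthonormal contrasts
    cov = np.cov(wide, rowvar=False)             # k x k sample covariance
    t = contrasts @ cov @ contrasts.T            # covariance of the contrasts
    w = np.linalg.det(t) / (np.trace(t) / p) ** p
    # Chi-square approximation with the standard small-sample correction.
    correction = 1 - (2 * p**2 + p + 2) / (6 * p * (n - 1))
    chi2 = -(n - 1) * correction * np.log(w)
    df = p * (p + 1) // 2 - 1
    return w, chi2, df, stats.chi2.sf(chi2, df)

# Tutorial weights in wide form: rows are subjects, columns are Diet1..Diet3.
wide = np.array([
    220, 235, 210, 228, 240, 229, 217, 234, 243, 221,
    215, 231, 209, 220, 235, 226, 214, 225, 237, 218,
    212, 228, 207, 218, 232, 224, 210, 221, 235, 216,
]).reshape(3, 10).T

w, chi2, df, p_sphericity = mauchly_test(wide)
print(f"Mauchly's W = {w:.3f}, chi2({df}) = {chi2:.3f}, p = {p_sphericity:.3f}")
```

A small p-value suggests sphericity is violated, in which case a corrected F-test (for example Greenhouse-Geisser) is the usual remedy.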

Step 5: Performing Repeated Measures ANOVA

Now, let’s perform the Repeated Measures ANOVA using the AnovaRM class.

anova = AnovaRM(data=data, depvar='weight', subject='subject_id', within=['diet'])
fit = anova.fit()
print(fit.summary())

Step 6: Interpreting the Results

Focus on the F-value and the p-value in the summary table. If the p-value is less than 0.05, the data provide evidence that the mean weight differs between at least two of the diets.
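To see where the F-value comes from, the one-way repeated measures ANOVA can be reproduced by hand. This sketch splits the total variation into a diet part, a subject part, and an error part, assuming the balanced dataset from this tutorial; all variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Tutorial weights in wide form: rows are subjects, columns are Diet1..Diet3.
wide = np.array([
    220, 235, 210, 228, 240, 229, 217, 234, 243, 221,
    215, 231, 209, 220, 235, 226, 214, 225, 237, 218,
    212, 228, 207, 218, 232, 224, 210, 221, 235, 216,
], dtype=float).reshape(3, 10).T

n, k = wide.shape
grand_mean = wide.mean()

# Sums of squares: between diets, between subjects, and total.
ss_diet = n * ((wide.mean(axis=0) - grand_mean) ** 2).sum()
ss_subject = k * ((wide.mean(axis=1) - grand_mean) ** 2).sum()
ss_total = ((wide - grand_mean) ** 2).sum()

# What remains after removing diet and subject effects is the error term.
ss_error = ss_total - ss_diet - ss_subject

df_diet = k - 1
df_error = (k - 1) * (n - 1)

# F is the ratio of the two mean squares.
f_value = (ss_diet / df_diet) / (ss_error / df_error)
p_value = stats.f.sf(f_value, df_diet, df_error)

print(f'F({df_diet}, {df_error}) = {f_value:.2f}, p = {p_value:.4g}')
```

Removing the between-subject variation from the error term is what distinguishes this test from an ordinary one-way ANOVA on the same numbers.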

Step 7: Post Hoc Testing

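If you find a significant effect, you might want to perform post hoc tests to determine which specific pairs of means are different. Note that Tukey’s HSD, used here, treats the three diets as independent groups and ignores the fact that the same participants appear in each one, so treat its output as a rough and typically conservative guide.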

from statsmodels.stats.multicomp import MultiComparison

multi_comp = MultiComparison(data['weight'], data['diet'])
post_hoc_res = multi_comp.tukeyhsd()
print(post_hoc_res.summary())
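A common alternative that respects the repeated measures design is a set of paired t-tests with a Bonferroni correction. This is a minimal sketch of that approach, not the method used in the code above; the names `pairs` and `results` are illustrative.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats

# Rebuild the tutorial's long-format dataset so this snippet runs on its own.
data = pd.DataFrame({
    'subject_id': np.arange(1, 11).tolist() * 3,
    'diet': ['Diet1'] * 10 + ['Diet2'] * 10 + ['Diet3'] * 10,
    'weight': [
        220, 235, 210, 228, 240, 229, 217, 234, 243, 221,  # Diet1
        215, 231, 209, 220, 235, 226, 214, 225, 237, 218,  # Diet2
        212, 228, 207, 218, 232, 224, 210, 221, 235, 216,  # Diet3
    ],
})

# One row per subject and one column per diet, so rows stay paired.
wide = data.pivot(index='subject_id', columns='diet', values='weight')

pairs = list(combinations(wide.columns, 2))
results = []
for a, b in pairs:
    t_stat, p_raw = stats.ttest_rel(wide[a], wide[b])
    # Bonferroni: multiply by the number of comparisons, capped at 1.
    p_adj = min(p_raw * len(pairs), 1.0)
    results.append((a, b, t_stat, p_raw, p_adj))
    print(f'{a} vs {b}: t = {t_stat:.2f}, adjusted p = {p_adj:.4g}')
```

Bonferroni is simple but conservative; other corrections such as Holm's can be swapped in if you need more power.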

Step 8: Reporting the Results

If the p-value is significant, you might report it as follows:

“A Repeated Measures ANOVA was conducted to compare the effect of three different diets on weight loss in ten participants. There was a significant effect of diet on weight loss at the p < .05 level for the three conditions [F(2, 18) = x.xx, p = 0.0x].”

Replace x.xx and 0.0x with the actual values from your output.
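The values for the report can also be pulled out of the fitted result instead of being copied by hand. The sketch below assumes the `anova_table` column names used by recent statsmodels versions ('F Value', 'Num DF', 'Den DF', 'Pr > F'); check them against your installed version.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Rebuild the tutorial's long-format dataset so this snippet runs on its own.
data = pd.DataFrame({
    'subject_id': np.arange(1, 11).tolist() * 3,
    'diet': ['Diet1'] * 10 + ['Diet2'] * 10 + ['Diet3'] * 10,
    'weight': [
        220, 235, 210, 228, 240, 229, 217, 234, 243, 221,  # Diet1
        215, 231, 209, 220, 235, 226, 214, 225, 237, 218,  # Diet2
        212, 228, 207, 218, 232, 224, 210, 221, 235, 216,  # Diet3
    ],
})

fit = AnovaRM(data=data, depvar='weight', subject='subject_id', within=['diet']).fit()

# The row for the within-subject factor holds the statistics to report.
row = fit.anova_table.loc['diet']
f_value = float(row['F Value'])
df_num, df_den = int(row['Num DF']), int(row['Den DF'])
p_value = float(row['Pr > F'])

# Very small p-values are conventionally reported as an inequality.
p_text = 'p < .001' if p_value < 0.001 else f'p = {p_value:.3f}'
report = f'F({df_num}, {df_den}) = {f_value:.2f}, {p_text}'
print(report)
```

The resulting string can be pasted into the bracketed part of the sentence above.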

Conclusion

In this article, we have walked through the process of performing a Repeated Measures ANOVA in Python using an example dataset. This is a powerful tool for analyzing data where the same subjects are measured under different conditions. While we focused on a hands-on example, it’s also essential to have a conceptual understanding of the test and its assumptions.
