
Introduction
Repeated Measures ANOVA (Analysis of Variance) is a statistical technique used to test whether the mean of an outcome differs when the same subjects are measured under different conditions or at different points in time. This article takes a hands-on approach to performing Repeated Measures ANOVA in Python using an example dataset.
Example Dataset
For this tutorial, let’s consider an example where a researcher is studying the effect of three different diets on weight loss. Ten participants follow each of the three diets in turn, one month per diet, and each participant’s weight is recorded at the end of each month.
Step 1: Import Libraries
First, import the necessary libraries:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.anova import AnovaRM
from scipy import stats
Step 2: Create the Dataset
Let’s create a DataFrame to represent our dataset. The dataset consists of the weights of 10 participants after trying each of the three diets.
data = pd.DataFrame({
'subject_id': np.arange(1, 11).tolist() * 3,
'diet': ['Diet1']*10 + ['Diet2']*10 + ['Diet3']*10,
'weight': [
220, 235, 210, 228, 240,
229, 217, 234, 243, 221,
215, 231, 209, 220, 235,
226, 214, 225, 237, 218,
212, 228, 207, 218, 232,
224, 210, 221, 235, 216
]
})
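AnovaRM expects a balanced design in long format: every subject must have exactly one observation for every diet. As a minimal sketch (the variable names `counts` and `is_balanced` are illustrative, not part of any library), the layout can be verified like this:

```python
import pandas as pd

# Same long-format data as in the tutorial.
data = pd.DataFrame({
    'subject_id': list(range(1, 11)) * 3,
    'diet': ['Diet1'] * 10 + ['Diet2'] * 10 + ['Diet3'] * 10,
    'weight': [
        220, 235, 210, 228, 240, 229, 217, 234, 243, 221,
        215, 231, 209, 220, 235, 226, 214, 225, 237, 218,
        212, 228, 207, 218, 232, 224, 210, 221, 235, 216,
    ],
})

# Count the rows for every subject/diet combination (subjects x diets table).
counts = data.groupby(['subject_id', 'diet']).size().unstack(fill_value=0)

# Balanced repeated-measures data has exactly one row in every cell.
is_balanced = bool((counts == 1).all().all())
print(counts.shape)
print('Balanced design:', is_balanced)
```

If a cell count is 0 or greater than 1, the missing or duplicated measurements need to be resolved before fitting the model.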
Step 3: Data Exploration
Let’s take a look at the dataset.
print(data.head())
print(data.describe())
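Because `describe()` pools all three diets, a per-diet summary is usually more informative for this design. The following sketch (with illustrative variable names `summary` and `wide`) groups by diet and also reshapes the data so each participant occupies one row:

```python
import pandas as pd

# Same long-format data as in the tutorial.
data = pd.DataFrame({
    'subject_id': list(range(1, 11)) * 3,
    'diet': ['Diet1'] * 10 + ['Diet2'] * 10 + ['Diet3'] * 10,
    'weight': [
        220, 235, 210, 228, 240, 229, 217, 234, 243, 221,
        215, 231, 209, 220, 235, 226, 214, 225, 237, 218,
        212, 228, 207, 218, 232, 224, 210, 221, 235, 216,
    ],
})

# Mean, spread and range of the weights for each diet separately.
summary = data.groupby('diet')['weight'].agg(['mean', 'std', 'min', 'max'])
print(summary.round(2))

# Wide layout: one row per participant, one column per diet.
wide = data.pivot(index='subject_id', columns='diet', values='weight')
print(wide.head())
```

The wide table makes it easy to see how each participant's weight changes from one diet to the next.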
Step 4: Checking Assumptions
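We need to check whether the data meet the assumptions of the test. Normality applies to the weights within each diet (or to the model residuals), not to the pooled weight column, so we run the Shapiro-Wilk test separately for each diet. In a real-world scenario, you should also check for sphericity, that is, that the variances of the differences between every pair of conditions are equal. Neither SciPy nor statsmodels provides Mauchly’s test of sphericity; you could use a third-party package such as pingouin, or R, for that.
# Checking normality within each diet
for diet, weights in data.groupby('diet')['weight']:
    _, p_normality = stats.shapiro(weights)
    print(f'{diet}: p-value for normality = {p_normality:.3f}')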
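Mauchly's test can also be computed by hand from the covariance matrix of the conditions. The sketch below is one possible implementation, not a library routine: it uses the standard W statistic with its chi-square approximation, and all variable names are illustrative. With only ten participants the test has little power, so treat the result as a rough guide.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Same long-format data as in the tutorial.
data = pd.DataFrame({
    'subject_id': list(range(1, 11)) * 3,
    'diet': ['Diet1'] * 10 + ['Diet2'] * 10 + ['Diet3'] * 10,
    'weight': [
        220, 235, 210, 228, 240, 229, 217, 234, 243, 221,
        215, 231, 209, 220, 235, 226, 214, 225, 237, 218,
        212, 228, 207, 218, 232, 224, 210, 221, 235, 216,
    ],
})

wide = data.pivot(index='subject_id', columns='diet', values='weight')
n, k = wide.shape          # n subjects, k conditions
p = k - 1                  # number of orthogonal contrasts

# Sample covariance matrix of the k conditions.
S = np.cov(wide.to_numpy(dtype=float), rowvar=False)

# Double-centering removes the overall level; the remaining k-1 non-zero
# eigenvalues equal those of the covariance of orthonormal contrasts.
M = np.eye(k) - np.ones((k, k)) / k
eigvals = np.linalg.eigvalsh(M @ S @ M)   # ascending order
lam = eigvals[1:]                         # drop the (numerically) zero eigenvalue

# Mauchly's W: ratio of the product of eigenvalues to their mean raised to p.
W = np.prod(lam) / np.mean(lam) ** p

# Chi-square approximation with the usual small-sample correction.
correction = 1 - (2 * p**2 + p + 2) / (6 * p * (n - 1))
chi2_stat = -(n - 1) * correction * np.log(W)
df = p * (p + 1) // 2 - 1
p_value = stats.chi2.sf(chi2_stat, df)

print(f"Mauchly's W = {W:.4f}, chi2({df}) = {chi2_stat:.3f}, p = {p_value:.4f}")
```

A small p-value suggests that sphericity is violated, in which case a corrected F-test (for example Greenhouse-Geisser) is normally reported instead of the uncorrected one.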
Step 5: Performing Repeated Measures ANOVA
Now, let’s perform the Repeated Measures ANOVA using the AnovaRM class.
anova = AnovaRM(data=data, depvar='weight', subject='subject_id', within=['diet'])
fit = anova.fit()
print(fit.summary())
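To see where the F-value comes from, it can help to reproduce the one-way repeated-measures calculation by hand. The sketch below partitions the total sum of squares into condition, subject and error components; it is an illustration of the textbook formulas with made-up variable names, not a view into how statsmodels computes the result internally.

```python
import pandas as pd
from scipy import stats

# Same long-format data as in the tutorial.
data = pd.DataFrame({
    'subject_id': list(range(1, 11)) * 3,
    'diet': ['Diet1'] * 10 + ['Diet2'] * 10 + ['Diet3'] * 10,
    'weight': [
        220, 235, 210, 228, 240, 229, 217, 234, 243, 221,
        215, 231, 209, 220, 235, 226, 214, 225, 237, 218,
        212, 228, 207, 218, 232, 224, 210, 221, 235, 216,
    ],
})

# Rows are subjects, columns are diets.
wide = data.pivot(index='subject_id', columns='diet', values='weight').to_numpy(dtype=float)
n, k = wide.shape
grand_mean = wide.mean()

# Variation between diet means, between subject means, and in total.
ss_cond = n * ((wide.mean(axis=0) - grand_mean) ** 2).sum()
ss_subj = k * ((wide.mean(axis=1) - grand_mean) ** 2).sum()
ss_total = ((wide - grand_mean) ** 2).sum()

# What remains after removing diet and subject effects is the error term.
ss_error = ss_total - ss_cond - ss_subj

df_cond = k - 1
df_error = (k - 1) * (n - 1)

f_value = (ss_cond / df_cond) / (ss_error / df_error)
p_value = stats.f.sf(f_value, df_cond, df_error)

print(f'F({df_cond}, {df_error}) = {f_value:.2f}, p = {p_value:.4g}')
```

Removing the between-subject variation from the error term is what distinguishes this test from an ordinary one-way ANOVA on the same numbers.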
Step 6: Interpreting the Results
Focus on the F-value and the p-value in the output table. If the p-value is less than 0.05, it suggests that the mean weight differs significantly for at least one of the diets. The test does not tell you which diets differ from each other.
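Comparing the p-value to 0.05 is equivalent to comparing the F-value to a critical value from the F distribution. The following sketch assumes three diets and ten participants, which gives 2 and 18 degrees of freedom; `f_observed` is a hypothetical placeholder, not a result from the dataset.

```python
from scipy import stats

alpha = 0.05
df_num = 3 - 1                  # number of diets minus one
df_den = (3 - 1) * (10 - 1)     # (diets - 1) * (participants - 1)

# Critical value: the F-value that cuts off the upper 5% of the distribution.
f_crit = stats.f.ppf(1 - alpha, df_num, df_den)

# Hypothetical observed F-value, used purely for illustration.
f_observed = 5.0
p_value = stats.f.sf(f_observed, df_num, df_den)

print(f'Critical F({df_num}, {df_den}) at alpha = {alpha}: {f_crit:.2f}')
print(f'Observed F = {f_observed}, p = {p_value:.4f}')
print('Significant:', p_value < alpha)
```

Any observed F-value above the critical value (about 3.55 for these degrees of freedom) yields a p-value below 0.05.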
Step 7: Post Hoc Testing
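If you find a significant effect, you might want to perform post hoc tests to determine which specific pairs of means are different. Because every participant is measured under all three diets, the comparisons must be paired; Tukey’s HSD assumes independent groups and is not appropriate for this design. Instead, run a paired t-test for each pair of diets and apply a Bonferroni correction for the number of comparisons.
from itertools import combinations
# One row per subject, one column per diet
wide = data.pivot(index='subject_id', columns='diet', values='weight')
pairs = list(combinations(wide.columns, 2))
for a, b in pairs:
    t_stat, p = stats.ttest_rel(wide[a], wide[b])
    # Bonferroni: multiply each p-value by the number of comparisons
    print(f'{a} vs {b}: t = {t_stat:.2f}, adjusted p = {min(p * len(pairs), 1):.4f}')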
Step 8: Reporting the Results
If the p-value is significant, you might report it as follows:
“A Repeated Measures ANOVA was conducted to compare the effect of three different diets on weight loss in ten participants. There was a significant effect of diet on weight loss at the p < .05 level for the three conditions [F(2, 18) = x.xx, p = 0.0x].”
Replace x.xx and 0.0x with the actual values from your output.
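Rather than copying the numbers by hand, the values can be read from the fitted result. This sketch relies on the `anova_table` attribute of the AnovaRM result, a DataFrame with the columns 'F Value', 'Num DF', 'Den DF' and 'Pr > F'; the formatting choices (two decimals for F, "p < .001" for very small p-values) are assumptions about the reporting style.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Same long-format data as in the tutorial.
data = pd.DataFrame({
    'subject_id': list(range(1, 11)) * 3,
    'diet': ['Diet1'] * 10 + ['Diet2'] * 10 + ['Diet3'] * 10,
    'weight': [
        220, 235, 210, 228, 240, 229, 217, 234, 243, 221,
        215, 231, 209, 220, 235, 226, 214, 225, 237, 218,
        212, 228, 207, 218, 232, 224, 210, 221, 235, 216,
    ],
})

fit = AnovaRM(data=data, depvar='weight', subject='subject_id', within=['diet']).fit()

# The table has one row per within-subject factor; here that is 'diet'.
row = fit.anova_table.loc['diet']
p = row['Pr > F']

# Very small p-values are conventionally reported as an inequality.
p_text = 'p < .001' if p < 0.001 else f'p = {p:.3f}'
report = f"F({int(row['Num DF'])}, {int(row['Den DF'])}) = {row['F Value']:.2f}, {p_text}"
print(report)
```

The resulting string can be pasted into the bracketed part of the sentence above.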
Conclusion
In this article, we have walked through the process of performing a Repeated Measures ANOVA in Python using an example dataset. This is a powerful tool for analyzing data where the same subjects are measured under different conditions. While we focused on a hands-on example, it’s also essential to have a conceptual understanding of the test and its assumptions.