
Introduction
McNemar’s Test is a non-parametric statistical method for paired nominal data. It tests whether the row and column marginal frequencies of a 2×2 contingency table are equal, that is, whether the proportion of a given outcome is the same in both paired measurements. This article explains how the test works, when it applies, and how to perform it in Python.
Background Knowledge
a. Contingency Table
A contingency table, also known as a cross-tabulation table, shows the joint frequency distribution of two categorical variables. McNemar’s Test uses a 2×2 contingency table (two rows and two columns) in which each cell counts pairs of observations, not individual observations.
b. Paired Nominal Data
Paired nominal data consists of pairs of categorical data. For instance, it might represent the presence or absence of a characteristic in the same subject at two different times.
c. Significance of McNemar’s Test
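McNemar’s Test applies when the same two-category outcome is recorded twice for each subject (or once for each member of a matched pair) and you want to know whether the proportion of one outcome differs between the two measurements. Because the observations are paired, a chi-squared test of independence is not appropriate here.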
Understanding McNemar’s Test
a. Hypotheses
The hypotheses for McNemar’s test are:
- Null Hypothesis (H0): The two marginal proportions are equal; equivalently, a change in one direction is as likely as a change in the other.
- Alternative Hypothesis (H1): The two marginal proportions are not equal.
b. Test Statistic
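McNemar’s Test statistic depends only on the discordant (off-diagonal) cells of the 2×2 contingency table. With the table written as [[a, b], [c, d]], the statistic is (b - c)^2 / (b + c); under the null hypothesis it approximately follows a chi-squared distribution with 1 degree of freedom. The concordant cells a and d do not enter the calculation.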
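The statistic can be computed by hand from the two off-diagonal counts. The following is a minimal sketch using only the standard library, assuming the common chi-squared form (b - c)^2 / (b + c) without a continuity correction. The function name mcnemar_chi2 is illustrative and not part of any library.

```python
import math


def mcnemar_chi2(table):
    """Return (statistic, p_value) for the uncorrected chi-squared form.

    `table` is a 2x2 nested list [[a, b], [c, d]]; only b and c are used.
    """
    b = table[0][1]
    c = table[1][0]
    if b + c == 0:
        raise ValueError("no discordant pairs: the statistic is undefined")
    statistic = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-squared distribution with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


stat, p = mcnemar_chi2([[10, 20], [30, 40]])
print(f"statistic = {stat:.3f}, p-value = {p:.4f}")
```

For the table [[10, 20], [30, 40]] the statistic is (20 - 30)^2 / 50 = 2.0.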
c. Assumptions
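- The data are paired and nominal, with two mutually exclusive categories per measurement.
- The pairs are independent of one another.
- The number of discordant pairs should not be too small for the chi-squared approximation to hold; with few discordant pairs, use the exact (binomial) version of the test.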
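When the sample is small, an exact p-value can be computed from the binomial distribution instead of the chi-squared approximation. The sketch below uses only the standard library and assumes the usual two-sided form: under the null hypothesis, the smaller discordant count follows a Binomial(b + c, 0.5) distribution. The function name mcnemar_exact_p is hypothetical, and the threshold of 25 discordant pairs is a commonly quoted rule of thumb, not a strict requirement.

```python
from math import comb


def mcnemar_exact_p(b, c):
    """Two-sided exact p-value from the discordant counts b and c."""
    n = b + c
    k = min(b, c)
    # P(X <= k) for X ~ Binomial(n, 0.5)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the tail for a two-sided test and cap the result at 1
    return min(1.0, 2 * lower_tail)


b, c = 3, 9  # illustrative discordant counts
if b + c < 25:  # rule-of-thumb threshold (an assumption, not a hard rule)
    print(f"few discordant pairs ({b + c}); exact p-value = {mcnemar_exact_p(b, c):.4f}")
else:
    print("enough discordant pairs for the chi-squared approximation")
```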
d. Applications
- Comparing the effectiveness of two treatments applied to the same or matched subjects.
- Assessing changes in a characteristic of the same subjects over time.
Loading and Preparing Data
Before you can perform McNemar’s Test, you need paired data. You can load it from a CSV file, an Excel workbook, a SQL database, or any other source. The pandas library is useful for loading and managing data.
Example:
import pandas as pd
# Load data from a CSV file
data = pd.read_csv('your-data-file.csv')
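Raw paired data usually has one row per subject, so it must be cross-tabulated into a 2×2 table before testing. The sketch below shows one way to do this with pd.crosstab. The column names "before" and "after" and the small data set are made up for illustration; substitute the columns of your own file.

```python
import pandas as pd

# Hypothetical paired data: one row per subject, 1 = present, 0 = absent
df = pd.DataFrame({
    "before": [1, 1, 0, 1, 0, 0, 1, 1],
    "after":  [1, 0, 0, 0, 0, 1, 0, 1],
})

# Cross-tabulate the two measurements. Reindexing fixes the category order
# and guarantees a 2x2 table even if a category never occurs in one column.
levels = [1, 0]
table = (
    pd.crosstab(df["before"], df["after"])
    .reindex(index=levels, columns=levels, fill_value=0)
)
print(table)

# Plain 2x2 array of counts, ready to pass to a McNemar's Test function
counts = table.to_numpy()
```

The same category order must be used for rows and columns so that the off-diagonal cells hold the discordant pairs.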
Performing McNemar’s Test in Python
a. Using statsmodels
The statsmodels library provides the mcnemar function for performing McNemar’s Test.
import statsmodels.stats.contingency_tables as ct
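# 2x2 contingency table of paired outcomes
# [[a, b],
#  [c, d]]
# b and c count the discordant pairs (outcome differs between the two measurements)
table = [[10, 20], [30, 40]]
# Perform McNemar's Test (exact=True uses the binomial distribution)
result = ct.mcnemar(table, exact=True)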
# Output the results
print(f"statistic = {result.statistic}")
print(f"p-value = {result.pvalue}")
b. Interpreting the Results
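The result includes a test statistic and a p-value. With exact=True, the p-value comes from the binomial distribution and the reported statistic is the smaller of the two off-diagonal counts (here 20), not a chi-squared value. If the p-value is below the chosen significance level, usually 0.05, you reject the null hypothesis in favor of the alternative, indicating that the two proportions differ.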
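The mcnemar function also accepts exact=False, which switches to the chi-squared approximation, and a correction flag that applies a continuity correction. The sketch below runs the three variants on the same table for comparison; the variable names are illustrative.

```python
from statsmodels.stats.contingency_tables import mcnemar

table = [[10, 20], [30, 40]]

# Exact binomial test: statistic is min(b, c)
exact_res = mcnemar(table, exact=True)
# Chi-squared approximation with continuity correction
chi2_corr = mcnemar(table, exact=False, correction=True)
# Chi-squared approximation without continuity correction
chi2_plain = mcnemar(table, exact=False, correction=False)

for label, res in [
    ("exact", exact_res),
    ("chi2, corrected", chi2_corr),
    ("chi2, uncorrected", chi2_plain),
]:
    print(f"{label}: statistic = {res.statistic:.3f}, p-value = {res.pvalue:.4f}")
```

With these counts, the uncorrected statistic is (20 - 30)^2 / 50 = 2.0 and the corrected one is (|20 - 30| - 1)^2 / 50 = 1.62. The three p-values will generally differ, so it is worth reporting which variant was used.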
Practical Example
Let’s consider a practical example where you have data on the presence of a disease before and after treatment for several patients, and you want to know if the treatment has a significant effect on the disease.
import statsmodels.stats.contingency_tables as ct
# Sample data: disease presence before and after treatment.
# Layout assumed here: rows = before treatment, columns = after treatment.
#                   After: Present   After: Absent
# Before: Present         35              15
# Before: Absent          20              30
data = [[35, 15], [20, 30]]
# Perform McNemar's Test
result = ct.mcnemar(data, exact=True)
# Output the results
print(f"statistic = {result.statistic}")
print(f"p-value = {result.pvalue}")
# Interpret the results
alpha = 0.05
if result.pvalue < alpha:
    print("Reject the null hypothesis - There is a significant change in disease presence after treatment")
else:
    print("Fail to reject the null hypothesis - There is no significant change in disease presence after treatment")
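A p-value alone does not show the direction or size of the change, so it can help to report the two marginal proportions alongside it. The sketch below assumes that rows are the status before treatment and columns the status after treatment, each ordered (present, absent); this layout is an assumption and should be adjusted if your table is arranged differently.

```python
data = [[35, 15], [20, 30]]

# Assumed layout: rows = before (present, absent), columns = after (present, absent)
n = sum(sum(row) for row in data)

# Proportion with the disease before treatment: first row total / n
before_present = (data[0][0] + data[0][1]) / n
# Proportion with the disease after treatment: first column total / n
after_present = (data[0][0] + data[1][0]) / n

# The difference depends only on the discordant cells: (c - b) / n
diff = after_present - before_present

print(f"present before = {before_present:.2f}")
print(f"present after  = {after_present:.2f}")
print(f"difference     = {diff:+.2f}")
```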
Conclusion
McNemar’s Test is an essential tool for analyzing paired nominal data. It is particularly useful in medical research for evaluating the changes before and after treatments. Python, with the statsmodels library, provides a powerful and efficient way to perform McNemar’s Test. When applying this test, it is crucial to understand its assumptions and interpret the results within the context of your data.