
Introduction
Fisher’s Exact Test is a statistical significance test used to analyze the association between two categorical variables in a 2×2 contingency table. It is particularly useful when sample sizes are small and the data do not meet the requirements of the chi-squared test, typically because some expected cell counts are too low for the chi-squared approximation to hold. In this article, we will go through the steps to perform Fisher’s Exact Test using Python.
Background and Significance of Fisher’s Exact Test
Fisher’s Exact Test is used in the analysis of contingency tables. The chi-squared test relies on a large-sample approximation, so Fisher’s Exact Test is preferable for small samples where that approximation is unreliable. The test is called ‘exact’ because its p-value is computed directly from the hypergeometric distribution of the table, given its row and column totals, rather than from an approximation. It is often used for categorical data where both variables are dichotomous and the sample size is small.
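To make the word ‘exact’ concrete, the probability of a single table can be written out. The notation here is an assumption for illustration: a, b, c, d are the four cell counts of the table [[a, b], [c, d]], and n = a + b + c + d. With the row and column totals held fixed, the probability of observing that table under the null hypothesis is:

```latex
p = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}}
  = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}
```

The p-value of the test is a sum of such probabilities over the tables that are at least as extreme as the observed one.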
Understanding Fisher’s Exact Test
a. Hypotheses
The null and alternative hypotheses for Fisher’s Exact Test are as follows:
- Null Hypothesis (H0): There is no association between the two categorical variables.
- Alternative Hypothesis (H1): There is an association between the two categorical variables.
b. Assumptions
- The data is categorical.
- The sampling method is simple random sampling.
- The data is displayed in a 2×2 contingency table.
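- The observations are independent, and the row and column totals are treated as fixed. A small sample size is not a requirement: the test is valid for any sample size, but it is most useful when the sample is small.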
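One way to decide whether a sample counts as small is to look at the expected cell counts. The sketch below assumes the common rule of thumb (not stated in this article) that the chi-squared approximation becomes unreliable when any expected count is below 5; the table values and the `use_fisher` flag are illustrative.

```python
import numpy as np
from scipy.stats.contingency import expected_freq

# Illustrative 2x2 table of observed counts
observed = np.array([[8, 2], [1, 5]])

# Expected counts under independence: row_total * col_total / grand_total
expected = expected_freq(observed)
print(expected)

# Rule-of-thumb check (an assumption): prefer Fisher's test if any
# expected count is below 5
use_fisher = bool((expected < 5).any())
print(f"Prefer Fisher's Exact Test: {use_fisher}")
```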
c. Applications
- Analyzing medical clinical trial data where the sample size is small.
- Investigating associations between two binary classifications.
Loading and Preparing Data
Before you can perform Fisher’s Exact Test, you need to have some data. Load your data from a CSV file, an Excel workbook, a SQL database, or any other source. The pandas library is useful for loading and managing data.
Example:
import pandas as pd
# Load data from a CSV file
data = pd.read_csv('your-data-file.csv')
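Raw data usually has one row per observation, so it has to be tabulated into a 2×2 table before testing. The following is a minimal sketch using `pd.crosstab`; the column names `treatment` and `outcome` and the records themselves are hypothetical stand-ins for whatever your file contains.

```python
import pandas as pd

# Hypothetical records: one row per patient, two binary columns
df = pd.DataFrame({
    "treatment": ["A"] * 10 + ["B"] * 6,
    "outcome": ["success"] * 8 + ["failure"] * 2
               + ["success"] * 1 + ["failure"] * 5,
})

# Cross-tabulate the two variables into a table of counts
table = pd.crosstab(df["treatment"], df["outcome"])

# crosstab sorts labels alphabetically; put "success" in the first column
table = table.reindex(columns=["success", "failure"])
print(table)
```

The resulting DataFrame (or `table.to_numpy()`) can be passed to the test in place of a hand-written list of lists.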
Performing Fisher’s Exact Test in Python
a. Using scipy.stats
The scipy library provides the fisher_exact function for performing Fisher’s Exact Test.
from scipy.stats import fisher_exact
# Contingency table
# [[a, b],
# [c, d]]
table = [[8, 2], [1, 5]]
# Perform Fisher's Exact Test
odds_ratio, p_value = fisher_exact(table)
# Output the results
print(f"Odds Ratio: {odds_ratio}")
print(f"P-value: {p_value}")
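The p-value above can be reproduced from the hypergeometric distribution, which helps show what the function is doing. This is a sketch of the usual two-sided definition (sum the probabilities of all tables that are no more likely than the observed one), not SciPy's internal code; the variable names and the small tolerance used for the comparison are assumptions.

```python
from scipy.stats import fisher_exact, hypergeom

table = [[8, 2], [1, 5]]
(a, b), (c, d) = table
total = a + b + c + d
row1 = a + b   # first row total
col1 = a + c   # first column total

# Range of values the top-left cell can take with the margins fixed
lowest = max(0, col1 - (c + d))
highest = min(row1, col1)

# Probability of each possible table, indexed by its top-left cell
probs = [hypergeom.pmf(k, total, row1, col1)
         for k in range(lowest, highest + 1)]
p_observed = hypergeom.pmf(a, total, row1, col1)

# Two-sided p-value: tables no more likely than the observed one
p_manual = sum(p for p in probs if p <= p_observed * (1 + 1e-7))

_, p_scipy = fisher_exact(table)
print(f"Manual p-value: {p_manual:.6f}")
print(f"SciPy p-value:  {p_scipy:.6f}")
```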
b. Interpreting the Results
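The p-value is the probability, assuming the null hypothesis is true and the table margins are fixed, of observing a table at least as extreme as the one in the data. If the p-value is below a chosen threshold, usually 0.05, you can reject the null hypothesis and conclude that there is a statistically significant association between the two categorical variables. The odds ratio describes the direction and strength of that association: a value of 1 indicates no association.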
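By default, `fisher_exact` runs a two-sided test. When the hypothesis is directional (for example, that the odds in the first row are higher than in the second), the `alternative` parameter can be set to `"greater"` or `"less"`. A minimal sketch, reusing the table from the previous section:

```python
from scipy.stats import fisher_exact

table = [[8, 2], [1, 5]]

# Two-sided test (the default)
_, p_two_sided = fisher_exact(table, alternative="two-sided")

# One-sided test: is the underlying odds ratio greater than 1?
_, p_greater = fisher_exact(table, alternative="greater")

print(f"Two-sided p-value: {p_two_sided:.4f}")
print(f"One-sided p-value: {p_greater:.4f}")
```

The direction should be chosen before looking at the data; picking it afterwards inflates the false-positive rate.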
Practical Example
Let’s consider a practical example where you have counts of successes and failures for two different treatments for a medical condition.
from scipy.stats import fisher_exact
# Sample data: success and failure of treatments
# [[Treatment1_success, Treatment1_failure],
# [Treatment2_success, Treatment2_failure]]
data = [[10, 6], [2, 12]]
# Perform Fisher's Exact Test
odds_ratio, p_value = fisher_exact(data)
# Output the results
print(f"Odds Ratio: {odds_ratio}")
print(f"P-value: {p_value}")
# Interpret the results
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis - There is a significant association between the treatment types and success rates.")
else:
    print("Fail to reject the null hypothesis - There is no significant association between the treatment types and success rates.")
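The odds ratio printed above is a point estimate only. As a sketch, assuming SciPy 1.10 or later (where `scipy.stats.contingency.odds_ratio` is available), a confidence interval can be attached to it. Note that this function defaults to the conditional maximum-likelihood estimate, which generally differs somewhat from the sample odds ratio returned by `fisher_exact`.

```python
from scipy.stats.contingency import odds_ratio

# Same treatment data as in the example above
data = [[10, 6], [2, 12]]

# Conditional odds ratio estimate (requires SciPy >= 1.10)
result = odds_ratio(data, kind="conditional")

# 95% confidence interval for the odds ratio
ci = result.confidence_interval(confidence_level=0.95)

print(f"Conditional odds ratio: {result.statistic:.2f}")
print(f"95% CI: ({ci.low:.2f}, {ci.high:.2f})")
```

With small samples the interval is typically wide, which is a useful reminder of how uncertain the estimated effect size is even when the test is significant.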
Conclusion
Fisher’s Exact Test is an essential statistical test for analyzing small datasets and determining the significance of associations between two categorical variables in 2×2 contingency tables. Python, with its scipy library, provides an efficient and user-friendly way to perform Fisher’s Exact Test. This test is particularly useful in fields such as medical research, where researchers often work with small sample sizes. When interpreting the results, it is crucial to consider the context of your data and the assumptions of Fisher’s Exact Test.