How to Perform One Sample & Two Sample Z-Tests in Python


Hypothesis testing is a core part of statistical analysis and underpins many scientific studies, business analyses, and decision-making processes. One of the most basic tests is the Z-test. It checks whether a sample mean differs significantly from a hypothesized population mean, or whether the means of two groups differ from each other. This article covers the One Sample Z-test and the Two Sample Z-test and shows how to run both in Python, with practical examples.

Table of Contents

  1. Understanding the Z-Test
  2. Setting Up the Python Environment
  3. Conducting a One Sample Z-Test in Python: An Illustrative Example
  4. Conducting a Two Sample Z-Test in Python: An Illustrative Example
  5. Conclusion

1. Understanding the Z-Test

A Z-test is a statistical test used to compare means when the population variance is known, or when the sample size is large enough that the sample variance is a reliable estimate of it. The Z-test has two variations:

  • One Sample Z-Test: This test is used when you are comparing the mean of a single sample of data with a known population mean.
  • Two Sample Z-Test: This test is used when you are comparing the means of two independent samples to each other.
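The two variants correspond to the standard textbook statistics below. The symbols are introduced here for illustration: x̄ is a sample mean, μ0 the hypothesized population mean, σ a population standard deviation, n a sample size, and Δ0 the hypothesized difference between two means (usually 0). Under the null hypothesis, each statistic is approximately standard normal. When σ is unknown, a sample estimate takes its place, and that is what statsmodels' ztest() does.

```latex
% One sample Z-test
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}

% Two sample Z-test (independent samples)
z = \frac{(\bar{x}_1 - \bar{x}_2) - \Delta_0}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}
```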

In the context of hypothesis testing, we start with a null hypothesis (H0) and an alternative hypothesis (H1).

  • The Null Hypothesis (H0): For the Z-test, the null hypothesis states that the population mean equals the hypothesized value (one-sample test), or that the two population means are equal (two-sample test).
  • The Alternative Hypothesis (H1): The alternative hypothesis states that the population mean differs from the hypothesized value, or that the two population means differ. This is the two-sided form; one-sided alternatives ("greater than" or "less than") are also possible.

The Z-test tells us whether the data provide enough evidence to reject the null hypothesis. If they do not, we fail to reject H0, which is not the same as proving it true.
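Under the null hypothesis, the Z-statistic follows an approximately standard normal distribution. A two-sided p-value is therefore the probability of a value at least as far from zero as the one observed. The following is a minimal, standard-library-only sketch of that step. It uses the identity P(|Z| ≥ |z|) = erfc(|z|/√2). The function name `two_sided_p_value` is hypothetical and is not part of statsmodels.

```python
import math


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic z.

    P(|Z| >= |z|) = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    """
    return math.erfc(abs(z) / math.sqrt(2))


# A statistic of about 1.96 sits right at the conventional 5% boundary
print(round(two_sided_p_value(1.96), 4))
# A statistic of 0 gives no evidence against H0 at all
print(two_sided_p_value(0.0))
```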

2. Setting Up the Python Environment

Python, with its many libraries for statistical computing, is well suited to performing statistical tests. To carry out the Z-test, install the statsmodels package, which provides classes and functions for estimating many different statistical models and for conducting statistical tests.

Install the package using pip:

pip install statsmodels

After installation, import the required libraries:

import numpy as np
from statsmodels.stats import weightstats as stests

3. Conducting a One Sample Z-Test in Python: An Illustrative Example

Suppose a school teacher claims that the average score of his students on a math test is 70. To test this claim, a random sample of 30 students is taken. Here we simulate their scores from a normal distribution with mean 72 and standard deviation 5, so the sample mean will be close to 72 but not exactly 72. Assuming the population standard deviation is 5, we perform a one sample Z-test.

First, let’s generate the sample data:

# Simulate 30 test scores from a normal distribution (mean 72, sd 5)
np.random.seed(0) # for reproducibility
sample_scores = np.random.normal(72, 5, 30)

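The null hypothesis is that the mean score is 70, and the alternative hypothesis is that it is not 70. We use the ztest() function from the statsmodels.stats.weightstats module to perform the Z-test. Note that ztest() estimates the standard deviation from the sample (with ddof=1) rather than taking a known population value. This is a reasonable approximation for moderately large samples: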

# Perform one-sample Z-test
z_statistic, p_value = stests.ztest(x1=sample_scores, value=70)

print(f'Z-statistic: {z_statistic}')
print(f'P-value: {p_value}')
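Because ztest() estimates the standard deviation from the sample, the assumed population standard deviation of 5 does not enter the calculation above. If you want the textbook one-sample Z-test that plugs in a known σ directly, one option is to compute the statistic by hand. The following is a minimal sketch. It regenerates the same simulated scores so that it runs on its own. The names `mu0`, `sigma`, `z_known` and `p_known` are illustrative.

```python
import math

import numpy as np

np.random.seed(0)  # same seed as above, so the same 30 simulated scores
sample_scores = np.random.normal(72, 5, 30)

mu0 = 70    # hypothesized population mean under H0
sigma = 5   # population standard deviation, assumed known
n = len(sample_scores)

# z = (sample mean - mu0) / (sigma / sqrt(n))
z_known = (sample_scores.mean() - mu0) / (sigma / math.sqrt(n))
# Two-sided p-value from the standard normal distribution
p_known = math.erfc(abs(z_known) / math.sqrt(2))

print(f'Z-statistic (known sigma): {z_known}')
print(f'P-value (known sigma): {p_known}')
```

The two versions differ only in the denominator. Their results are therefore close whenever the sample standard deviation is close to σ.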

If the p-value is less than the significance level (let’s use α = 0.05), we reject the null hypothesis. If not, we fail to reject the null hypothesis:

alpha = 0.05
if p_value < alpha:
    print("We reject the null hypothesis.")
else:
    print("We fail to reject the null hypothesis.")
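Comparing the p-value with α is equivalent to comparing |z| with a critical value from the standard normal distribution. The following is a short sketch of that equivalent rule. It uses `statistics.NormalDist` from the standard library (Python 3.8+). The helper name `reject_null` is hypothetical.

```python
from statistics import NormalDist

alpha = 0.05
# Two-sided critical value: the (1 - alpha/2) quantile of N(0, 1)
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
print(round(z_critical, 3))


def reject_null(z_statistic: float, z_crit: float = z_critical) -> bool:
    """Reject H0 when the statistic lands in either tail beyond the critical value."""
    return abs(z_statistic) > z_crit


print(reject_null(2.3))   # beyond +1.96, so H0 is rejected
print(reject_null(-1.5))  # inside (-1.96, 1.96), so we fail to reject H0
```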

4. Conducting a Two Sample Z-Test in Python: An Illustrative Example

In a two-sample Z-test, we compare the means of two independent samples. Suppose we have scores from two different classes of students, and we want to test if there’s a significant difference between their mean scores.

Let’s generate random test scores for two classes:

# Simulate 60 test scores per class (means 70 and 72, sd 10)
np.random.seed(0) # for reproducibility
class1_scores = np.random.normal(70, 10, 60)
class2_scores = np.random.normal(72, 10, 60)

The null hypothesis is that the two classes have equal mean scores, and the alternative hypothesis is that their means differ. We call ztest() again, this time passing the second sample as x2. By default, it tests whether the difference between the two means is zero:

# Perform two-sample Z-test
z_statistic, p_value = stests.ztest(x1=class1_scores, x2=class2_scores)

print(f'Z-statistic: {z_statistic}')
print(f'P-value: {p_value}')
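By default, ztest() uses usevar='pooled', which assumes both classes share one variance and pools the two sample variances. A hand computation with numpy shows the formula at work. It also shows that, because both classes here have 60 students, the pooled standard error and the unpooled one (the usevar='unequal' option) coincide. The following is a minimal sketch that regenerates the same simulated data. The variable names are illustrative.

```python
import math

import numpy as np

np.random.seed(0)  # same seed and draw order as above
class1_scores = np.random.normal(70, 10, 60)
class2_scores = np.random.normal(72, 10, 60)

n1, n2 = len(class1_scores), len(class2_scores)
var1 = class1_scores.var(ddof=1)  # sample variances with ddof=1
var2 = class2_scores.var(ddof=1)

# Pooled standard error: assumes a common population variance
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Unpooled standard error: each class keeps its own variance
se_unpooled = math.sqrt(var1 / n1 + var2 / n2)

diff = class1_scores.mean() - class2_scores.mean()
z_pooled = diff / se_pooled
p_pooled = math.erfc(abs(z_pooled) / math.sqrt(2))  # two-sided p-value

print(f'SE pooled: {se_pooled}, SE unpooled: {se_unpooled}')
print(f'Z-statistic: {z_pooled}')
print(f'P-value: {p_pooled}')
```

With unequal class sizes and clearly different spreads, the two standard errors diverge. In that case, usevar='unequal' is the safer choice.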

Again, compare the p-value to the significance level (α = 0.05):

if p_value < alpha:
    print("We reject the null hypothesis.")
else:
    print("We fail to reject the null hypothesis.")
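Some research questions are directional, for example whether the second class scores higher than the first. For these, ztest() also accepts an alternative argument ('two-sided', 'larger' or 'smaller'). With alternative='larger', the alternative hypothesis is that the mean of x1 minus the mean of x2 is greater than value (0 by default). For that reason, this sketch passes the second class as x1. The simulated data are the same as above.

```python
import numpy as np
from statsmodels.stats import weightstats as stests

np.random.seed(0)  # same simulated data as above
class1_scores = np.random.normal(70, 10, 60)
class2_scores = np.random.normal(72, 10, 60)

# H1: mean(class2) - mean(class1) > 0
z_one, p_one = stests.ztest(x1=class2_scores, x2=class1_scores, alternative='larger')

print(f'Z-statistic: {z_one}')
print(f'One-sided p-value: {p_one}')
```

When the observed difference lies in the hypothesized direction, the one-sided p-value is half the two-sided one. Choose the direction before looking at the data, not after.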

5. Conclusion

The Z-test is a versatile tool for comparing a sample mean with a hypothesized population mean, or for comparing the means of two independent samples. It applies when the population variance is known, or when the samples are large enough for the sample variance to be a reliable estimate. Python's statistical libraries, statsmodels in particular, make these tests straightforward for researchers and data scientists to run.

Remember that statistical tests support probabilistic conclusions, not definitive proof. Interpret them in the context of the study and alongside other research methods. Before relying on the results, check that your data meet the Z-test's assumptions:
  • independent observations;
  • an approximately normal sampling distribution of the mean;
  • a variance that is known or well estimated.
