How to Perform One Sample & Two Sample Z-Tests in R

Z-tests are statistical tests used to determine if there’s a significant difference between a sample mean and a population mean, or between the means of two different populations. The Z-test is typically used when the population standard deviation is known, or when the sample size is large.

This article will cover how to perform both one-sample and two-sample Z-tests in the statistical programming language R. We’ll start with a theoretical overview of each test and then dive into examples using R’s built-in functions and tools.

The One-Sample Z-Test

The one-sample Z-test is used to test the hypothesis that the mean of a population is equal to a specified value. The sections below cover the theory first and then walk through the test in R.

Theoretical Overview

The one-sample Z-test compares the mean of a sample with a hypothesized population mean. The null hypothesis is that the population mean equals the hypothesized value, while the alternative hypothesis is that it differs. The Z-score for the one-sample Z-test is calculated using the following formula:

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Where:

  • $\bar{x}$ is the sample mean
  • $\mu$ is the population mean under the null hypothesis
  • $\sigma$ is the population standard deviation
  • $n$ is the sample size
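To make the formula concrete, here is a small worked calculation. The summary values below are hypothetical and chosen only so that the arithmetic is easy to follow; they are not taken from any real dataset.

```r
# Hypothetical summary values, chosen only to illustrate the arithmetic
x_bar <- 30.8  # sample mean
mu    <- 30    # population mean under the null hypothesis
sigma <- 4     # known population standard deviation
n     <- 100   # sample size

# Standard error of the mean: sigma / sqrt(n) = 4 / 10 = 0.4
standard_error <- sigma / sqrt(n)

# Z-score: (30.8 - 30) / 0.4 = 2
z <- (x_bar - mu) / standard_error
z
```

A Z-score of 2 means the sample mean lies two standard errors above the hypothesized population mean.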

Step-by-Step Example in R

Let’s consider a scenario where you want to test whether the mean height of a certain species of tree in a sample of 100 trees is different from 30 feet, with a known population standard deviation of 4 feet.

Step 1: Load Data

First, you need to load your sample data into R:

sample_data <- c(31, 32, 29, 30, 32, ... ) # Insert your data here
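If you have no measurements at hand and only want to try the remaining steps, one option is to simulate a sample. This is a sketch under assumptions: the seed, the simulated mean of 31, and the use of a normal distribution are arbitrary illustrative choices, not real tree data.

```r
# Simulated stand-in data (assumption: heights are roughly normal)
set.seed(42)  # arbitrary seed so the simulated values are reproducible

# 100 simulated tree heights; mean = 31 and sd = 4 are illustrative choices
sample_data <- rnorm(n = 100, mean = 31, sd = 4)

# Quick checks on what was generated
length(sample_data)
summary(sample_data)
```

With real data you would replace the `rnorm()` call with your own vector, for example one read from a file.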

Step 2: Define Parameters

Next, define the population mean and standard deviation:

population_mean <- 30
population_sd <- 4

Step 3: Compute the Z-score

Calculate the Z-score using the formula provided earlier:

sample_mean <- mean(sample_data)
sample_size <- length(sample_data)
z_score <- (sample_mean - population_mean) / (population_sd / sqrt(sample_size))

Step 4: Find the P-value

Find the two-tailed P-value:

p_value <- 2 * (1 - pnorm(abs(z_score)))
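The expression above subtracts from 1, which can lose precision when the Z-score is very large. A possible alternative is to ask `pnorm()` for the tail area directly; the same function also gives one-tailed P-values. The variable `z_example` is a hypothetical value used only for illustration.

```r
z_example <- 2  # hypothetical Z-score, for illustration only

# Two-tailed P-value, computed from the lower tail of -|z|
p_two_sided <- 2 * pnorm(-abs(z_example))

# One-tailed P-values
p_greater <- pnorm(z_example, lower.tail = FALSE)  # H1: mean is greater
p_less    <- pnorm(z_example)                      # H1: mean is smaller

c(two_sided = p_two_sided, greater = p_greater, less = p_less)
```

For moderate Z-scores both ways of computing the two-tailed P-value agree; the difference only matters far out in the tails.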

Step 5: Interpret Results

If the P-value is less than your chosen significance level (e.g., 0.05), you reject the null hypothesis and conclude that the population mean differs significantly from the hypothesized value. Otherwise, you fail to reject the null hypothesis: the data do not provide enough evidence of a difference.
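The steps above can be collected into one helper. This is a minimal sketch, not a built-in R function: the name `one_sample_z_test`, its arguments, and the five-value `heights` vector are all hypothetical and exist only for this illustration.

```r
# Hypothetical helper that bundles the one-sample Z-test steps
one_sample_z_test <- function(x, mu0, sigma, alpha = 0.05) {
  n  <- length(x)
  se <- sigma / sqrt(n)             # standard error of the mean
  z  <- (mean(x) - mu0) / se        # Z-score
  p  <- 2 * pnorm(-abs(z))          # two-tailed P-value
  crit <- qnorm(1 - alpha / 2)      # critical value, about 1.96 for alpha = 0.05
  list(
    z           = z,
    p_value     = p,
    conf_int    = mean(x) + c(-1, 1) * crit * se,  # (1 - alpha) confidence interval
    reject_null = p < alpha
  )
}

# Toy data for illustration only (far too small for a real analysis)
heights <- c(31, 32, 29, 30, 32)
result <- one_sample_z_test(heights, mu0 = 30, sigma = 4)
result
```

With this toy sample the Z-score is small, so `reject_null` is `FALSE`. The confidence interval gives the same answer in a different form: the null hypothesis is rejected exactly when the hypothesized mean falls outside it.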

The Two-Sample Z-Test

The two-sample Z-test compares the means of two independent samples. It helps to determine if two population means are different when the standard deviations of both populations are known.

Theoretical Overview

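The Z-score for the two-sample Z-test is calculated using:

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

Where:

  • $\bar{x}_1, \bar{x}_2$ are the sample means
  • $\mu_1, \mu_2$ are the population means under the null hypothesis (for a test of equal means, $\mu_1 - \mu_2 = 0$)
  • $\sigma_1^2, \sigma_2^2$ are the population variances
  • $n_1, n_2$ are the sample sizes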
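As a quick check of the two-sample formula, here is a worked calculation. All numbers are hypothetical and picked so the result is a round value; the null hypothesis in this sketch is that the two population means are equal.

```r
# Hypothetical summary values, chosen only to illustrate the arithmetic
x_bar1 <- 31.5; x_bar2 <- 30   # sample means
sigma1 <- 3;    sigma2 <- 4    # known population standard deviations
n1     <- 100;  n2     <- 100  # sample sizes
null_difference <- 0           # mu1 - mu2 under the null hypothesis

# Standard error of the difference: sqrt(9/100 + 16/100) = sqrt(0.25) = 0.5
standard_error <- sqrt(sigma1^2 / n1 + sigma2^2 / n2)

# Z-score: (1.5 - 0) / 0.5 = 3
z <- ((x_bar1 - x_bar2) - null_difference) / standard_error
z
```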

Step-by-Step Example in R

Suppose you want to test whether the mean heights of two species of trees are different, with known population standard deviations.

Step 1: Load Data

Load your two samples into R:

sample1_data <- c(31, 32, 29, 30, 32, ... ) # Insert your data for sample 1
sample2_data <- c(28, 27, 30, 29, 28, ... ) # Insert your data for sample 2

Step 2: Define Parameters

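Define the population standard deviations and the population means assumed under the null hypothesis. Only the difference between the two means enters the Z-score, so the values below correspond to a null hypothesis that the first mean exceeds the second by 2. To test whether the two means are equal, set both to the same value: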

population_mean1 <- 30
population_mean2 <- 28
population_sd1 <- 4
population_sd2 <- 5

Step 3: Compute the Z-score

Calculate the Z-score:

sample_mean1 <- mean(sample1_data)
sample_mean2 <- mean(sample2_data)
sample_size1 <- length(sample1_data)
sample_size2 <- length(sample2_data)

z_score <- ((sample_mean1 - sample_mean2) - (population_mean1 - population_mean2)) / 
            sqrt((population_sd1^2 / sample_size1) + (population_sd2^2 / sample_size2))

Step 4: Find the P-value

Find the two-tailed P-value:

p_value <- 2 * (1 - pnorm(abs(z_score)))

Step 5: Interpret Results

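If the P-value is less than your chosen significance level, reject the null hypothesis and conclude that the difference between the two population means is significantly different from the difference assumed under the null hypothesis (zero when testing for equal means). Otherwise, you fail to reject the null hypothesis.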
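The two-sample steps can be wrapped in a helper in the same way. This is a sketch under assumptions: `two_sample_z_test` and its `delta0` argument are hypothetical names, with `delta0` playing the role of `population_mean1 - population_mean2` in the code above, and the two five-value vectors are toy data.

```r
# Hypothetical helper that bundles the two-sample Z-test steps
two_sample_z_test <- function(x1, x2, sigma1, sigma2, delta0 = 0, alpha = 0.05) {
  # Standard error of the difference between the two sample means
  se <- sqrt(sigma1^2 / length(x1) + sigma2^2 / length(x2))
  # Z-score for the observed difference against the hypothesized difference
  z <- ((mean(x1) - mean(x2)) - delta0) / se
  p <- 2 * pnorm(-abs(z))  # two-tailed P-value
  list(z = z, p_value = p, reject_null = p < alpha)
}

# Toy data for illustration only (far too small for a real analysis)
species1 <- c(31, 32, 29, 30, 32)
species2 <- c(28, 27, 30, 29, 28)

# delta0 = 0 tests the null hypothesis that the two population means are equal
result2 <- two_sample_z_test(species1, species2, sigma1 = 4, sigma2 = 5)
result2
```

With so few observations the standard error is large, so even a difference of 2.4 between the sample means does not lead to rejecting the null hypothesis here.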

Conclusion

The one-sample and two-sample Z-tests are powerful statistical tests for comparing means. Performing these tests in R requires understanding the underlying theory and following a series of steps to load data, define parameters, compute Z-scores, and interpret results.

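Remember, Z-tests assume normally distributed data (or a sample large enough for the sample mean to be approximately normal) and known population standard deviations. If these assumptions are not met, for example when the standard deviation has to be estimated from a small sample, other tests like the t-test might be more appropriate.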
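For comparison, this is roughly what the t-test alternative looks like with R's built-in `t.test()` function, which estimates the standard deviation from the sample instead of requiring it as an input. The `heights` vector is toy data used only for illustration.

```r
# Toy data for illustration only
heights <- c(31, 32, 29, 30, 32)

# One-sample t-test of the null hypothesis that the population mean is 30
t_result <- t.test(heights, mu = 30)

t_result$statistic  # t statistic
t_result$parameter  # degrees of freedom (n - 1)
t_result$p.value    # two-sided P-value
```

For two samples, `t.test(x, y)` takes both vectors and by default does not assume equal variances.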
