The One Proportion Z-Test checks whether a population proportion differs from a hypothesized value. This guide walks through the test in R using a real dataset, the built-in Titanic dataset.
1. Understanding the One Proportion Z-Test
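The One Proportion Z-Test is used to determine whether the observed proportion of successes in a sample differs significantly from a hypothesized population proportion. The test does not require the raw data to be normally distributed (the data are binary). It relies on the sampling distribution of the sample proportion being approximately normal, which the Central Limit Theorem (CLT) justifies when the sample is large enough. A common rule of thumb is that both n × p0 and n × (1 − p0) should be at least 10, where n is the sample size and p0 is the hypothesized proportion.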
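For reference, the test statistic can be written as below. The symbols are introduced here for illustration: \hat{p} = x / n is the sample proportion (x successes out of n observations) and p_0 is the hypothesized proportion. Under the null hypothesis, z is approximately standard normal for large n.

```latex
z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0\,(1 - p_0)}{n}}},
\qquad \hat{p} = \frac{x}{n}
```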
2. When to Use the One Proportion Z-Test
This test is ideal when you have categorical data with two outcomes (success or failure), the observations are an independent random sample from the population, and you wish to test a hypothesis about a single population proportion.
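Before running the test, it can help to check the sample-size condition. The sketch below assumes the common rule of thumb that the expected numbers of successes and failures under the hypothesized proportion should both be at least 10. The helper `large_sample_ok()` and its `min_count` argument are hypothetical names used for illustration; they are not part of base R.

```r
# Hypothetical helper (not part of base R): returns TRUE when both expected
# counts under the hypothesized proportion p0 reach `min_count`.
large_sample_ok <- function(n, p0, min_count = 10) {
  n * p0 >= min_count && n * (1 - p0) >= min_count
}

ok_large <- large_sample_ok(n = 200, p0 = 0.5)  # expected counts 100 and 100
ok_small <- large_sample_ok(n = 15, p0 = 0.1)   # expected counts 1.5 and 13.5

ok_large  # TRUE
ok_small  # FALSE
```

The threshold of 10 is a convention rather than a hard rule; some texts use 5.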
3. Steps to Perform One Proportion Z-Test in R with Titanic Dataset
Step 1: Load the Dataset
The Titanic dataset ships with the datasets package, which is part of base R and is attached by default, so nothing needs to be installed. To make the dataset explicitly available in your session, run:
data("Titanic")
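The dataset is a four-dimensional contingency table (Class, Sex, Age, Survived). We’ll flatten it into a data frame, which has one row per combination of those categories and a “Freq” column holding the number of people in each combination.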
titanic_data <- as.data.frame(Titanic)
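It is worth inspecting the flattened data before going further. The sketch below uses a separate, illustrative variable name (`titanic_preview`) so it does not interfere with the main walkthrough. It shows that the data frame holds aggregated counts rather than one row per person.

```r
# Illustrative copy of the flattened table; `titanic_preview` is a name
# chosen for this sketch only.
titanic_preview <- as.data.frame(Titanic)

str(titanic_preview)    # four factor columns plus a numeric Freq column
head(titanic_preview)   # first few category combinations and their counts
dim(titanic_preview)    # 32 rows (4 classes x 2 sexes x 2 ages x 2 outcomes), 5 columns
```

Because each row stands for a group of people, any count or proportion computed from this data frame has to take the Freq column into account.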
In this example, we’re interested in the “Survived” column, which records whether the people in each row survived, together with the “Freq” column, which records how many people that row represents.
Step 2: Convert Data to Numeric Format
The “Survived” column contains categorical data. We need to convert this data into numeric format for the test. We’ll convert ‘Yes’ and ‘No’ responses to 1 and 0, respectively.
titanic_data$Survived <- ifelse(titanic_data$Survived == "Yes", 1, 0)
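Step 3: Summarize the Data
Because each row represents a group of people rather than one person, the counts must be weighted by the “Freq” column. Let’s get the total number of survivors.
sum_data <- sum(titanic_data$Survived * titanic_data$Freq)
And get the total number of people aboard (passengers and crew).
n <- sum(titanic_data$Freq)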
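As a sanity check on these two numbers, the same totals can be read straight from the margins of the original table. This is a minimal sketch with illustrative variable names (`survival_counts`, `survivors_check`, `total_check`). The values should be 711 survivors out of 2201 people; anything else suggests the Freq column was not taken into account.

```r
# Collapse the 4-dimensional table onto its 4th dimension (Survived).
survival_counts <- margin.table(Titanic, margin = 4)
survival_counts

# Illustrative names, used only for this cross-check.
survivors_check <- as.numeric(survival_counts["Yes"])  # 711
total_check <- sum(Titanic)                            # 2201

survivors_check / total_check  # observed proportion of survivors
```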
Step 4: Define Hypothesized Proportion
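We’re going to test whether the proportion of survivors differs significantly from 0.5 (50%). The null hypothesis is that the true proportion equals 0.5; the two-sided alternative is that it does not.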
p0 <- 0.5
Step 5: Perform the One Proportion Z-Test
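We’ll use the prop.test() function in R to perform the One Proportion Z-Test. Setting correct = FALSE turns off the Yates continuity correction, so the X-squared statistic that prop.test() reports is the square of the z statistic, and its p-value matches the two-sided z-test.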
test_result <- prop.test(x = sum_data, n = n, p = p0, alternative = "two.sided", correct = FALSE)
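To see how this relates to the z statistic, the calculation can be reproduced by hand. The sketch below is self-contained and reads the counts directly from the Titanic table. All variable names ending in `_check`, as well as `p_hat` and `z`, are chosen for illustration.

```r
# Counts taken directly from the built-in table.
x_check <- sum(Titanic[, , , "Yes"])  # number of survivors
n_check <- sum(Titanic)               # number of people aboard
p0_check <- 0.5                       # hypothesized proportion

# Manual z statistic and two-sided p-value from the normal distribution.
p_hat <- x_check / n_check
z <- (p_hat - p0_check) / sqrt(p0_check * (1 - p0_check) / n_check)
p_value_manual <- 2 * pnorm(-abs(z))

# Same test via prop.test() without continuity correction.
check <- prop.test(x_check, n_check, p = p0_check, correct = FALSE)

z
all.equal(unname(check$statistic), z^2)  # X-squared equals z squared
c(manual = p_value_manual, prop_test = check$p.value)
```

The sign of z indicates the direction of the difference: a negative value means the observed proportion is below the hypothesized one.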
Step 6: Interpret the Results
To view the results of the test, we print the test_result object.
print(test_result)
- If the p-value is less than the significance level (here 0.05), we reject the null hypothesis and conclude that the proportion of survivors on the Titanic differed significantly from 50%. Otherwise, we fail to reject it, which is not the same as showing that the proportion equals 50%.
- The 95% confidence interval gives a range of plausible values for the true population proportion. If 0.5 falls outside this range, that is consistent with rejecting the null hypothesis at the 5% level.
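Both checks can also be made programmatically, since prop.test() returns a list-like object of class "htest". The following sketch recomputes the test from the Titanic table so that it runs on its own. The names `result`, `alpha`, `reject_null`, and `p0_in_interval` are illustrative.

```r
# Recompute the test from the raw table so this block is self-contained.
result <- prop.test(x = sum(Titanic[, , , "Yes"]), n = sum(Titanic),
                    p = 0.5, alternative = "two.sided", correct = FALSE)
alpha <- 0.05  # significance level assumed in this guide

result$estimate  # observed proportion of survivors
result$p.value   # p-value of the test
result$conf.int  # 95% confidence interval for the proportion

# Decision rule based on the p-value.
reject_null <- result$p.value < alpha

# Does the interval contain the hypothesized value?
p0_in_interval <- result$conf.int[1] <= 0.5 && 0.5 <= result$conf.int[2]

reject_null
p0_in_interval
```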
The One Proportion Z-Test is a straightforward tool for testing hypotheses about population proportions when the sample is large. R offers an effective way to run it with the prop.test() function. Working with real datasets, like the Titanic dataset, helps in understanding how to use and interpret the test. As always, checking that your data meet the test’s assumptions is essential for accurate results.