A paired samples t-test (also known as a dependent or related samples t-test) is a statistical procedure used to determine whether the mean difference between two sets of observations is zero. This test is applied when the observations are dependent; that is, when there is a natural pairing of observations in the data. Examples include before-and-after observations on the same subjects (e.g. students’ scores on a test before and after a particular training), or a comparison of two related variables (e.g. left-hand length and right-hand length).
This article will guide you through the steps to perform a paired samples t-test in R, including data preparation, checking assumptions, running the test, and interpreting the results.
Preparing Your Dataset
To perform a paired samples t-test, your data needs to be in a format where one row corresponds to one ‘pair’ of observations, and the paired observations are represented in two separate columns. For instance, if we are comparing students’ scores before and after a training, we should have one column for the scores before the training, and one column for the scores after.
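If your data arrive in long format instead (one row per measurement), they need to be reshaped first. The following is a minimal sketch using base R's reshape() on a small hypothetical data frame; the names long, wide, time, and score are illustrative assumptions and are not part of the dataset used in this article.

```r
# Hypothetical long-format data: one row per student per time point
long <- data.frame(
  student_id = rep(1:3, times = 2),
  time       = rep(c("before", "after"), each = 3),
  score      = c(70, 65, 80, 74, 69, 79)
)

# Reshape to wide format: one row per student, one column per time point
wide <- reshape(long, idvar = "student_id", timevar = "time", direction = "wide")

# The paired scores now sit in the columns score.before and score.after
wide
```

After reshaping, each row holds one pair, which is the layout the rest of this article assumes.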
For this demonstration, we’ll use a made-up dataset of 30 students’ scores on a test before and after a particular training. We’ll create this in R as follows:
# Set seed for reproducibility
set.seed(123)

# Create the dataset
scores <- data.frame(
  student_id = 1:30,
  before = rnorm(30, mean = 75, sd = 10),
  after = rnorm(30, mean = 78, sd = 10)
)
In this dataset, each row represents a student, the before column represents the test score before the training, and the after column represents the test score after the training.
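Before going further, it can help to confirm that every student really has both scores, since a pair with a missing value cannot contribute to the test. The sketch below recreates the simulated dataset so that it runs on its own; the helper names n_pairs and n_complete are illustrative.

```r
# Recreate the simulated dataset so this block runs on its own
set.seed(123)
scores <- data.frame(
  student_id = 1:30,
  before = rnorm(30, mean = 75, sd = 10),
  after  = rnorm(30, mean = 78, sd = 10)
)

# Look at the first six pairs
head(scores)

# Count the pairs and the pairs with both scores present
n_pairs    <- nrow(scores)
n_complete <- sum(complete.cases(scores[, c("before", "after")]))

# TRUE when no pair has a missing score
n_pairs == n_complete
```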
Checking the Assumptions
Before performing a paired samples t-test, we need to check the following assumptions:
- The dependent variable (in our case, the test scores) should be measured on a continuous scale.
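- The pairs are independent of one another (for example, one student’s scores do not influence another student’s), even though the two observations within each pair are related.
- Ideally, the dependent variable is roughly normally distributed at each time point, although strictly speaking the test only requires normality of the paired differences (see the next item).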
- The differences between the two related groups should be approximately normally distributed.
The first two assumptions can usually be checked by understanding your dataset and the method of data collection.
You can check the normality assumption visually using histograms or Q-Q plots, or statistically using a Shapiro-Wilk test. Here is how you can create histograms for the two groups using the ggplot2 package:
library(ggplot2)

# Create histograms for before and after scores
ggplot(scores, aes(before)) +
  geom_histogram(binwidth = 5, fill = 'blue', color = 'black', alpha = 0.7) +
  theme_minimal()

ggplot(scores, aes(after)) +
  geom_histogram(binwidth = 5, fill = 'red', color = 'black', alpha = 0.7) +
  theme_minimal()
You can also create a histogram or a Q-Q plot for the differences to check the last assumption:
# Calculate differences
scores$difference <- scores$after - scores$before

# Create a histogram for the differences
ggplot(scores, aes(difference)) +
  geom_histogram(binwidth = 5, fill = 'purple', color = 'black', alpha = 0.7) +
  theme_minimal()

# Create a Q-Q plot for the differences
qqnorm(scores$difference)
qqline(scores$difference)
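The Shapiro-Wilk test mentioned above can be run on the differences with shapiro.test() from base R's stats package. The sketch below recreates the simulated dataset so that it runs on its own; the 0.05 cut-off is a common convention assumed here, not a fixed rule.

```r
# Recreate the simulated dataset so this block runs on its own
set.seed(123)
scores <- data.frame(
  student_id = 1:30,
  before = rnorm(30, mean = 75, sd = 10),
  after  = rnorm(30, mean = 78, sd = 10)
)

# Paired differences, one per student
differences <- scores$after - scores$before

# Shapiro-Wilk test; the null hypothesis is that the differences are normal
sw <- shapiro.test(differences)
sw

# TRUE means no significant departure from normality at the 0.05 level
sw$p.value > 0.05
```

A non-significant result does not prove normality; it only means the test found no clear evidence against it, so it is best read alongside the plots.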
Performing the Paired Samples T-Test
After checking the assumptions, you can perform the paired samples t-test using the t.test() function in R. The syntax for performing a paired samples t-test is as follows:
t.test(x, y, paired = TRUE)
x, y: Numeric vectors representing the two related groups or pairs.
paired: A logical indicating whether you want to run a paired t-test. Set this to TRUE for a paired samples t-test.
The following code can be used to perform a paired samples t-test on our dataset:
# Perform the paired samples t-test
t.test(scores$before, scores$after, paired = TRUE)
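A paired t-test is equivalent to a one-sample t-test on the within-pair differences, which also makes clear that the order of the arguments fixes the sign of the result (here, before minus after). The sketch below illustrates this on the simulated dataset; the object names paired_result and one_sample_result are illustrative.

```r
# Recreate the simulated dataset so this block runs on its own
set.seed(123)
scores <- data.frame(
  student_id = 1:30,
  before = rnorm(30, mean = 75, sd = 10),
  after  = rnorm(30, mean = 78, sd = 10)
)

# Paired test, as in the article
paired_result <- t.test(scores$before, scores$after, paired = TRUE)

# One-sample test on the differences (before minus after) against zero
one_sample_result <- t.test(scores$before - scores$after, mu = 0)

# Both comparisons return TRUE: the two approaches agree
all.equal(unname(paired_result$statistic), unname(one_sample_result$statistic))
all.equal(paired_result$p.value, one_sample_result$p.value)
```

Swapping the two vectors in the paired call flips the sign of the t-value and of the mean difference but leaves the p-value unchanged.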
Interpreting the Results
After running the t-test, R will provide an output with the t-value, degrees of freedom, p-value, confidence interval, and the mean of the differences. Here’s an example of what the output might look like:
	Paired t-test

data:  scores$before and scores$after
t = -2.1305, df = 29, p-value = 0.04102
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.5763927 -0.1042622
sample estimates:
mean of the differences
              -2.840327
Here’s how to interpret this output:
t: The t-value is the mean difference expressed in units of its standard error. The greater the magnitude of t (either positive or negative), the greater the evidence against the null hypothesis. In this case, t is -2.1305.
df: This is the degrees of freedom, which is the number of independent pieces of information that went into calculating the estimate. For a paired test it equals the number of pairs minus one, so 30 pairs give a df of 29.
p-value: The p-value is the probability of obtaining results at least as extreme as the observed results, assuming that the null hypothesis is correct. A small p-value (typically ≤ 0.05) is taken as evidence against the null hypothesis. In this case, the p-value is 0.04102, which is less than 0.05, so we reject the null hypothesis.
alternative hypothesis: This is the alternative hypothesis you specified (or the default). In this case, the alternative hypothesis was that the true difference in means is not equal to 0.
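95 percent confidence interval: This is a range of values, derived from the sample, that is likely to contain the population mean difference. In this case, the 95% confidence interval is between -5.576 and -0.104. Because the interval does not include 0, it agrees with the decision to reject the null hypothesis.
sample estimates: This is the sample mean of the differences. Because the test was called with the ‘before’ scores first, the differences are computed as ‘before’ minus ‘after’. In this case, the mean difference is -2.840, meaning the ‘after’ scores were about 2.84 points higher on average.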
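To see where these quantities come from, the sketch below computes the t-value, degrees of freedom, p-value, and confidence interval by hand and compares them with what t.test() returns. It recreates the simulated dataset so that it runs on its own; the variable names are illustrative, and the numbers it prints depend on the simulated data.

```r
# Recreate the simulated dataset so this block runs on its own
set.seed(123)
scores <- data.frame(
  student_id = 1:30,
  before = rnorm(30, mean = 75, sd = 10),
  after  = rnorm(30, mean = 78, sd = 10)
)

# Differences in the same order as t.test(before, after, paired = TRUE)
d <- scores$before - scores$after
n <- length(d)

mean_d  <- mean(d)                       # sample mean of the differences
se_d    <- sd(d) / sqrt(n)               # standard error of that mean
t_stat  <- mean_d / se_d                 # t-value
df      <- n - 1                         # degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df)      # two-sided p-value
ci      <- mean_d + c(-1, 1) * qt(0.975, df) * se_d  # 95% confidence interval

# Compare with the built-in test; each comparison returns TRUE
result <- t.test(scores$before, scores$after, paired = TRUE)
all.equal(t_stat, unname(result$statistic))
all.equal(p_value, result$p.value)
all.equal(ci, as.numeric(result$conf.int))
```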
In conclusion, we reject the null hypothesis that the mean difference between the ‘before’ scores and the ‘after’ scores is zero. We conclude that there is a statistically significant difference in scores before and after the training, with ‘after’ scores higher on average in this example.
The paired samples t-test is a powerful tool to compare the means of two dependent groups. This article provides a step-by-step guide on how to perform a paired samples t-test in R, from checking the assumptions of the test to interpreting the results. Always remember that the results of a t-test, like any statistical test, are inferential and should be interpreted within the context of your research question and study design.