One of the key statistical concepts that R is capable of handling efficiently is confidence intervals. Confidence intervals (CI) provide a range of values, derived from a data set, that is likely to contain the value of an unknown population parameter. In this guide, we are going to learn how to plot confidence intervals using R.
Understanding Confidence Intervals
Before we delve into the coding part, it’s crucial to understand what confidence intervals are. A confidence interval is an interval estimate, computed from sample data, that is likely to include an unknown population parameter. The interval has an associated confidence level, such as 95%, that describes how often intervals constructed this way would contain the parameter under repeated sampling.
Consider an example where we have collected a sample of data from a larger population and calculated a 95% confidence interval for the mean of that population. The 95% refers to the procedure rather than to any single interval: if we repeated our study 100 times and calculated a new interval each time, we would expect approximately 95 of those 100 intervals to contain the true population mean.
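The repeated-sampling interpretation can be checked with a small simulation. The sketch below is illustrative only: the population mean of 50, the standard deviation of 10, the sample size of 30, and the seed are arbitrary assumptions, not values from this guide.

```r
# Illustrative simulation; the population parameters below are assumed values.
set.seed(42)
true_mean <- 50
n_studies <- 1000

# For each simulated study, draw a sample and check whether its
# 95% confidence interval contains the true mean.
covered <- replicate(n_studies, {
  sample_data <- rnorm(30, mean = true_mean, sd = 10)
  ci <- t.test(sample_data)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

# Proportion of intervals that captured the true mean
coverage <- mean(covered)
coverage
```

The proportion should land close to 0.95, although it will vary a little with the seed and the number of simulated studies.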
Using Basic R Functions to Calculate Confidence Intervals
To start with, we can use some of R’s built-in functions to calculate confidence intervals. The t.test() function is one of them. For example, consider a simple numeric vector in R:
data <- c(6, 9, 13, 7, 12, 8, 11, 9, 10, 11)
You can calculate the confidence interval for the mean of this data set using the t.test() function, by calling t.test(data).
This will give you a confidence interval for the mean of the data set, by default at a 95% confidence level.
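As a sketch of what this looks like in practice, the interval can be pulled out of the test result through its conf.int element and compared with the textbook formula, mean plus or minus the critical t value times the standard error. The manual calculation is added here for illustration and assumes the usual one-sample t interval; the variable names are chosen for this example.

```r
data <- c(6, 9, 13, 7, 12, 8, 11, 9, 10, 11)

# Confidence interval reported by t.test() (95% by default)
ci_ttest <- as.numeric(t.test(data)$conf.int)

# The same interval from the one-sample t formula
n <- length(data)
std_error <- sd(data) / sqrt(n)
t_crit <- qt(0.975, df = n - 1)
ci_manual <- mean(data) + c(-1, 1) * t_crit * std_error

ci_ttest
ci_manual
```

Both vectors should hold the same lower and upper bounds, which confirms that t.test() is applying the standard t-based interval here.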
The next step is to load the necessary R packages. We will use the ggplot2 package to make our plots. If you don’t have the package installed, you can install it with install.packages("ggplot2"), then load it with library(ggplot2).
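One way to avoid reinstalling a package on every run is to check for it first. The helper below, ensure_package(), is a hypothetical name introduced for this example and is not part of R or ggplot2; it only wraps the base functions requireNamespace() and install.packages().

```r
# ensure_package() is a hypothetical helper written for this example.
ensure_package <- function(pkg) {
  # Install only when the package is not already available
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package can now be loaded
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Usage (left commented out so nothing is installed by accident):
# ensure_package("ggplot2")
# library(ggplot2)
```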
Importing and Inspecting Data
For this guide, we will use the mtcars data set, which comes built in with R. To inspect the data set, type head(mtcars).
This command shows the first six rows of the data set.
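Beyond the first few rows, a few other base R calls give a quick overview of the data before any intervals are calculated. This is a minimal sketch; which checks are worth running depends on the data at hand.

```r
# Dimensions: number of rows (cars) and columns (variables)
dim(mtcars)

# Column types and a preview of each variable
str(mtcars)

# Minimum, quartiles, mean, and maximum for the variable used in this guide
summary(mtcars$mpg)
```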
Calculating Confidence Intervals
Now, let’s calculate a 95% confidence interval for the mean miles per gallon (mpg) in the mtcars data set.
CI <- t.test(mtcars$mpg)$conf.int
CI
This returns the lower and upper bounds of the 95% confidence interval.
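The 95% level is only the default. As a sketch, the conf.level argument of t.test() can be used to compare several levels side by side; the three levels and the names conf_levels and ci_table are choices made for this example.

```r
# Confidence levels to compare (chosen for illustration)
conf_levels <- c(0.90, 0.95, 0.99)

# One row per level, holding the lower and upper bounds
ci_table <- t(sapply(conf_levels, function(lvl) {
  as.numeric(t.test(mtcars$mpg, conf.level = lvl)$conf.int)
}))
colnames(ci_table) <- c("lower", "upper")
rownames(ci_table) <- paste0(round(conf_levels * 100), "%")

ci_table
```

A higher confidence level produces a wider interval, because the interval has to cover the true mean in a larger share of repeated samples.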
Plotting Confidence Intervals
The ggplot2 package is excellent for creating beautiful and customizable graphics. Here is how you can use it to create a plot with a confidence interval:
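mean_mpg <- mean(mtcars$mpg)

ggplot(mtcars, aes(x = 1, y = mpg)) +
  # Individual cars, jittered horizontally to reduce overlap
  geom_point(position = position_jitter(height = 0, width = 0.1), size = 2) +
  # CI[1] is the lower bound and CI[2] the upper bound of the interval
  # (linewidth needs ggplot2 3.4.0 or later; older versions use size)
  geom_errorbar(aes(ymin = CI[1], ymax = CI[2]), width = 0.2, colour = "red", linewidth = 1) +
  # Sample mean
  geom_point(aes(y = mean_mpg), colour = "blue", size = 3) +
  theme_void() +
  labs(title = "95% Confidence Interval for MPG",
       subtitle = paste("Mean MPG:", round(mean_mpg, 2)),
       x = "", y = "Miles Per Gallon (MPG)")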
In this code:
geom_point() plots the individual data points.
position_jitter() is used to spread out the points to reduce overlap.
geom_errorbar() adds the confidence interval to the plot.
theme_void() removes axes and labels for a minimalistic plot.
labs() adds a title and subtitle.
The resulting plot shows the individual data points, the mean (blue point), and the 95% confidence interval (red bar).
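An alternative sketch, not the approach used above, is to store the mean and interval in a one-row data frame and draw them with geom_pointrange(), so the interval is drawn once rather than once per row of mtcars. The names ci_df and p are chosen for this example, and the code assumes ggplot2 is installed.

```r
library(ggplot2)

# Summarise the interval once in a one-row data frame
ci <- t.test(mtcars$mpg)$conf.int
ci_df <- data.frame(
  x = 1,
  mean_mpg = mean(mtcars$mpg),
  lower = ci[1],
  upper = ci[2]
)

p <- ggplot() +
  # Raw observations, jittered horizontally only
  geom_point(data = mtcars, aes(x = 1, y = mpg),
             position = position_jitter(width = 0.1, height = 0)) +
  # Mean and 95% interval taken from the one-row summary
  geom_pointrange(data = ci_df,
                  aes(x = x, y = mean_mpg, ymin = lower, ymax = upper),
                  colour = "red") +
  labs(x = NULL, y = "Miles Per Gallon (MPG)")

p
```

Keeping the summary in its own data frame also makes it easy to print or export the exact bounds shown in the plot.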
Plotting Confidence Intervals for Grouped Data
Often we have data divided into different groups or categories, and we want to compare the confidence intervals for these groups. For this, we need to calculate and plot confidence intervals for each group separately.
For example, in the mtcars data set, cars are divided into automatic and manual transmissions (represented by the am variable). Let’s calculate and plot the 95% confidence intervals for the mpg of these two groups.
# Calculate mean and confidence interval by group
library(dplyr)
library(ggplot2)

mpg_summary <- mtcars %>%
  group_by(am) %>%
  summarise(mean_mpg = mean(mpg),
            # Store the bounds as separate numeric columns for plotting
            ci_lower = t.test(mpg)$conf.int[1],
            ci_upper = t.test(mpg)$conf.int[2],
            .groups = "drop")

# Make plot
ggplot(mpg_summary, aes(x = factor(am), y = mean_mpg)) +
  geom_point(size = 3, position = position_dodge(0.2)) +
  geom_errorbar(aes(ymin = ci_lower, ymax = ci_upper), width = 0.2, position = position_dodge(0.2)) +
  labs(x = "Transmission (0 = automatic, 1 = manual)",
       y = "Miles Per Gallon (MPG)",
       title = "95% Confidence Interval for MPG by Transmission")
In this code, we used the group_by and summarise functions from the dplyr package to calculate the means and confidence intervals by group. Then we plotted the results in the same way as before, but with two groups. The resulting plot allows us to compare the confidence intervals for automatic and manual transmission cars.
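The grouped summary can also be cross-checked without dplyr. The sketch below uses base R's split() and sapply(); mean_ci() is a small helper written for this example, not a function from any package.

```r
# Helper written for this example: mean and 95% CI of a numeric vector
mean_ci <- function(x) {
  ci <- t.test(x)$conf.int
  c(mean = mean(x), lower = ci[1], upper = ci[2])
}

# Split mpg by transmission type and apply the helper to each group.
# Rows are the am values ("0" = automatic, "1" = manual).
group_ci <- t(sapply(split(mtcars$mpg, mtcars$am), mean_ci))
group_ci
```

The means and bounds in this matrix should match the values computed with summarise, which is a quick way to confirm the grouped calculation.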
In this guide, we discussed what confidence intervals are and how to plot them using R. We started by using basic R functions to calculate confidence intervals, then introduced the ggplot2 package to create plots. We also covered how to handle grouped data.