How to Plot a Confidence Interval in R

Confidence intervals are a key statistical concept, and R can compute them with very little code. A confidence interval (CI) is a range of values, derived from a sample, that is likely to contain the value of an unknown population parameter. In this guide, we are going to learn how to calculate confidence intervals and plot them using R.

Understanding Confidence Intervals

Before we delve into the code, it's worth being precise about what a confidence interval is. A confidence interval is an interval estimate, computed from sample data, for an unknown population parameter such as a mean. It comes with a confidence level, such as 95%, which describes how often the method that produced the interval captures the true parameter over repeated sampling. It is not the probability that one particular, already-computed interval contains the parameter.

Consider an example where we have collected a sample of data from a larger population and calculated a 95% confidence interval for the population mean. If we repeated the study 100 times, computing a new interval each time, we would expect about 95 of those 100 intervals to contain the true population mean. The true mean itself is fixed; it is the interval that changes from sample to sample.
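The repeated-sampling interpretation can be checked with a quick simulation. The sketch below assumes a normal population whose mean we know (true_mean = 10 and true_sd = 2 are arbitrary values chosen only for illustration), draws many samples of size 10, and counts how often the 95% interval from t.test() contains the true mean. The proportion should come out close to 0.95.

```r
set.seed(42)  # make the simulation reproducible

true_mean <- 10    # hypothetical population mean, chosen for illustration
true_sd   <- 2     # hypothetical population standard deviation
n         <- 10    # sample size of each simulated study
n_sims    <- 2000  # number of simulated studies

covered <- replicate(n_sims, {
  x  <- rnorm(n, mean = true_mean, sd = true_sd)  # one simulated sample
  ci <- t.test(x)$conf.int                         # its 95% confidence interval
  ci[1] <= true_mean && true_mean <= ci[2]         # did it capture the true mean?
})

# Proportion of the simulated intervals that contain true_mean
coverage <- mean(covered)
coverage
```

Each individual interval either contains 10 or it does not; the 95% describes the long-run hit rate of the procedure.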

Using Basic R Functions to Calculate Confidence Intervals

To start with, we can use some of R’s built-in functions to calculate confidence intervals. The t.test() function is one of them. For example, consider a simple numeric vector in R:

data <- c(6, 9, 13, 7, 12, 8, 11, 9, 10, 11)

You can calculate the confidence interval for the mean of this data set using the t.test() function:

t.test(data)$conf.int

This will give you a confidence interval for the mean of the data set, by default at a 95% confidence level.
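For a single mean, t.test() builds the interval as mean ± t × s / √n, where s is the sample standard deviation, n is the sample size, and t is the 0.975 quantile of a t distribution with n - 1 degrees of freedom (for a 95% level). A minimal sketch that reproduces the bounds by hand and then uses the conf.level argument to ask for a different level:

```r
data <- c(6, 9, 13, 7, 12, 8, 11, 9, 10, 11)

n      <- length(data)
x_bar  <- mean(data)              # sample mean (9.6 for this vector)
se     <- sd(data) / sqrt(n)      # standard error of the mean
t_crit <- qt(0.975, df = n - 1)   # two-sided 95% critical value

manual_ci <- c(x_bar - t_crit * se, x_bar + t_crit * se)
manual_ci

# The same bounds from t.test(); as.numeric() drops the "conf.level" attribute
builtin_ci <- as.numeric(t.test(data)$conf.int)
all.equal(manual_ci, builtin_ci)  # TRUE

# A 99% interval instead of the default 95%
t.test(data, conf.level = 0.99)$conf.int
```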

Loading Libraries

The next step is to load the necessary R packages. We will use the ggplot2 package to make our plots and, later on, the dplyr package to summarise grouped data. If you don't have them installed, install them once with the install.packages function, then load ggplot2 with the library function:

install.packages(c("ggplot2", "dplyr"))
library(ggplot2)

Importing and Inspecting Data

For this guide, we will use the mtcars data set, which comes built-in with R. To inspect the data set, simply type:

head(mtcars)

This command shows the first six rows of the data set.

Calculating Confidence Intervals

Now, let’s calculate a 95% confidence interval for the mean miles per gallon (mpg) in the mtcars data set.

CI <- t.test(mtcars$mpg)$conf.int
CI

This returns the lower and upper bounds of the 95% confidence interval.
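Because the confidence level is just an argument, it is easy to see how it trades off against precision. A short sketch comparing 90%, 95% and 99% intervals for mtcars$mpg; higher levels produce wider intervals because the t critical value grows with the level:

```r
conf_levels <- c(0.90, 0.95, 0.99)

# One row per confidence level: the level, lower bound, upper bound and width
ci_table <- t(sapply(conf_levels, function(lvl) {
  ci <- t.test(mtcars$mpg, conf.level = lvl)$conf.int
  c(level = lvl, lower = ci[1], upper = ci[2], width = ci[2] - ci[1])
}))
ci_table
```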

Plotting Confidence Intervals

The ggplot2 package is excellent for creating beautiful and customizable graphics. Here is how you can use it to create a plot with a confidence interval:

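mean_mpg <- mean(mtcars$mpg)

ggplot(mtcars, aes(x = 1, y = mpg)) +
  # individual observations, jittered horizontally only
  geom_point(position = position_jitter(width = 0.1, height = 0), size = 2) +
  # draw the interval once, rather than once per row of mtcars
  annotate("errorbar", x = 1, ymin = CI[1], ymax = CI[2],
           width = 0.2, colour = "red", linewidth = 1) +
  # the sample mean
  annotate("point", x = 1, y = mean_mpg, colour = "blue", size = 3) +
  theme_minimal() +
  # the x position carries no information, so hide its text and ticks
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
  labs(title = "95% Confidence Interval for MPG",
       subtitle = paste("Mean MPG:", round(mean_mpg, 2)),
       x = "", y = "Miles Per Gallon (MPG)")

In this code:

  • geom_point plots the individual data points. position_jitter spreads them out horizontally to reduce overlap, while height = 0 keeps the mpg values exact.
  • annotate("errorbar", ...) adds the confidence interval to the plot as a single bar from CI[1] to CI[2]. The linewidth argument sets the line thickness (it replaced size for lines in ggplot2 3.4.0).
  • annotate("point", ...) marks the sample mean.
  • theme_minimal gives a clean background, and the theme() call hides the x-axis text and ticks while keeping the MPG scale readable.
  • labs adds the title, subtitle and axis labels.

The resulting plot shows the individual data points, the mean (blue point), and the 95% confidence interval (red bar).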
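If you would rather not depend on ggplot2, roughly the same figure can be drawn with base R graphics. This is an alternative sketch rather than part of the ggplot2 workflow: plot() draws the jittered observations, arrows() with angle = 90 and code = 3 draws an error bar with flat caps at both ends, and points() marks the mean.

```r
set.seed(1)  # jitter() is random; fix the seed so the plot is reproducible

ci       <- t.test(mtcars$mpg)$conf.int
mean_mpg <- mean(mtcars$mpg)
x_pos    <- jitter(rep(1, nrow(mtcars)), amount = 0.1)  # horizontal jitter only

plot(x_pos, mtcars$mpg, pch = 16, xlim = c(0.5, 1.5), xaxt = "n",
     xlab = "", ylab = "Miles Per Gallon (MPG)",
     main = "95% Confidence Interval for MPG")

# Red error bar from the lower to the upper bound, capped at both ends
arrows(x0 = 1, y0 = ci[1], x1 = 1, y1 = ci[2],
       angle = 90, code = 3, length = 0.1, col = "red", lwd = 2)

# Blue point for the sample mean
points(1, mean_mpg, pch = 16, col = "blue", cex = 1.5)
```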

Plotting Confidence Intervals for Grouped Data

Often we have data divided into different groups or categories, and we want to compare the confidence intervals for these groups. For this, we need to calculate and plot confidence intervals for each group separately.

For example, in the mtcars data set, cars are divided into automatic and manual transmissions (represented by the am variable). Let’s calculate and plot the 95% confidence intervals for the mpg of these two groups.

# Calculate the mean and 95% confidence interval for each transmission type
library(dplyr)
library(ggplot2)

ci_by_am <- mtcars %>%
  group_by(am) %>%
  summarise(mean_mpg = mean(mpg),
            lower    = t.test(mpg)$conf.int[1],
            upper    = t.test(mpg)$conf.int[2],
            .groups  = "drop")

# Plot each group's mean with its own error bar
ggplot(ci_by_am, aes(x = factor(am, labels = c("Automatic", "Manual")), y = mean_mpg)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2) +
  labs(x = "Transmission",
       y = "Miles Per Gallon (MPG)",
       title = "95% Confidence Interval for MPG by Transmission")

In this code, we used the group_by and summarise functions from the dplyr package to calculate the mean and confidence interval for each group. The lower and upper bounds are stored in separate columns, so each error bar is drawn from its own group's interval. Then we plotted the results as before, with one point and one error bar per group. The resulting plot allows us to compare the confidence intervals for automatic and manual transmission cars.
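The same per-group intervals can also be computed without dplyr. A base R sketch using split() and sapply(), which is handy as a cross-check on the numbers behind the grouped plot:

```r
# Split mpg by transmission type (0 = automatic, 1 = manual)
mpg_by_am <- split(mtcars$mpg, mtcars$am)

# One column per group: mean, lower and upper 95% bounds
ci_base <- sapply(mpg_by_am, function(x) {
  ci <- t.test(x)$conf.int
  c(mean = mean(x), lower = ci[1], upper = ci[2])
})
ci_base
```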

Conclusion

In this guide, we discussed what confidence intervals are and how to plot them using R. We started by using basic R functions to calculate confidence intervals, then introduced the ggplot2 package to create plots. We also covered how to handle grouped data.
