How to Plot Multiple Histograms in R


Histograms are a staple in the realm of data visualization. They provide a graphical representation of the distribution of a dataset. More specifically, histograms represent the data's frequency distribution, that is, the number of observations that fall into each of a series of intervals known as bins.
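To see what "the number of observations in each bin" means concretely, hist() can return its counts without drawing anything by setting plot = FALSE. The short vector below is made up purely for illustration:

```r
# A small, made-up vector so the counts are easy to verify by hand
x <- c(1.2, 1.8, 2.5, 2.7, 3.1, 3.9, 4.4)

# plot = FALSE computes the histogram without drawing it;
# the breaks vector gives the bin edges 1-2, 2-3, 3-4 and 4-5
h <- hist(x, breaks = c(1, 2, 3, 4, 5), plot = FALSE)

h$breaks  # 1 2 3 4 5
h$counts  # 2 2 2 1
```

By default the bins are closed on the right, so a value of exactly 2 would be counted in the first bin rather than the second.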

Often in data analysis and visualization, there is a need to compare the distribution of two or more different variables. In such cases, creating multiple histograms becomes extremely useful. In R, one of the most popular programming languages for statistical analysis, there are several methods for creating histograms, including the basic hist() function in base R, and more advanced functions in packages like ggplot2 and lattice.

This article will walk you through the process of plotting multiple histograms on one plot in R. We’ll cover two main approaches: one using base R, and the other using the ggplot2 package.

Generating Some Example Data

For this tutorial, let’s generate two different sets of data to create our histograms. We will use the rnorm() function in R to generate two sets of 1000 random numbers, one with a mean of 0 and standard deviation of 1, and another with a mean of 2 and standard deviation of 1.5.

set.seed(123) # for reproducible results

data1 <- rnorm(1000, mean = 0, sd = 1)
data2 <- rnorm(1000, mean = 2, sd = 1.5)
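Because these are random draws, their sample statistics only approximate the chosen parameters. A quick check like the one below, which is a sketch rather than a required step, also reports the combined range that a shared x-axis has to cover:

```r
set.seed(123)  # same seed as above, so the draws match the tutorial data
data1 <- rnorm(1000, mean = 0, sd = 1)
data2 <- rnorm(1000, mean = 2, sd = 1.5)

# Sample means and standard deviations should be close to, not equal to,
# the values passed to rnorm()
sample_stats <- data.frame(
  mean = c(mean(data1), mean(data2)),
  sd   = c(sd(data1), sd(data2)),
  row.names = c("data1", "data2")
)
print(round(sample_stats, 2))

# The combined range shows how wide a common x-axis must be
range(c(data1, data2))
```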

Creating Multiple Histograms Using Base R

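To plot multiple histograms in the same plot, we can use the hist() function in R. This function takes a vector of values, divides its range into bins, and draws a bar showing how many values fall into each bin. Here is an example of how to use this function to plot two histograms together:

# Use one shared set of break points so both histograms have identical bins
breaks <- seq(min(data1, data2), max(data1, data2), length.out = 31)  # 30 bins

# Create the first histogram; the x-axis spans all the breaks by default
hist(data1, col = rgb(0, 0, 1, 0.5), main = "Multiple Histograms",
     xlab = "Values", ylab = "Frequency", ylim = c(0, 350),
     breaks = breaks)

# Add the second histogram to the same plot
hist(data2, col = rgb(1, 0, 0, 0.5), add = TRUE, breaks = breaks)

In the above code, col = rgb(0, 0, 1, 0.5) and col = rgb(1, 0, 0, 0.5) make the first histogram blue and the second red. The fourth argument to rgb() is the alpha (opacity) value: 0.5 makes the bars semi-transparent, so both histograms stay visible where they overlap. The argument add = TRUE draws the second histogram on top of the current plot instead of starting a new one.

The breaks argument controls the bins. When it is a single number such as breaks = 30, hist() treats it only as a suggestion and picks "pretty" break points for each data set separately, so two overlaid histograms can end up with different bin widths. Passing the same vector of break points to both calls avoids this, and because those breaks run from the smallest to the largest value of the combined data, the default x-axis covers both histograms. The ylim argument in the first hist() call sets the y-axis range; it must be tall enough for the tallest bar of either histogram, because the second call with add = TRUE does not rescale the axes.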
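Rather than guessing a y-axis limit, one option is to compute both histograms with plot = FALSE, take the tallest bar, and then draw the stored histogram objects with plot(). Base R also does not add a legend on its own, so the sketch below calls legend(). The 30-bin choice and the legend position are illustrative assumptions:

```r
set.seed(123)
data1 <- rnorm(1000, mean = 0, sd = 1)
data2 <- rnorm(1000, mean = 2, sd = 1.5)

# Shared bin edges covering both data sets (30 equal-width bins)
breaks <- seq(min(data1, data2), max(data1, data2), length.out = 31)

# Compute the counts first, without drawing, to find the tallest bar
h1 <- hist(data1, breaks = breaks, plot = FALSE)
h2 <- hist(data2, breaks = breaks, plot = FALSE)
y_max <- max(h1$counts, h2$counts)

# rgb() returns a hex string; the last two digits ("80") encode 50% opacity
blue <- rgb(0, 0, 1, 0.5)  # "#0000FF80"
red  <- rgb(1, 0, 0, 0.5)

# plot() dispatches to plot.histogram(); add = TRUE overlays the second one
plot(h1, col = blue, main = "Multiple Histograms", xlab = "Values",
     ylab = "Frequency", ylim = c(0, y_max))
plot(h2, col = red, add = TRUE)

# Base R graphics need an explicit legend
legend("topright", legend = c("data1", "data2"), fill = c(blue, red), bty = "n")
```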

Creating Multiple Histograms Using ggplot2

First, we need to combine our data into a single data frame in "long" format, with one column for the values and a grouping variable that records which data set each value came from:

# Load the ggplot2 package
library(ggplot2)

# Combine the data
data <- data.frame(
  value = c(data1, data2),
  group = factor(c(rep("data1", length(data1)), rep("data2", length(data2))))
)

head(data) # Take a look at the first few rows of the combined data
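As an alternative, base R's stack() builds the same long-format data frame from a named list of vectors. It names its columns values and ind, so the sketch below renames them to match the value and group columns used in this tutorial:

```r
set.seed(123)
data1 <- rnorm(1000, mean = 0, sd = 1)
data2 <- rnorm(1000, mean = 2, sd = 1.5)

# stack() returns a data frame with a "values" column and an "ind" factor
# whose levels are the list names
long <- stack(list(data1 = data1, data2 = data2))
names(long) <- c("value", "group")  # match the tutorial's column names

str(long)
table(long$group)  # 1000 rows per group
```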

Now, we can create the histograms:

ggplot(data, aes(x = value, fill = group)) +
  geom_histogram(position = "identity", alpha = 0.5, bins = 30) +
  labs(x = "Values", y = "Frequency") +
  theme_minimal()

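Here, aes(x = value, fill = group) maps value to the x-axis and uses group to set the fill color of the bars, which also makes ggplot2 add a legend automatically. The position = "identity" argument in geom_histogram() draws each group's bars from zero so the histograms overlap; the default, position = "stack", would pile one group's bars on top of the other's. alpha = 0.5 sets the transparency, and bins = 30 sets the number of bins. ggplot2 computes the bin edges from the range of the whole x scale, so both groups share the same bins.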
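To check what geom_histogram() actually computed, ggplot2's layer_data() returns the binned data behind a layer. The sketch below assumes ggplot2 is installed; it rebuilds the tutorial's data frame and confirms that, with position = "identity", each bar starts at zero and reaches its own count instead of being stacked:

```r
library(ggplot2)

set.seed(123)
data1 <- rnorm(1000, mean = 0, sd = 1)
data2 <- rnorm(1000, mean = 2, sd = 1.5)
data <- data.frame(
  value = c(data1, data2),
  group = factor(rep(c("data1", "data2"), times = c(length(data1), length(data2))))
)

p <- ggplot(data, aes(x = value, fill = group)) +
  geom_histogram(position = "identity", alpha = 0.5, bins = 30)

# One row per bin and group, with bin edges (xmin, xmax) and counts
binned <- layer_data(p)
head(binned[, c("group", "xmin", "xmax", "count")])

# Each group's counts add up to its own sample size
tapply(binned$count, binned$group, sum)

# With position = "identity", bar tops equal the counts (no stacking)
all.equal(binned$ymax, binned$count)
```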

Customizing the Histograms

Both the hist() function and ggplot2 offer many options for fine-tuning the appearance of your histograms. For instance, you can change the bin width or color scheme, add a legend (ggplot2 adds one automatically when fill is mapped to a variable, while base R needs a call to legend()), or change the theme.
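For example, to fix the bin width rather than the number of bins in base R, you can pass break points spaced by that width. The width of 0.5 below is an arbitrary choice for illustration, and the outer edges are rounded outward so the breaks span every value:

```r
set.seed(123)
data1 <- rnorm(1000, mean = 0, sd = 1)
data2 <- rnorm(1000, mean = 2, sd = 1.5)

bin_width <- 0.5  # illustrative value

# Round the outermost edges to multiples of bin_width so no value falls outside
lo <- floor(min(data1, data2) / bin_width) * bin_width
hi <- ceiling(max(data1, data2) / bin_width) * bin_width
breaks <- seq(lo, hi, by = bin_width)

hist(data1, breaks = breaks, col = rgb(0, 0, 1, 0.5),
     main = "Bin width 0.5", xlab = "Values")
hist(data2, breaks = breaks, col = rgb(1, 0, 0, 0.5), add = TRUE)
```

In ggplot2, the corresponding setting is geom_histogram(binwidth = 0.5), which is used instead of bins when both are given.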

Here’s an example of a more customized ggplot2 histogram:

ggplot(data, aes(x = value, fill = group)) +
  geom_histogram(position = "identity", alpha = 0.5, bins = 30) +
  scale_fill_manual(values = c("data1" = "blue", "data2" = "red")) +
  labs(title = "Multiple Histograms", x = "Values", y = "Frequency") +
  theme_minimal() +
  theme(legend.position = "top")

In the above code, scale_fill_manual(values = c("data1" = "blue", "data2" = "red")) assigns blue to data1 and red to data2, labs() adds a title and axis labels, theme_minimal() applies a minimal overall theme, and theme(legend.position = "top") moves the legend above the plot.

Conclusion

Histograms are a fundamental tool in the statistical analysis and visualization of data. They allow us to visualize and understand the distribution of a dataset. In R, we can leverage the power of the base R hist() function or the ggplot2 package to plot and customize multiple histograms in one plot.
