# How to Create a Relative Frequency Histogram in R

In statistics, a histogram is an efficient graphical representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous variable. When we construct a histogram, it’s quite common to normalize it to give a relative frequency histogram, which shows the proportion of the dataset that falls within each bin.

This article will guide you through the process of creating a relative frequency histogram in R, using both base R graphics and the ggplot2 package. We will also cover some customization options, such as overlaying a density curve and labeling the bars.

## 1. Understanding Histograms and Relative Frequencies

In a standard histogram, the y-axis represents the absolute frequency, that is, the number of data points within each bin. Bins are defined by dividing the range of the data into consecutive intervals, usually of equal width.
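To make the binning concrete, here is a minimal sketch that tallies a small vector into equal-width bins by hand and compares the result with `hist()`. The values in `x` and the break points are made up for illustration.

```r
# Illustrative values (made up for this example)
x <- c(1.2, 2.5, 2.7, 3.1, 4.8, 5.0, 6.3, 7.9)

# Four equal-width bins: (0,2], (2,4], (4,6], (6,8]
breaks <- seq(0, 8, by = 2)

# cut() assigns each value to its interval; table() counts values per interval
bins <- cut(x, breaks = breaks)
manual_counts <- table(bins)
print(manual_counts)

# hist() performs the same tally when given the same breaks
h <- hist(x, breaks = breaks, plot = FALSE)
print(h$counts)
```

Both tallies give the absolute frequencies that a standard histogram draws as bar heights.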

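However, in a relative frequency histogram, the y-axis represents the proportion of total observations that fall within each bin. Thus, the bar heights in a relative frequency histogram sum to 1. (In the closely related density histogram, each height is further divided by the bin width, so that the bar areas, rather than the heights, sum to 1.) Working with proportions is very helpful when you want to compare distributions with differing numbers of observations.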
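The normalization can be sketched in a few lines of base R. This example, which uses an arbitrary seed and sample size, divides the bin counts by their total and checks that the resulting heights sum to 1. For contrast, it also checks that the `density` component returned by `hist()` gives bar areas that sum to 1.

```r
set.seed(42)                 # arbitrary seed, only for reproducibility
x <- rnorm(200)              # illustrative sample

# Compute the bins without drawing anything
h <- hist(x, plot = FALSE)

# Relative frequency: each bin's share of all observations
rel_freq <- h$counts / sum(h$counts)

# Bin widths, needed to turn densities into areas
widths <- diff(h$breaks)

# Heights of a relative frequency histogram sum to 1
print(all.equal(sum(rel_freq), 1))

# Areas (height * width) of a density histogram sum to 1
print(all.equal(sum(h$density * widths), 1))
```

The two scales differ only by the bin width: `rel_freq` equals `h$density * widths`.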

## 2. Creating a Relative Frequency Histogram in Base R

Creating a relative frequency histogram in base R takes three steps: computing the histogram without drawing it, rescaling its counts to proportions, and plotting the result. Here’s how to do it:

    # Generate a sample dataset
    data <- rnorm(1000)

    # Create a histogram object without drawing it
    h <- hist(data, plot = FALSE)

    # Transform counts into relative frequencies
    h$counts <- h$counts / sum(h$counts)

    # Plot the relative frequency histogram
    # (freq = TRUE draws h$counts, which now hold proportions)
    plot(h, freq = TRUE, main = "Relative Frequency Histogram",
         xlab = "Bins", ylab = "Relative Frequency")

Here’s what happens in this code:

1. `rnorm(1000)` draws 1000 random values from a standard normal distribution and stores them in `data`.
2. `hist(data, plot = FALSE)` calculates the histogram data without plotting it and stores the result in `h`.
3. `h$counts <- h$counts / sum(h$counts)` transforms the absolute frequencies stored in `h$counts` into relative frequencies.
4. `plot(h, freq = TRUE, main = "Relative Frequency Histogram", xlab = "Bins", ylab = "Relative Frequency")` plots the relative frequency histogram. The `freq = TRUE` argument tells `plot()` to use `h$counts`, which now hold the relative frequencies, as the bar heights. With `freq = FALSE`, `plot()` would draw `h$density` instead and ignore the rescaled counts.

## 3. Creating a Relative Frequency Histogram with ggplot2

ggplot2 is a powerful package for creating high-quality plots in R. It computes the bin statistics itself, so a relative frequency histogram needs only one mapping: `after_stat(count / sum(count))`, which divides each bin's count by the total count. The older `..density..` notation is deprecated since ggplot2 3.4.0, and its replacement, `after_stat(density)`, gives the density (relative frequency divided by bin width) rather than the relative frequency itself.

First, ensure that you’ve installed and loaded the ggplot2 package:

    install.packages("ggplot2")
    library(ggplot2)

Then, you can create a relative frequency histogram as follows:

    # Generate a data frame from the dataset
    df <- data.frame(data)

    # Create a relative frequency histogram
    ggplot(df, aes(x = data)) +
      geom_histogram(aes(y = after_stat(count / sum(count))),
                     bins = 30, color = "black", fill = "skyblue") +
      labs(title = "Relative Frequency Histogram", x = "Bins", y = "Relative Frequency")

In this code:

1. `data.frame(data)` wraps the `data` vector in a data frame with a single column named `data`.
2. `ggplot(df, aes(x = data))` initializes the ggplot2 object and maps the `data` column to the x-axis.
3. `geom_histogram(aes(y = after_stat(count / sum(count))), bins = 30, color = "black", fill = "skyblue")` adds a histogram layer to the plot, where each bar's height is the bin's count divided by the total count, that is, its relative frequency.
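   The `bins` argument sets the number of bins, and `color` and `fill` define the outline and fill colors of the bars.
4. `labs(title = "Relative Frequency Histogram", x = "Bins", y = "Relative Frequency")` adds the title and axis labels to the plot.

## 4. Advanced Customization of Relative Frequency Histograms

Both base R and ggplot2 provide numerous options for customizing histograms, such as modifying colors, bin widths, and adding statistical overlays. Here are some examples:

### 4.1 Adding a Density Curve to a ggplot2 Histogram

You can add a density curve to the histogram to visualize the estimated probability density function of the data. A density curve encloses a total area of 1, so the histogram beneath it has to be drawn on the density scale as well. The y aesthetic is therefore `after_stat(density)` (the current form of the deprecated `..density..` notation), and the y-axis is labeled accordingly:

    ggplot(df, aes(x = data)) +
      geom_histogram(aes(y = after_stat(density)), bins = 30,
                     color = "black", fill = "skyblue", alpha = 0.5) +
      geom_density(color = "red") +
      labs(title = "Histogram with Density Curve", x = "Bins", y = "Density")

In this code, `geom_density(color = "red")` adds a kernel density estimate to the plot as a red curve, and `alpha = 0.5` makes the bars semi-transparent so that the curve stays visible.

### 4.2 Adding Labels to Bars in a Base R Histogram

In base R, you can add relative frequency labels to each bar:

    h <- hist(data, plot = FALSE)
    h$counts <- h$counts / sum(h$counts)

    # freq = TRUE draws the rescaled counts; the extra ylim headroom
    # keeps the label above the tallest bar inside the plot region
    plot(h, freq = TRUE, main = "Relative Frequency Histogram",
         xlab = "Bins", ylab = "Relative Frequency",
         ylim = c(0, max(h$counts) * 1.1))

    text(h$mids, h$counts, labels = round(h$counts, 2), pos = 3, cex = 0.8)

The `text()` function adds text to the plot. `h$mids` and `h$counts` specify the x and y coordinates for the labels (the bin midpoints and the bar heights), `labels = round(h$counts, 2)` defines the labels as the relative frequencies rounded to two decimals, `pos = 3` places the labels above the bars, and `cex = 0.8` scales the text to 80% of the default size. The bars must be drawn with `freq = TRUE` so that their tops match the label positions; with `freq = FALSE`, `plot()` would draw `h$density` and the labels would no longer line up with the bars.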
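The base R steps can also be wrapped in a small reusable function. The sketch below is one possible way to do that: `plot_rel_freq()` is a hypothetical helper name, not a base R function, and the seed, sample sizes, and bin width of 0.5 are arbitrary choices. It draws two samples of different sizes on shared bins, which is the comparison that relative frequencies make possible.

```r
# Hypothetical helper (not part of base R): rescales counts to proportions,
# draws them, and optionally labels each bar with its relative frequency.
plot_rel_freq <- function(x, breaks = "Sturges", labels = FALSE, ...) {
  h <- hist(x, breaks = breaks, plot = FALSE)
  h$counts <- h$counts / sum(h$counts)
  # freq = TRUE makes plot() use the rescaled counts as bar heights
  plot(h, freq = TRUE, ylab = "Relative Frequency",
       ylim = c(0, max(h$counts) * 1.1), ...)
  if (labels) {
    text(h$mids, h$counts, labels = round(h$counts, 2), pos = 3, cex = 0.8)
  }
  invisible(h)  # return the rescaled histogram object for further use
}

set.seed(1)               # arbitrary seed, only for reproducibility
small <- rnorm(100)       # illustrative samples of different sizes
large <- rnorm(5000)

# Shared equal-width breaks that span both samples
rng <- range(c(small, large))
shared_breaks <- seq(floor(rng[1]), ceiling(rng[2]), by = 0.5)

h_small <- plot_rel_freq(small, breaks = shared_breaks, main = "n = 100", labels = TRUE)
h_large <- plot_rel_freq(large, breaks = shared_breaks, main = "n = 5000")
```

Because both plots use the same breaks and the same proportion scale, their bar heights can be compared directly even though one sample is fifty times larger than the other.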