Visualizing data distributions is a key task in exploratory data analysis, and histograms and Normal curves are widely used for this purpose. A histogram represents a distribution by dividing the range of the data into bins of equal width and showing how many observations fall into each bin. A Normal curve is the bell-shaped probability density function of the Normal (or Gaussian) distribution, a continuous probability distribution for a real-valued random variable. Overlaying a Normal curve on a histogram gives useful context for understanding the data distribution and for judging, informally, whether it is consistent with a Normal distribution.
In this article, we will discuss how to create a histogram and overlay a Normal curve on it using both base R and the ggplot2 package. We will also overlay a kernel density estimate, which follows the shape of the data without assuming Normality and therefore helps reveal when the data does not follow a Normal distribution.
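For reference, the Normal curve is the graph of the Normal probability density function with mean μ and standard deviation σ, which is the quantity R's dnorm function evaluates. When a histogram shows counts for n observations in bins of width h, a common convention (assumed here) is to rescale the curve by n·h so that both share the same vertical scale, using the sample mean and standard deviation as estimates:

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
\text{expected count near } x \approx n \, h \, f(x \mid \hat{\mu}, \hat{\sigma})
```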
Using Built-in Data in R
To keep things simple, this tutorial uses the built-in mtcars dataset in R. This dataset describes 32 car models using 11 attributes, including miles per gallon (mpg), number of cylinders (cyl), and horsepower (hp).
Let’s take a look at the first few rows of the dataset:
head(mtcars)
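Since all of the plots below are built from the mpg column, it can help to look at that column on its own first. This is a small base R sketch; n_mpg is an illustrative name, while mean_mpg and sd_mpg match the names used later in the base R section.

```r
# Inspect the mpg column that the plots are built from
str(mtcars$mpg)       # numeric vector of length 32
summary(mtcars$mpg)   # min, quartiles, median, mean, max

n_mpg    <- length(mtcars$mpg)  # number of observations (cars)
mean_mpg <- mean(mtcars$mpg)    # sample mean
sd_mpg   <- sd(mtcars$mpg)      # sample standard deviation (n - 1 denominator)

cat("n =", n_mpg, " mean =", round(mean_mpg, 2), " sd =", round(sd_mpg, 2), "\n")
```

The mean and standard deviation printed here are the two parameters that define the Normal curve overlaid in the following sections.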
Overlaying a Normal Curve on a Histogram in Base R
Creating histograms and Normal curves in base R involves a combination of the hist, dnorm, mean, and sd functions.
Creating a Histogram
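First, let’s create a histogram for the mpg column. Besides drawing the plot, the hist function returns an object of class "histogram" (a list containing the bin breaks, counts, densities, and midpoints), which we will use later, so we save the output: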
hist_data <- hist(mtcars$mpg, main = "Histogram of MPG", xlab = "Miles Per Gallon", ylab = "Frequency", col = "lightblue", border = "black")

In this code snippet, the main, xlab, and ylab parameters set the title of the histogram, the x-axis label, and the y-axis label, respectively. The col parameter sets the color of the bars, and border sets the color of the border around the bars.
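To see what hist_data actually holds, one option is to compute the same histogram with plot = FALSE, which skips drawing. The sketch below uses the hypothetical name h; the breaks, counts, density, and mids components it prints are the ones the Normal-curve step relies on.

```r
# Compute the histogram without plotting, to inspect what hist() returns
h <- hist(mtcars$mpg, plot = FALSE)

class(h)    # "histogram"
h$breaks    # bin edges, chosen by the default "Sturges" rule
h$counts    # number of observations in each bin
h$mids      # bin midpoints
h$density   # counts / (n * bin width), so the bars' total area is 1

# Check: density times bin width sums to 1 across the bins
bin_widths <- diff(h$breaks)
print(sum(h$density * bin_widths))
```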
Overlaying a Normal Curve
To overlay a Normal curve, we first need to calculate the mean (mean) and standard deviation (sd) of the data. The dnorm function is then used to generate the y-coordinates of the Normal curve based on these values:
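mean_mpg <- mean(mtcars$mpg)
sd_mpg <- sd(mtcars$mpg)
# Evaluate the Normal density on a fine grid spanning the histogram so the curve is smooth
x_grid <- seq(min(hist_data$breaks), max(hist_data$breaks), length.out = 200)
curve_density <- dnorm(x_grid, mean = mean_mpg, sd = sd_mpg)
Finally, we add the Normal curve to the histogram using the lines function. Because the histogram shows counts rather than densities, we rescale the curve by multiplying the density by the bin width and the number of observations. Evaluating the curve on a fine grid rather than only at the few bin midpoints keeps it smooth instead of jagged:
# Rescale the density to the count scale: density * bin width * n
bin_width <- diff(hist_data$breaks[1:2])
curve_height <- curve_density * bin_width * length(mtcars$mpg)
lines(x_grid, curve_height, col = "darkblue", lwd = 2)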

The col and lwd parameters in the lines function set the color and line width of the Normal curve, respectively.
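An alternative that avoids rescaling the curve is to draw the histogram on the density scale with freq = FALSE, so the bar areas sum to 1 and dnorm values can be plotted directly. The sketch below computes the histogram first so the y-axis limit can account for the curve's peak (an added precaution, not part of the original example), then overlays the curve with base R's curve(..., add = TRUE). The names h, y_max, and normal_pts are illustrative.

```r
mean_mpg <- mean(mtcars$mpg)
sd_mpg   <- sd(mtcars$mpg)

# Compute the histogram first so its densities can inform the y-axis limit
h <- hist(mtcars$mpg, plot = FALSE)
y_max <- max(h$density, dnorm(mean_mpg, mean = mean_mpg, sd = sd_mpg))

# Plot the bars on the density scale (freq = FALSE) instead of counts
plot(h, freq = FALSE, ylim = c(0, y_max),
     main = "Histogram of MPG", xlab = "Miles Per Gallon", ylab = "Density",
     col = "lightblue", border = "black")

# Overlay the Normal density; curve() evaluates the expression in x on a grid
# and invisibly returns the x and y values it drew
normal_pts <- curve(dnorm(x, mean = mean_mpg, sd = sd_mpg),
                    add = TRUE, col = "darkblue", lwd = 2)
```

With this approach the curve and the bars are already on the same scale, so no bin-width or sample-size factor is needed.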
Overlaying a Normal Curve on a Histogram with ggplot2
Base R provides all the necessary functionality, but the ggplot2 package builds plots from layers, which makes them easier to customize and style. To use ggplot2, you first need to install it (once) and load it into your R session:
install.packages("ggplot2")
library(ggplot2)
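Since install.packages only needs to run once per R library, a common pattern is to install ggplot2 only when it is missing. The sketch below also checks the installed version, because newer idioms such as the linewidth aesthetic need a recent release (linewidth arrived in ggplot2 3.4.0). The name has_linewidth is illustrative.

```r
# Install ggplot2 only if it is not already available, then load it
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)

# linewidth (for line thickness) was introduced in ggplot2 3.4.0
has_linewidth <- packageVersion("ggplot2") >= "3.4.0"
if (!has_linewidth) {
  message("Consider updating ggplot2: older versions use `size` instead of `linewidth`.")
}
```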
Creating a Histogram
The ggplot function initializes a ggplot object, and the geom_histogram function adds a histogram layer:
ggplot(mtcars, aes(x = mpg)) +
  # 10 bins: with only 32 observations, the default of 30 leaves most bins nearly empty
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "lightblue", bins = 10) +
  labs(title = "Histogram of MPG", x = "Miles Per Gallon", y = "Density")

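The aes function maps the mpg variable to the x-axis, and y = after_stat(density) (written y = ..density.. in older ggplot2 code, a form deprecated since ggplot2 3.4.0) makes the y-axis show density rather than counts, so the bars have a total area of 1 and share a scale with a probability density curve. The colour, fill, and bins parameters in geom_histogram set the border color, fill color, and number of bins, respectively.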
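To check what after_stat(density) puts on the y-axis, one option is to let ggplot2 compute the layer and inspect the result with layer_data(). This sketch assumes a ggplot2 version that supports after_stat() (3.3.0 or later); p, layer_df, and bar_area are illustrative names.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10,
                 colour = "black", fill = "lightblue")

# layer_data() runs the stat and returns the computed data for layer 1
layer_df <- layer_data(p, 1)

# density is count / (n * bar width), so density * width sums to 1
bar_area <- sum(layer_df$density * (layer_df$xmax - layer_df$xmin))
print(bar_area)
head(layer_df[, c("xmin", "xmax", "count", "density")])
```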
Overlaying a Normal Curve
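The stat_function function overlays the Normal curve by evaluating dnorm with the sample mean and standard deviation, while geom_density adds a kernel density estimate of the data for comparison: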
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "lightblue", bins = 10) +
  # Kernel density estimate of the observed data
  geom_density(colour = "darkblue", linewidth = 1.5) +
  # Theoretical Normal curve using the sample mean and standard deviation
  stat_function(fun = dnorm, args = list(mean = mean(mtcars$mpg), sd = sd(mtcars$mpg)), colour = "red", linewidth = 1.5) +
  labs(title = "Histogram of MPG with Normal Curve", x = "Miles Per Gallon", y = "Density")

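geom_density adds a kernel density estimate, a smoothed curve that follows the actual shape of the mpg data without assuming Normality, while stat_function adds the theoretical Normal curve based on the mean and standard deviation of the mpg data. Where the two curves diverge noticeably (for example, skewness or more than one peak in the density estimate), the data may not follow a Normal distribution. The colour and linewidth parameters set the color and line width of the curves; ggplot2 also recognizes the base R name lwd for line width.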
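If you would rather keep counts on the y-axis, as in the base R example, the same rescaling by n × bin width can be applied inside stat_function. This sketch fixes the bin width at 5 mpg with boundary = 10 (assumed values chosen to mirror the base R bins) and assumes ggplot2 3.4.0 or later for linewidth; n_obs, bin_width, scaled_normal, and p_counts are illustrative names.

```r
library(ggplot2)

n_obs     <- nrow(mtcars)
bin_width <- 5                       # assumed bin width, in mpg
mean_mpg  <- mean(mtcars$mpg)
sd_mpg    <- sd(mtcars$mpg)

# Normal density rescaled to expected counts per bin: n * h * f(x)
scaled_normal <- function(x) dnorm(x, mean = mean_mpg, sd = sd_mpg) * n_obs * bin_width

p_counts <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = bin_width, boundary = 10,
                 colour = "black", fill = "lightblue") +
  stat_function(fun = scaled_normal, colour = "darkblue", linewidth = 1) +
  labs(title = "Histogram of MPG with Scaled Normal Curve",
       x = "Miles Per Gallon", y = "Frequency")

print(p_counts)
```

The bin width used in scaled_normal must match the one passed to geom_histogram; if the two drift apart, the curve will sit too high or too low relative to the bars.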
Conclusion
Overlaying a Normal curve on a histogram is a common task when exploring data distributions. Both base R and ggplot2 offer robust functionality for creating these plots, with ggplot2 offering more customization options.