This article will guide you through the process of plotting a log-normal distribution in R. To provide a comprehensive understanding, we’ll divide the guide into several sections, including an introduction to log-normal distributions, data generation and plotting in R, and, finally, an overview of interpretative techniques.
Introduction to Log-Normal Distributions
Before we delve into the plotting process, let’s understand what a log-normal distribution is.
A log-normal (or lognormal) distribution is the probability distribution of a random variable whose logarithm is normally distributed. If Y is a random variable with a normal distribution, then X = exp(Y) has a log-normal distribution. Log-normal distributions model variables that are strictly positive and right-skewed, and they are commonly used in domains such as economics, biology, and engineering.
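The definition can be checked directly in R. The following is a minimal sketch with illustrative variable names (y and x are not part of the examples later in this article): it draws normal values, exponentiates them, and confirms that the result is positive, right-skewed, and normal again once logged.

```r
# Draw from a normal distribution, then exponentiate
set.seed(42)
y <- rnorm(10000, mean = 0, sd = 1)  # Y is normally distributed
x <- exp(y)                          # X = exp(Y) is log-normal

# Every value is strictly positive
all(x > 0)

# The sample is right-skewed, so the mean exceeds the median
mean(x) > median(x)

# Taking logs recovers the normal sample
all.equal(log(x), y)
```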
The distribution is characterized by two parameters, mu and sigma. Mu is the mean of the logarithm of the variable, and sigma is the standard deviation of that logarithm; they are not the mean and standard deviation of the variable itself. Together, the two parameters fully determine the distribution.
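Because mu and sigma live on the log scale, it can help to see how they map to the original scale. The sketch below uses the standard results that the median of a log-normal distribution is exp(mu) and its mean is exp(mu + sigma^2 / 2), and compares them with a large simulated sample. The variable names are illustrative only.

```r
mu <- 0
sigma <- 1

# Theoretical summaries on the original scale
theoretical_median <- exp(mu)
theoretical_mean <- exp(mu + sigma^2 / 2)

# Compare with a large simulated sample
set.seed(42)
x <- rlnorm(100000, meanlog = mu, sdlog = sigma)
c(median = median(x), mean = mean(x))
c(median = theoretical_median, mean = theoretical_mean)

# On the log scale, the sample mean and sd estimate mu and sigma
c(mean_log = mean(log(x)), sd_log = sd(log(x)))
```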
Generating a Log-Normal Distribution in R
The first step in visualizing a log-normal distribution is to generate one. In R, we can create a log-normal distribution using the rlnorm() function, which generates random deviates.
Here is an example of generating a log-normal distribution:
# Set seed for reproducibility
set.seed(123)
# Generate 1000 log-normal random variables
log_normal <- rlnorm(1000, meanlog = 0, sdlog = 1)
# Inspect the first 10 elements
head(log_normal, 10)
In this code, we’re generating 1000 random values from a log-normal distribution with meanlog (mu) of 0 and sdlog (sigma) of 1.
The set.seed() function ensures that the generated random numbers are reproducible. This is important for experimental consistency, as without setting a seed, each run could produce different results.
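The effect of the seed is easy to verify. In this small sketch (the names first_draw, second_draw, and third_draw are illustrative), resetting the seed reproduces the same values, while drawing again without resetting it does not.

```r
set.seed(123)
first_draw <- rlnorm(5, meanlog = 0, sdlog = 1)

set.seed(123)
second_draw <- rlnorm(5, meanlog = 0, sdlog = 1)

# Same seed, same values
identical(first_draw, second_draw)

# Without resetting the seed, the generator continues from its current state
third_draw <- rlnorm(5, meanlog = 0, sdlog = 1)
identical(first_draw, third_draw)
```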
Plotting a Log-Normal Distribution in R
Once we have the log-normal distribution, we can proceed to visualize it using R’s built-in functions. Let’s create a histogram and a density plot.
Histogram
A histogram provides a visual representation of data distribution. Here is how you can plot a histogram of your log-normal data:
# Plot histogram
hist(log_normal, main="Histogram of Log-Normal Distribution", xlab="Value", ylab="Frequency", col="lightblue", border="black")

This code generates a histogram for our log-normal distribution, with the x-axis representing the value of the random variable and the y-axis indicating the frequency of occurrence.
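To see exactly what the bars represent, the histogram can be computed without drawing it. The sketch below regenerates the same sample so that it runs on its own, then inspects the bin edges and counts that hist() returns. It also requests more bins, which R treats as a suggestion rather than an exact number.

```r
set.seed(123)
log_normal <- rlnorm(1000, meanlog = 0, sdlog = 1)

# Compute the histogram without drawing it
h <- hist(log_normal, plot = FALSE)

h$breaks       # bin edges chosen by R
h$counts       # number of observations in each bin
sum(h$counts)  # all 1000 observations are accounted for

# Ask for roughly 50 bins; the long right tail otherwise squeezes
# most of the data into the first few bars
h50 <- hist(log_normal, breaks = 50, plot = FALSE)
length(h50$counts)
```

Passing the same breaks value to the plotting call shown earlier would draw the finer-grained version.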
Density Plot
While a histogram provides a basic visual representation, a density plot can give a smoother and more visually intuitive understanding of the distribution. We can use the density() function to estimate the density function and then plot it using the plot() function:
# Estimate density
log_normal_density <- density(log_normal)
# Plot density
plot(log_normal_density, main="Density Plot of Log-Normal Distribution", xlab="Value", ylab="Density", col="darkblue")

This code creates a density plot for our log-normal distribution. The x-axis represents the value of the random variable, and the y-axis shows the estimated density of these values.
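The object returned by density() can also be inspected numerically. The sketch below regenerates the sample so that it is self-contained, then looks at the bandwidth R selected, the range of the evaluation grid, and a trapezoidal approximation of the area under the curve, which should be close to 1. The names dx, y, and area are illustrative.

```r
set.seed(123)
log_normal <- rlnorm(1000, meanlog = 0, sdlog = 1)

log_normal_density <- density(log_normal)

log_normal_density$bw        # bandwidth chosen by the default rule
range(log_normal_density$x)  # the grid extends beyond the data, below zero

# Trapezoidal approximation of the area under the estimated curve
dx <- diff(log_normal_density$x)
y <- log_normal_density$y
area <- sum(dx * (head(y, -1) + tail(y, -1)) / 2)
area
```

Note that the grid starts below zero even though log-normal values are always positive. The default kernel smoother does not know about that boundary, so the plotted curve can show a little density at negative values.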
Adding a Theoretical Curve
To check whether our data follows a log-normal distribution, we can overlay the theoretical log-normal density curve on our histogram or density plot. This can be done with the dlnorm() function, which gives the density of the log-normal distribution for a sequence of values.
# Create sequence of values
x_values <- seq(min(log_normal), max(log_normal), length.out = 1000)
# Calculate density of theoretical distribution
y_values <- dlnorm(x_values, meanlog = 0, sdlog = 1)
# Plot histogram with theoretical curve
hist(log_normal, freq=FALSE, main="Histogram with Theoretical Curve", xlab="Value", ylab="Density", col="lightblue", border="black")
lines(x_values, y_values, col="darkred", lwd=2)

In this code, freq=FALSE is used to plot densities instead of frequencies in the histogram. The lines() function adds a theoretical log-normal distribution curve to the histogram.
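The same overlay works on a density plot. The following sketch is one possible way to do it, not the only one: it regenerates the sample, computes both curves first so that the y-axis can be sized to fit the taller of the two, and adds a legend. The dashed line style and the legend are illustrative choices.

```r
set.seed(123)
log_normal <- rlnorm(1000, meanlog = 0, sdlog = 1)

# Estimated density of the sample
log_normal_density <- density(log_normal)

# Theoretical density on a grid of positive values
x_values <- seq(0.001, max(log_normal), length.out = 1000)
y_values <- dlnorm(x_values, meanlog = 0, sdlog = 1)

# Size the y-axis so that neither curve is clipped
y_limit <- c(0, max(log_normal_density$y, y_values))

plot(log_normal_density, main = "Density Plot with Theoretical Curve",
     xlab = "Value", ylab = "Density", col = "darkblue", ylim = y_limit)
lines(x_values, y_values, col = "darkred", lwd = 2, lty = 2)
legend("topright", legend = c("Estimated", "Theoretical"),
       col = c("darkblue", "darkred"), lwd = c(1, 2), lty = c(1, 2))
```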
Interpreting the Plots
Once you’ve plotted the log-normal distribution, it’s crucial to understand what the plot signifies.
For a histogram, the height of each bar indicates the number of observations that fell into that bin (or, with freq=FALSE, their density), and the bar’s range on the x-axis gives the values those observations took on. In a density plot, the y-axis indicates the estimated probability density at each value on the x-axis. It is a smoothed counterpart of the histogram: it does not depend on where the bin edges fall, although its shape does depend on the smoothing bandwidth.
The overlay of the theoretical curve is a visual aid to assess how well your data aligns with a log-normal distribution. If the histogram or density estimate tracks the curve closely, a log-normal model is plausible; systematic gaps, such as a heavier tail or a shifted peak, suggest otherwise.
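A visual comparison can be backed up with a few numbers. The sketch below is one simple option, assuming that the parameters are unknown and have to be estimated from the data: it estimates them on the log scale and compares a handful of sample quantiles with those of the fitted distribution. The names mu_hat, sigma_hat, and probs are illustrative.

```r
set.seed(123)
log_normal <- rlnorm(1000, meanlog = 0, sdlog = 1)

# Estimate the parameters from the data on the log scale
mu_hat <- mean(log(log_normal))
sigma_hat <- sd(log(log_normal))
c(mu_hat = mu_hat, sigma_hat = sigma_hat)

# Compare a few sample quantiles with those of the fitted distribution
probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
rbind(sample = quantile(log_normal, probs),
      fitted = qlnorm(probs, meanlog = mu_hat, sdlog = sigma_hat))
```

For the simulated data used here, the estimates should land near 0 and 1, and the two rows of quantiles should be close to each other.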
Conclusion
Plotting a log-normal distribution in R is a straightforward process once you understand the fundamental principles behind it. This article has walked you through generating log-normal data and creating a histogram, a density plot, and an overlaid theoretical curve. Remember that interpreting the plot is as important as generating it. Always compare your plots with the theoretical distribution to better understand your data.