How to Plot a Normal Distribution in R


A normal distribution, also known as the Gaussian distribution, is a continuous probability distribution that is symmetric about its mean. Values near the mean occur more frequently than values far from it, which gives the distribution its familiar bell shape. The distribution is fully determined by two parameters: the mean, which sets its center, and the standard deviation, which sets its spread.
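To put numbers on "more frequent near the mean", the short sketch below uses R's pnorm() (the normal cumulative distribution function) to compute the probability of falling within one, two, and three standard deviations of the mean, often called the 68-95-99.7 rule. The variable names are only illustrative.

```r
# Probability that a normal variable lies within k standard deviations of its mean.
# For the standard normal (mean = 0, sd = 1) this is pnorm(k) - pnorm(-k);
# the answer is the same for any mean and sd.
k <- 1:3
within_k_sd <- pnorm(k) - pnorm(-k)
round(within_k_sd, 4)
# [1] 0.6827 0.9545 0.9973

# The same one-sd probability for a non-standard normal with mean 50 and sd 10
one_sd_general <- pnorm(60, mean = 50, sd = 10) - pnorm(40, mean = 50, sd = 10)
one_sd_general
```

The second calculation gives the same value as the first element of within_k_sd, which illustrates that the mean only moves the curve and the standard deviation only stretches it.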

In this guide, we will discuss how to simulate and plot a normal distribution in R using several visualization techniques.

1. Simulating a Normal Distribution

We will use the rnorm() function to generate random numbers from a normal distribution. rnorm() takes three arguments: n (the number of observations), mean (the mean of the distribution, 0 by default), and sd (the standard deviation of the distribution, 1 by default).
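A quick way to see what the defaults and the mean/sd arguments do is to draw the same random numbers twice from a fixed seed. The sketch below assumes nothing beyond base R; the object names a, b, shifted, and scaled are illustrative.

```r
# mean and sd default to 0 and 1, so these two calls return the same numbers
# when they start from the same seed
set.seed(42)
a <- rnorm(5)
set.seed(42)
b <- rnorm(5, mean = 0, sd = 1)
identical(a, b)
# [1] TRUE

# Changing mean shifts the values and changing sd stretches them:
# with the same seed, rnorm(n, mean, sd) matches mean + sd * rnorm(n)
set.seed(42)
shifted <- rnorm(5, mean = 100, sd = 15)
set.seed(42)
scaled <- 100 + 15 * rnorm(5)
all.equal(shifted, scaled)
```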

Let's simulate 10,000 data points from a normal distribution with a mean of 0 and a standard deviation of 1.

set.seed(123)  # For reproducibility
data <- rnorm(10000, mean = 0, sd = 1)
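Before plotting, it can help to confirm numerically that the simulated sample looks like its target distribution. The sketch below recreates the same data and compares the sample mean, standard deviation, and share of points within one standard deviation against the theoretical values (0, 1, and about 0.6827). The tolerances you would accept are a judgment call, not a fixed rule.

```r
set.seed(123)
data <- rnorm(10000, mean = 0, sd = 1)

# Sample estimates should be close to, but not exactly equal to, the true parameters
sample_mean <- mean(data)
sample_sd <- sd(data)

# The standard error of the mean is sd / sqrt(n) = 1 / sqrt(10000) = 0.01,
# so the sample mean is expected to land within a few hundredths of 0
c(mean = sample_mean, sd = sample_sd, se = 1 / sqrt(length(data)))

# Share of simulated points within 1 sd of 0; theoretically about 0.6827
prop_within_1 <- mean(abs(data) < 1)
prop_within_1
```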

2. Plotting the Normal Distribution

Now that we have our data, let’s plot it. We will explore a few different ways of visualizing a normal distribution.

Histogram

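A histogram is a simple and quick way to visualize a distribution. In ggplot2, the geom_histogram() function creates one. Mapping y to the computed density (rather than the raw count) scales the bars so that their total area is 1, which puts the histogram on the same scale as a probability density.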

library(ggplot2)

# after_stat(density) replaces the deprecated ..density.. notation
ggplot(data.frame(data), aes(x = data)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, color = "black", fill = "skyblue") +
  labs(x = "Data", y = "Density", title = "Histogram of Normal Distribution") +
  theme_minimal()
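To see why the density scaling matters, the sketch below uses base R's hist() with plot = FALSE as a stand-in for the binning step, so the numbers can be inspected directly. Note an assumption here: hist() treats breaks = 30 only as a suggestion, so its bins will not exactly match geom_histogram(bins = 30), but the scaling rule is the same.

```r
set.seed(123)
data <- rnorm(10000, mean = 0, sd = 1)

# Compute histogram bins without drawing them
h <- hist(data, breaks = 30, plot = FALSE)

# On the density scale, each bar's height is count / (n * bin width) ...
bin_width <- diff(h$breaks)
manual_density <- h$counts / (length(data) * bin_width)
all.equal(manual_density, h$density)

# ... so the bar areas add up to 1, like the area under a probability density
total_area <- sum(h$density * bin_width)
total_area
```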

Density Plot

A density plot shows a kernel density estimate, which is a smoothed version of a histogram and can give a cleaner picture of the shape of the distribution. In ggplot2, the geom_density() function creates a density plot.

ggplot(data.frame(data), aes(x = data)) +
  geom_density(fill = "skyblue", color = "black") +
  labs(x = "Data", y = "Density", title = "Density Plot of Normal Distribution") +
  theme_minimal()
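To our understanding, geom_density() defaults to a Gaussian kernel and the "nrd0" bandwidth rule, the same defaults as base R's density(). The sketch below therefore calls density() directly to look at the estimated values and compare them with the true standard normal density from dnorm(). The evaluation grid is an arbitrary choice.

```r
set.seed(123)
data <- rnorm(10000, mean = 0, sd = 1)

# Kernel density estimate with a Gaussian kernel and the "nrd0" bandwidth rule
kde <- density(data, kernel = "gaussian", bw = "nrd0")
kde$bw  # smoothing bandwidth that was selected

# Interpolate the estimate at a few points and compare with the true N(0, 1) density
grid <- c(-2, -1, 0, 1, 2)
estimated <- approx(kde$x, kde$y, xout = grid)$y
round(cbind(x = grid, estimated = estimated, true = dnorm(grid)), 3)

# A larger bandwidth (adjust > 1) gives a smoother, flatter curve
smoother <- density(data, adjust = 2)
```

In geom_density() the same smoothing control is exposed through its adjust argument.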

Q-Q Plot

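A Q-Q (quantile-quantile) plot is a graphical method for comparing two probability distributions by plotting their quantiles against each other. In R, the qqnorm() function plots the sample quantiles of the data against the theoretical quantiles of a standard normal distribution; if the data are normally distributed, the points fall close to a straight line.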

qqnorm(data)
qqline(data)

The qqnorm() function creates the Q-Q plot, and the qqline() function adds a reference line that, by default, passes through the first and third quartiles of the data.
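Both functions can be reproduced by hand, which makes it clearer what the plot shows. The sketch below asks qqnorm() for its coordinates instead of a plot, rebuilds the theoretical quantiles with qnorm() and ppoints(), and recomputes the qqline() intercept and slope from the quartiles. The exponential sample is an illustrative contrast, not part of the original example.

```r
set.seed(123)
data <- rnorm(10000, mean = 0, sd = 1)

# qqnorm() can return its coordinates instead of drawing them:
# qq$x holds theoretical standard normal quantiles, qq$y the observed data
qq <- qqnorm(data, plot.it = FALSE)

# The theoretical quantiles are qnorm() evaluated at the plotting positions ppoints(n)
theoretical <- qnorm(ppoints(length(data)))
all.equal(sort(qq$x), theoretical)

# qqline() draws the line through the first and third quartiles
probs <- c(0.25, 0.75)
y_q <- quantile(data, probs, names = FALSE)
x_q <- qnorm(probs)
slope <- diff(y_q) / diff(x_q)
intercept <- y_q[1] - slope * x_q[1]
c(intercept = intercept, slope = slope)  # near 0 and 1 for standard normal data

# A skewed sample bends away from the line, so the Q-Q points are less correlated
skewed <- rexp(10000)
qq_skewed <- qqnorm(skewed, plot.it = FALSE)
c(normal = cor(qq$x, qq$y), skewed = cor(qq_skewed$x, qq_skewed$y))
```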

Plotting the Normal Probability Density Function

We can also plot the theoretical normal curve itself using the dnorm() function, which returns the probability density of the normal distribution at each supplied value. (Its counterpart, pnorm(), returns the cumulative distribution function.)

# Grid of x values from -4 to 4 in steps of 0.01
x <- seq(-4, 4, by = 0.01)
# Density of the standard normal distribution at each x
y <- dnorm(x, mean = 0, sd = 1)

df <- data.frame(x, y)
ggplot(df, aes(x = x, y = y)) +
  geom_line(color = "red") +
  labs(x = "x", y = "Density", title = "Normal Probability Density Function") +
  theme_minimal()
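dnorm() evaluates the normal density formula f(x) = 1 / (sd * sqrt(2 * pi)) * exp(-(x - mean)^2 / (2 * sd^2)). The sketch below writes that formula out as a small helper (my_dnorm is a hypothetical name used only for illustration), checks it against dnorm(), and confirms that the area under the curve is 1.

```r
# Hand-written normal density, for comparison with dnorm()
# (my_dnorm is an illustrative helper, not part of base R)
my_dnorm <- function(x, mean = 0, sd = 1) {
  1 / (sd * sqrt(2 * pi)) * exp(-(x - mean)^2 / (2 * sd^2))
}

x <- seq(-4, 4, by = 0.01)
all.equal(my_dnorm(x), dnorm(x))
all.equal(my_dnorm(x, mean = 1, sd = 2), dnorm(x, mean = 1, sd = 2))

# The peak height is 1 / (sd * sqrt(2 * pi)), about 0.3989 when sd = 1
dnorm(0)

# The total area under the curve is 1
total <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# A Riemann sum over the plotted grid (-4 to 4) captures almost all of that area
grid_area <- sum(dnorm(x)) * 0.01
c(total = total, grid_area = grid_area)
```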

3. Overlaying Normal Distribution Curve on a Histogram

Finally, we can overlay a normal curve, using the mean and standard deviation estimated from the data, on a histogram to visually check whether the data follow a normal distribution.

ggplot(data.frame(data), aes(x = data)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, color = "black", fill = "skyblue") +
  stat_function(fun = dnorm, args = list(mean = mean(data), sd = sd(data)), color = "red") +
  labs(x = "Data", y = "Density", title = "Histogram with Normal Distribution Overlay") +
  theme_minimal()

In this code, stat_function() draws the normal curve: fun = dnorm specifies the function to evaluate, and args passes it the sample mean and standard deviation so that the curve matches the location and spread of the data.
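The visual comparison can also be summarized with a number. The sketch below bins the data on the density scale with base R's hist() (which, as an assumption to keep in mind, treats breaks = 30 as a suggestion rather than an exact bin count), evaluates the fitted normal curve at each bar's midpoint, and reports the largest gap between bar and curve. A skewed exponential sample is included as a hypothetical contrast.

```r
set.seed(123)
data <- rnorm(10000, mean = 0, sd = 1)

# Parameters of the overlaid curve, estimated from the sample as in the args list
mu_hat <- mean(data)
sigma_hat <- sd(data)

# Bin the data on the density scale
h <- hist(data, breaks = 30, plot = FALSE)

# Height of the fitted normal curve at each bar's midpoint
curve_at_mids <- dnorm(h$mids, mean = mu_hat, sd = sigma_hat)

# Largest vertical gap between a bar and the curve; small values mean a close match
max_gap <- max(abs(h$density - curve_at_mids))

# The same check on skewed data shows a much larger gap
skewed <- rexp(10000)
h_skewed <- hist(skewed, breaks = 30, plot = FALSE)
skewed_curve <- dnorm(h_skewed$mids, mean = mean(skewed), sd = sd(skewed))
max_gap_skewed <- max(abs(h_skewed$density - skewed_curve))

c(normal = max_gap, skewed = max_gap_skewed)
```

A small gap is consistent with normality but does not prove it; the Q-Q plot is usually better at revealing problems in the tails.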

Conclusion

In this guide, we have covered several techniques to simulate and plot a normal distribution in R, including histograms, density plots, Q-Q plots, the normal probability density function, and a normal curve overlaid on a histogram. Understanding and visualizing the normal distribution is a fundamental step in many statistical analyses and machine learning algorithms.

