The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution that is symmetric about the mean, meaning that values near the mean occur more frequently than values far from it. The normal distribution is widely used in the natural and social sciences as a simple model for complex random variables.
In R, the cumulative distribution function (CDF) of the normal distribution is computed by the pnorm() function. The CDF gives the probability that a random variable is less than or equal to a threshold value. This article walks through how to use pnorm() in R.
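For reference, the quantity pnorm() computes can be written out explicitly. Let X be a normal random variable with mean μ and standard deviation σ, and let Φ denote the standard normal CDF. The CDF at a threshold q is then:

```latex
F(q) = P(X \le q)
     = \int_{-\infty}^{q} \frac{1}{\sigma\sqrt{2\pi}}
       \exp\!\left(-\frac{(t-\mu)^2}{2\sigma^2}\right)\,dt
     = \Phi\!\left(\frac{q-\mu}{\sigma}\right)
```

The last equality is why any normal probability can be reduced to a standard normal one by standardizing the threshold q.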
1. Basic Use of the Normal CDF
The basic use of the pnorm() function in R is to calculate the probability that a normally distributed random variable takes a value less than or equal to a given value. The syntax is pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE), where:
q: the value (or vector of values) at which the CDF is evaluated,
mean: the mean of the normal distribution,
sd: the standard deviation of the normal distribution,
lower.tail: if TRUE (the default), probabilities are P[X ≤ q]; otherwise, P[X > q],
log.p: if TRUE, probabilities p are returned as log(p).
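The short sketch below uses only base R to show what each argument does. The non-standard distribution (mean 100, standard deviation 15) and the threshold 110 are arbitrary illustrative values. pnorm() does not prescribe them.

```r
# Arbitrary illustrative distribution: mean 100, standard deviation 15
q <- 110
mu <- 100
sigma <- 15

# P[X <= 110] for X ~ Normal(mean = 100, sd = 15)
p_lower <- pnorm(q, mean = mu, sd = sigma)

# Same probability via standardization: Phi((q - mu) / sigma)
p_standardized <- pnorm((q - mu) / sigma)

# Upper tail P[X > 110]; lower.tail = FALSE avoids computing 1 - p by hand
p_upper <- pnorm(q, mean = mu, sd = sigma, lower.tail = FALSE)

# log.p = TRUE returns log(P[X <= 110])
log_p <- pnorm(q, mean = mu, sd = sigma, log.p = TRUE)

# pnorm() is vectorized over q, so several thresholds can be evaluated at once
p_vec <- pnorm(c(85, 100, 115), mean = mu, sd = sigma)

print(c(lower = p_lower, upper = p_upper, log = log_p))
print(p_vec)
```

For extreme thresholds, lower.tail = FALSE is more accurate than 1 - pnorm(q). The subtraction loses precision when pnorm(q) is very close to 1.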
For example, to find the probability that a standard normal random variable (mean 0, standard deviation 1) is less than 1.96, you would use the pnorm() function as follows:
pnorm(1.96, mean = 0, sd = 1)
The output is approximately 0.975, meaning there is a 97.5% chance that a standard normally distributed random variable will take on a value less than 1.96.
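The CDF accumulates probability from the left. The probability that a variable falls between two values a and b is therefore the difference pnorm(b) - pnorm(a). The minimal sketch below applies this to the standard normal, first for the interval from -1.96 to 1.96, then for the ranges within one, two, and three standard deviations of the mean:

```r
# P(-1.96 < Z < 1.96) for a standard normal Z
p_central <- pnorm(1.96) - pnorm(-1.96)

# Probability of falling within k standard deviations of the mean, for k = 1, 2, 3
k <- 1:3
p_within <- pnorm(k) - pnorm(-k)

print(round(p_central, 4))
print(round(p_within, 4))
```

The results are about 0.95 for the central interval and about 0.683, 0.954, and 0.997 for k = 1, 2, 3. These values underlie the familiar 68-95-99.7 rule of thumb.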
2. Visualizing the Normal CDF
The pnorm() function can also be used with R's plotting functions to visualize the CDF of a normal distribution. The following code plots the CDF of a standard normal distribution.
# Generate a sequence of x values
x <- seq(-4, 4, length.out = 1000)
# Calculate the corresponding CDF values
cdf <- pnorm(x)
# Plot the CDF
plot(x, cdf, type = "l", main = "CDF of a Standard Normal Distribution",
     xlab = "Value", ylab = "Cumulative Probability", las = 1)
This code first uses the seq() function to generate 1,000 evenly spaced x values from -4 to 4. It then uses pnorm() to calculate the CDF value at each of these x values. Finally, it calls the plot() function with type = "l" to draw the CDF as a line.
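The same curve can be used to check simulated data against the theoretical CDF. The sketch below is a minimal example with an arbitrary seed and sample size. It overlays the empirical CDF of 1,000 draws from rnorm(), computed with base R's ecdf(), on the pnorm() curve. It then reports the Kolmogorov-Smirnov statistic from ks.test(), which is the largest vertical gap between the two curves.

```r
set.seed(123)  # arbitrary seed, for reproducibility
sample_data <- rnorm(1000)

# Empirical CDF of the sample (a step function)
emp_cdf <- ecdf(sample_data)

# Plot the empirical CDF and overlay the theoretical standard normal CDF
plot(emp_cdf, main = "Empirical vs. Theoretical Normal CDF",
     xlab = "Value", ylab = "Cumulative Probability", las = 1)
curve(pnorm(x), from = -4, to = 4, add = TRUE, col = "red", lwd = 2)
legend("topleft", legend = c("Empirical (ecdf)", "Theoretical (pnorm)"),
       col = c("black", "red"), lwd = c(1, 2))

# Largest vertical distance between the two curves (Kolmogorov-Smirnov statistic)
ks_result <- ks.test(sample_data, "pnorm")
print(ks_result$statistic)
```

With 1,000 draws, the step function should hug the pnorm() curve closely, and the statistic should be small.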
3. Applying the Normal CDF in Hypothesis Testing
The pnorm() function is frequently used in hypothesis testing, particularly to calculate p-values. Its inverse, qnorm(), supplies the critical values used to construct confidence intervals.
For example, consider a one-sample t-test of the null hypothesis that a population mean equals a given value, against the alternative that it does not. Let the sample mean be x_bar, the sample size n, the sample standard deviation s, and the hypothesized population mean mu. The test statistic is then t = (x_bar - mu) / (s / sqrt(n)), calculated in R as follows:
data <- rnorm(100)                 # assuming you have a dataset of 100 observations from a normal distribution
x_bar <- mean(data)                # calculate the mean of your data
mu <- 0                            # hypothesized population mean
s <- sd(data)                      # calculate the standard deviation of your data
n <- length(data)                  # get the number of data points
t <- (x_bar - mu) / (s / sqrt(n))  # calculate the test statistic
The p-value can then be calculated using the pnorm() function:
p_value <- 2 * (1 - pnorm(abs(t)))
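The expression 2 * (1 - pnorm(abs(t))) is used instead of pnorm(t) because this is a two-tailed test. We are interested in the probability that the test statistic is greater than |t| or less than -|t|. Strictly speaking, s is estimated from the data, so this statistic follows a t distribution with n - 1 degrees of freedom. pnorm() therefore gives a large-sample normal (z-test) approximation. With 100 observations the approximation is close; the exact t-test p-value is 2 * pt(-abs(t), df = n - 1).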
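To see how close the pnorm() approximation is to the exact t-test, the sketch below recomputes the example with an arbitrary seed. It compares three p-values: the normal approximation from pnorm(), the exact value from pt() with n - 1 degrees of freedom, and the p-value reported by base R's t.test(). The name t_stat is used instead of t to avoid masking base R's t() function.

```r
set.seed(42)  # arbitrary seed so the numbers are reproducible
data <- rnorm(100)
mu <- 0
n <- length(data)
t_stat <- (mean(data) - mu) / (sd(data) / sqrt(n))

# Two-sided p-value from the normal approximation (large-sample z-test)
p_normal <- 2 * pnorm(-abs(t_stat))

# Exact two-sided p-value from the t distribution with n - 1 degrees of freedom
p_t <- 2 * pt(-abs(t_stat), df = n - 1)

# Built-in one-sample t-test, for comparison with the exact value
p_builtin <- t.test(data, mu = mu)$p.value

print(c(normal = p_normal, t = p_t, t.test = p_builtin))
```

Writing the tail as pnorm(-abs(t_stat)) gives the same value as 1 - pnorm(abs(t_stat)) but avoids precision loss for large statistics. The t distribution has heavier tails, so the pnorm() p-value is always slightly smaller than the exact one. With n = 100 the gap is small.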
4. Normal CDF and Normal Quantile Function
In addition to pnorm(), R provides the qnorm() function, which is the inverse of pnorm(). It is known as the quantile function, or the percent-point function. Given a probability p, qnorm(p) returns the value x such that pnorm(x) is equal to p.
For example, suppose you want the value below which a standard normal random variable falls with probability 0.975. You would use the qnorm() function as follows:
qnorm(0.975)
The result is approximately 1.96, the same value used in the pnorm() example earlier. This shows that pnorm() and qnorm() are inverse functions.
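The inverse relationship can be checked directly. qnorm() also supplies the critical value in a normal-based confidence interval. The sketch below uses a hypothetical sample mean of 50, a known population standard deviation of 10, and a sample size of 25, all chosen only for illustration.

```r
# Round trip: qnorm() undoes pnorm() and vice versa (up to floating-point error)
x <- c(-1.5, 0, 1.96)
p <- c(0.025, 0.5, 0.975)
round_trip_x <- qnorm(pnorm(x))
round_trip_p <- pnorm(qnorm(p))

# Critical value for a two-sided 95% interval
z_crit <- qnorm(0.975)

# Hypothetical values, for illustration only
x_bar <- 50   # sample mean
sigma <- 10   # known population standard deviation
n <- 25       # sample size

# 95% confidence interval for the mean: x_bar +/- z_crit * sigma / sqrt(n)
ci <- x_bar + c(-1, 1) * z_crit * sigma / sqrt(n)

print(round(z_crit, 4))
print(round(ci, 2))
```

The interval comes out to roughly 46.08 to 53.92, since the margin is 1.96 × 10 / 5 ≈ 3.92.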
In conclusion, the pnorm() function in R is a powerful tool for working with normal distributions. It calculates the cumulative probability up to a given value for normally distributed data. Whether you are performing statistical analysis, running hypothesis tests, or simply learning the basics of probability and statistics, pnorm() is one of the core functions to know in R.