
The cumulative distribution function (CDF) of a random variable is another fundamental concept in probability theory and statistics. For a continuous distribution like the normal distribution, the CDF gives the area under the probability density function (PDF) curve to the left of a given value x. Formally, CDF(x) = P(X ≤ x): the probability that a random draw from the distribution will be less than or equal to x.
In this article, we will illustrate how to calculate and plot the normal cumulative distribution function (CDF) in Python using the scipy.stats library.
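To make the "area under the PDF" description concrete, the following minimal sketch integrates the normal PDF numerically and compares the result with the CDF. The evaluation point `value` and the use of `scipy.integrate.quad` are choices made here for illustration only; they are not part of the walkthrough that follows.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu, sigma = 0, 1   # standard normal, chosen only for illustration
value = 1.0        # hypothetical point at which to evaluate the CDF

# Integrate the PDF from -infinity up to `value`.
# quad returns a tuple: (integral estimate, absolute error estimate).
area, abs_err = quad(norm.pdf, -np.inf, value, args=(mu, sigma))

# The CDF as computed directly by scipy.stats, for comparison
cdf_value = norm.cdf(value, mu, sigma)

print(f"area under PDF: {area:.6f}")       # 0.841345
print(f"norm.cdf:       {cdf_value:.6f}")  # 0.841345
```

The two numbers agree to within the integration error, which is the sense in which the CDF is the area to the left of a value.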
Import Necessary Libraries
The first step in Python is to import the necessary libraries. For this task, we need matplotlib, numpy, and scipy.stats:
import matplotlib.pyplot as plt
from scipy.stats import norm
import numpy as np
Define the Parameters of the Normal Distribution
As with the PDF, the normal distribution is characterized by two parameters: the mean (mu), which determines the center of the distribution, and the standard deviation (sigma), which determines its width. For instance, we can define a normal distribution with a mean of 0 and a standard deviation of 1 (the standard normal distribution):
mu = 0
sigma = 1
Generate the Values for the x-axis
We can use the numpy function linspace to generate an array of values for the x-axis that spans our desired range. We’ll generate 100 evenly spaced values from mu - 3*sigma to mu + 3*sigma, a range that contains about 99.7% of the distribution's probability:
x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)
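As a quick check on why three standard deviations is a reasonable plotting range, the sketch below uses the CDF itself to measure how much probability lies within 1, 2, and 3 standard deviations of the mean. The dictionary name `coverage_by_k` is introduced here purely for illustration.

```python
from scipy.stats import norm

mu, sigma = 0, 1  # same illustrative parameters as in the text

coverage_by_k = {}
for k in (1, 2, 3):
    lower, upper = mu - k * sigma, mu + k * sigma
    # P(lower < X <= upper) is the difference of two CDF values
    coverage_by_k[k] = norm.cdf(upper, mu, sigma) - norm.cdf(lower, mu, sigma)
    print(f"within {k} sigma: {coverage_by_k[k]:.4f}")

# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```

Widening the range beyond three standard deviations would mostly add flat regions near 0 and 1 to the plot.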
Calculate the Cumulative Distribution Function (CDF)
The cdf method of scipy.stats.norm calculates the cumulative distribution function at each point on the x-axis. Its second and third arguments are the mean (loc) and the standard deviation (scale):
y = norm.cdf(x, mu, sigma)
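The same call works for any mean and standard deviation. As a small illustration, the sketch below uses a hypothetical distribution with mean 100 and standard deviation 15 (values picked only for this example) and shows that passing the parameters to `norm.cdf` is equivalent to standardizing the value first.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical non-standard distribution, chosen only for illustration
mu, sigma = 100, 15

p_mean = norm.cdf(mu, mu, sigma)            # probability of a draw at or below the mean
p_115 = norm.cdf(115, loc=mu, scale=sigma)  # keyword form of the same kind of call

# Standardizing first and using the default parameters (loc=0, scale=1) is equivalent
z = (115 - mu) / sigma
p_115_standardized = norm.cdf(z)

print(p_mean)                                  # 0.5
print(round(p_115, 4))                         # 0.8413
print(np.isclose(p_115, p_115_standardized))   # True
```

Because the normal distribution is symmetric, the CDF at the mean is always 0.5, whatever the standard deviation.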
Create the Plot
We can now create the plot using matplotlib. We use the plot function to create a line plot of the normal cumulative distribution function:
plt.plot(x, y)
Customize the Plot
To improve the readability of the plot, we can add a title and labels for the x and y axes:
plt.title('Normal Cumulative Distribution Function')
plt.xlabel('x')
plt.ylabel('CDF(x)')
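Beyond a title and axis labels, a few reference lines can make a CDF plot easier to read. The following is one possible variation, not a required step: it rebuilds the same curve so that it is self-contained, then marks the mean and the 0.5 probability level. The styling values (gray dashed lines, grid transparency) are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

mu, sigma = 0, 1
x = np.linspace(mu - 3 * sigma, mu + 3 * sigma, 100)

plt.plot(x, norm.cdf(x, mu, sigma), label='normal CDF')

# Reference lines: the CDF crosses 0.5 exactly at the mean
plt.axhline(0.5, color='gray', linestyle='--', linewidth=0.8)
plt.axvline(mu, color='gray', linestyle='--', linewidth=0.8)

plt.ylim(0, 1)          # a CDF always lies between 0 and 1
plt.grid(alpha=0.3)     # light grid to help read off probabilities
plt.title('Normal Cumulative Distribution Function')
plt.xlabel('x')
plt.ylabel('CDF(x)')
plt.legend()
```

The block stops short of displaying the figure; calling plt.show() afterwards renders it.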
Display the Plot
Finally, we use plt.show() to display the plot:
plt.show()
Here’s the full Python script:
import matplotlib.pyplot as plt
from scipy.stats import norm
import numpy as np
# define mean and standard deviation
mu = 0
sigma = 1
# generate x values
x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)
# calculate cumulative distribution function
y = norm.cdf(x, mu, sigma)
# create the plot
plt.plot(x, y)
# customize the plot
plt.title('Normal Cumulative Distribution Function')
plt.xlabel('x')
plt.ylabel('CDF(x)')
# display the plot
plt.show()

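When you run the above script, you’ll get a plot of the normal cumulative distribution function: an S-shaped curve that rises from about 0.001 at x = -3 to about 0.999 at x = 3 and passes through 0.5 at x = 0, the mean.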
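Understanding the CDF is helpful in many statistical analyses. For instance, in hypothesis testing, the p-value is the tail area of the test statistic's distribution under the null hypothesis, measured beyond the observed value. When that distribution is normal, the lower-tail area is CDF(z) and the upper-tail area is 1 - CDF(z), where z is the observed statistic.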
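As a minimal sketch of that idea, assume a test statistic that follows a standard normal distribution under the null hypothesis (a z-test). The observed value `z_observed` below is hypothetical and chosen only for illustration.

```python
from scipy.stats import norm

z_observed = 1.96  # hypothetical z statistic, for illustration only

p_lower = norm.cdf(z_observed)                      # lower-tailed test: P(Z <= z)
p_upper = 1 - norm.cdf(z_observed)                  # upper-tailed test: P(Z >= z)
p_two_sided = 2 * (1 - norm.cdf(abs(z_observed)))   # two-sided test

# norm.sf is the survival function (1 - CDF); it is numerically more
# accurate than 1 - norm.cdf(...) when the statistic is far out in the tail
p_upper_sf = norm.sf(z_observed)

print(f"upper-tailed p-value: {p_upper:.4f}")      # 0.0250
print(f"two-sided p-value:    {p_two_sided:.4f}")  # 0.0500
```

Which tail applies depends on the alternative hypothesis being tested, so the choice among these three values is part of the test design rather than something the CDF decides.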
Just like with the PDF, when analyzing actual data, you would estimate the mean and standard deviation from your data set. You might also overlay the normal CDF on the empirical CDF of your data to see how well a normal distribution fits it.
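One way such an overlay could look is sketched below. The data here are synthetic, generated with numpy's random generator as a stand-in for a real data set, and the names `mu_hat`, `sigma_hat`, `ecdf`, and `grid` are introduced only for this example.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

# Synthetic sample standing in for real data (an assumption of this sketch)
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=5.0, scale=2.0, size=200)

# Estimate the parameters from the data
mu_hat = data.mean()
sigma_hat = data.std(ddof=1)  # sample standard deviation

# Empirical CDF: the i-th smallest observation has a fraction i/n
# of the data at or below it
data_sorted = np.sort(data)
ecdf = np.arange(1, data_sorted.size + 1) / data_sorted.size

# Normal CDF with the estimated parameters, evaluated on a regular grid
grid = np.linspace(data_sorted[0], data_sorted[-1], 200)
fitted = norm.cdf(grid, mu_hat, sigma_hat)

plt.step(data_sorted, ecdf, where='post', label='empirical CDF')
plt.plot(grid, fitted, label='fitted normal CDF')
plt.title('Empirical vs. Normal CDF')
plt.xlabel('x')
plt.ylabel('CDF(x)')
plt.legend()
```

Calling plt.show() afterwards displays the figure. If the step curve tracks the smooth curve closely, the normal model is a plausible description of the data; systematic gaps suggest skew or heavy tails.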
In summary, Python provides powerful tools for calculating and visualizing the cumulative distribution function of a normal distribution, enhancing our understanding of data and aiding in statistical analysis.