How to Apply Bayes’ Theorem in Python

Bayes’ Theorem is a fundamental principle in the field of statistics and probability, and it’s used to update probabilities based on new data. Named after the statistician Thomas Bayes, the theorem provides a way to revise existing predictions or hypotheses given new or additional evidence.

In this article, we’ll walk through Bayes’ Theorem and apply it in Python: first the theory, then a basic implementation in plain Python, and finally a small diagnostic example built as a Bayesian network with the pomegranate library.

Understanding Bayes’ Theorem

Before diving into Python implementations, it’s crucial to understand the theorem itself.

Bayes’ Theorem is used to reverse conditional probabilities. It is expressed as:

P(A|B) = [P(B|A) * P(A)] / P(B)

Here:

  • P(A|B) is the posterior probability of event A given event B. It’s what we’re trying to calculate.
  • P(B|A) is the likelihood of event B given event A has occurred.
  • P(A) is the prior probability of event A, the initial degree of belief in A.
  • P(B) is the evidence or total probability of event B.
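The formula translates almost directly into code. The following is a minimal sketch rather than a canonical implementation: the function name bayes_posterior and the numbers are hypothetical, chosen only to illustrate how the evidence term P(B) is usually obtained from the law of total probability.

```python
def bayes_posterior(prior: float, likelihood: float, evidence: float) -> float:
    """Return P(A|B) given P(A), P(B|A), and P(B)."""
    if not 0.0 < evidence <= 1.0:
        raise ValueError("evidence P(B) must be in (0, 1]")
    return likelihood * prior / evidence


# Hypothetical numbers, for illustration only
p_a = 0.01              # P(A): prior
p_b_given_a = 0.9       # P(B|A): likelihood
p_b_given_not_a = 0.05  # P(B|not A)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

posterior = bayes_posterior(p_a, p_b_given_a, p_b)
print(round(posterior, 4))  # 0.1538
```

Even with a likelihood of 0.9, the posterior stays low because the prior is small. This is the kind of reversal of conditional probabilities the theorem makes explicit.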

Calculating Bayes’ Theorem in Python

Let’s start with a simple example. Suppose you are playing a game in which you draw one ball at random from a bag containing five green balls and three red balls. If the ball is green, you have a 75% chance of winning. If the ball is red, you have a 60% chance of losing (that is, a 40% chance of winning). Given that you won the game, what is the probability that you drew a green ball?

This can be calculated using Bayes’ Theorem.

# Prior probabilities
P_G = 5 / 8  # Probability of choosing a green ball
P_R = 3 / 8  # Probability of choosing a red ball

# Likelihoods
P_W_G = 0.75  # Probability of winning given that a green ball was chosen
P_W_R = 0.4   # Probability of winning given that a red ball was chosen (1 - 0.6, the complement of the 60% chance of losing)

# Total probability of winning
P_W = P_W_G * P_G + P_W_R * P_R

# Posterior probability: Probability of choosing a green ball given that you won
P_G_W = (P_W_G * P_G) / P_W
print(P_G_W)  # Output: approximately 0.7576 (exactly 25/33)

In this example, P(W) = 0.75 * 0.625 + 0.4 * 0.375 = 0.61875, so the probability of having chosen a green ball given that you’ve won is 0.46875 / 0.61875, which is approximately 0.758, or 75.8%.
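One way to double-check arithmetic like this is to redo it with exact rational numbers, which avoids floating-point rounding altogether. The sketch below uses Python’s standard fractions module; the variable names are illustrative and not part of the original example.

```python
from fractions import Fraction

# Priors: five green and three red balls out of eight
p_green = Fraction(5, 8)
p_red = Fraction(3, 8)

# Likelihoods of winning
p_win_given_green = Fraction(3, 4)    # 75%
p_win_given_red = 1 - Fraction(3, 5)  # complement of a 60% chance of losing

# Total probability of winning
p_win = p_win_given_green * p_green + p_win_given_red * p_red

# Posterior: probability the ball was green given a win
p_green_given_win = p_win_given_green * p_green / p_win

print(p_win)              # 99/160
print(p_green_given_win)  # 25/33
```

Converting with float(p_green_given_win) gives roughly 0.7576.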

Applying Bayes’ Theorem to Real-world Scenarios

Let’s take a more complex scenario and implement Bayes’ Theorem using Python’s pomegranate library, which provides a practical way to create Bayesian Networks.

Consider a doctor’s clinic where patients come with symptoms of either a common cold or the flu. The doctor uses two tests (a blood test and a swab test) to diagnose the disease, each having a certain degree of accuracy.

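First, let’s install the pomegranate library using pip. The code below uses the pre-1.0 pomegranate API (DiscreteDistribution, State, bake), which was removed in the 1.0 rewrite, so pin an earlier version:

pip install "pomegranate<1.0"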

Now, let’s implement the Bayesian Network for this scenario:

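# Requires pomegranate < 1.0 (these classes do not exist in 1.0 and later)
from pomegranate import (
    BayesianNetwork,
    ConditionalProbabilityTable,
    DiscreteDistribution,
    State,
)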

# Define the prior probability (the disease)
disease = DiscreteDistribution({'cold': 0.7, 'flu': 0.3})

# Define the conditional probability distributions (the tests)
blood_test = ConditionalProbabilityTable(
    [['cold', 'negative', 0.8],
     ['cold', 'positive', 0.2],
     ['flu', 'negative', 0.35],
     ['flu', 'positive', 0.65]], [disease])

swab_test = ConditionalProbabilityTable(
    [['cold', 'negative', 0.6],
     ['cold', 'positive', 0.4],
     ['flu', 'negative', 0.25],
     ['flu', 'positive', 0.75]], [disease])

# Create states
s1 = State(disease, name='disease')
s2 = State(blood_test, name='blood_test')
s3 = State(swab_test, name='swab_test')

# Create and initialize the Bayesian network
network = BayesianNetwork('Diagnosing the Flu')
network.add_states(s1, s2, s3)
network.add_edge(s1, s2)
network.add_edge(s1, s3)
network.bake()

In this code, we’re creating a Bayesian network with three states: disease, blood test, and swab test. The disease state is the prior, and the tests are conditional probabilities depending on the disease state.
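The structure of this network (one parent, two children) corresponds to the factorization P(disease, blood, swab) = P(disease) * P(blood | disease) * P(swab | disease). As a sanity check that does not depend on pomegranate, the sketch below enumerates the full joint distribution with plain dictionaries and confirms it sums to 1. The dictionary names are illustrative; the probabilities are the ones defined in the tables above.

```python
from itertools import product

p_disease = {"cold": 0.7, "flu": 0.3}
p_blood = {
    "cold": {"negative": 0.8, "positive": 0.2},
    "flu": {"negative": 0.35, "positive": 0.65},
}
p_swab = {
    "cold": {"negative": 0.6, "positive": 0.4},
    "flu": {"negative": 0.25, "positive": 0.75},
}

results = ("negative", "positive")

# Joint probability of every (disease, blood, swab) combination
joint = {
    (d, b, s): p_disease[d] * p_blood[d][b] * p_swab[d][s]
    for d, b, s in product(p_disease, results, results)
}

total = sum(joint.values())
print(len(joint))                # 8 combinations
print(abs(total - 1.0) < 1e-12)  # True: a valid distribution
```

If any row of a conditional probability table failed to sum to 1, this total would drift away from 1, which makes the check a cheap guard against typos in the tables.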

Now let’s calculate a posterior probability. For instance, if a patient tests positive on both tests, what is the probability that they have the flu?

# Compute the probability
prob_flu = network.probability([['flu', 'positive', 'positive']])
prob_cold = network.probability([['cold', 'positive', 'positive']])

# Use Bayes' theorem to find the probability of flu given positive results
prob_disease_given_positive_tests = prob_flu / (prob_flu + prob_cold)

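print(prob_disease_given_positive_tests)  # Output: approximately 0.7231

In this scenario, the joint probabilities are 0.3 * 0.65 * 0.75 = 0.14625 for the flu and 0.7 * 0.2 * 0.4 = 0.056 for a cold, so the probability of a patient having the flu given that they tested positive on both tests is 0.14625 / 0.20225, or approximately 72.3%.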
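Because the two tests are conditionally independent given the disease, the same posterior can be reached by applying Bayes’ Theorem once per test, using the posterior from the first test as the prior for the second. The sketch below shows that idea in plain Python; the helper name update is hypothetical and the code does not use pomegranate.

```python
def update(prior, likelihood):
    """Apply Bayes' Theorem once: return the posterior over all hypotheses."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # total probability of the observation
    return {h: p / evidence for h, p in unnormalized.items()}


prior = {"cold": 0.7, "flu": 0.3}

# P(positive result | disease) for each test, taken from the tables above
blood_positive = {"cold": 0.2, "flu": 0.65}
swab_positive = {"cold": 0.4, "flu": 0.75}

after_blood = update(prior, blood_positive)
after_both = update(after_blood, swab_positive)

print(round(after_blood["flu"], 4))  # 0.5821
print(round(after_both["flu"], 4))   # 0.7231
```

Each positive result raises the belief in flu: from 30% before any test, to about 58% after the blood test, to about 72% after both.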

Conclusion

Bayes’ Theorem is a fundamental tool in statistical inference, allowing us to update our beliefs in the light of new evidence. Python, with its powerful and easy-to-use libraries, provides an efficient way to work with Bayesian networks and conditional probabilities.

Although the theorem itself is relatively simple, applying it to real-world scenarios can become complex, depending on the number of variables and conditions involved. Thus, understanding the theory behind it and how to correctly interpret the results is crucial in fields such as machine learning, data science, and decision-making.
