How to Calculate Kullback-Leibler Divergence in R


Kullback-Leibler (KL) Divergence, also known as relative entropy, is a measure of how one probability distribution diverges from a second, expected probability distribution. It is widely used in fields such as machine learning, data mining, information retrieval, and bioinformatics.

This article provides a comprehensive guide on understanding KL Divergence and how to compute it in R. We’ll also look at practical examples of its use and application in various fields.

Understanding Kullback-Leibler Divergence

Kullback-Leibler Divergence is a non-symmetric measure of the difference between two probability distributions P and Q. KL Divergence is non-negative and is zero if and only if P and Q are the same distribution in the case of discrete variables, or equal “almost everywhere” in the case of continuous variables.

The KL Divergence of Q from P, denoted DKL(P || Q), is defined as:

DKL(P || Q) = ∑ P(x) log (P(x) / Q(x))   for discrete distributions
DKL(P || Q) = ∫ P(x) log (P(x) / Q(x)) dx for continuous distributions
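For the continuous case, the integral can be approximated numerically. The following is a minimal sketch, assuming P = N(0, 1) and Q = N(1, 2^2), both chosen purely for illustration. It uses base R's integrate() and cross-checks the result against the known closed form for two univariate normal distributions. The function names kl_normal_numeric and kl_normal_exact are hypothetical.

```r
# Numerically approximate D_KL(P || Q) for two normal densities.
# The means and standard deviations used below are illustrative assumptions.
kl_normal_numeric <- function(mu1, sd1, mu2, sd2) {
  integrand <- function(x) {
    # Work with log-densities so the far tails do not produce log(0 / 0)
    log_p <- dnorm(x, mean = mu1, sd = sd1, log = TRUE)
    log_q <- dnorm(x, mean = mu2, sd = sd2, log = TRUE)
    exp(log_p) * (log_p - log_q)
  }
  integrate(integrand, lower = -Inf, upper = Inf)$value
}

# Closed form for two univariate normals, used here only as a cross-check
kl_normal_exact <- function(mu1, sd1, mu2, sd2) {
  log(sd2 / sd1) + (sd1^2 + (mu1 - mu2)^2) / (2 * sd2^2) - 0.5
}

numeric_kl <- kl_normal_numeric(0, 1, 1, 2)
exact_kl <- kl_normal_exact(0, 1, 1, 2)
```

For these parameters the closed form reduces to log(2) - 0.25, roughly 0.443, and the numerical estimate should agree with it to within the tolerance of integrate().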

Calculating KL Divergence in R

Base R does not provide a built-in function for KL Divergence, but we can easily calculate it from the definition using R’s vectorized operations. Let’s create a function kl_divergence() to do this.

kl_divergence <- function(P, Q) {
  sum(P * log(P / Q), na.rm = TRUE)
}

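Here, P and Q are numeric vectors of equal length representing the probability distributions we want to compare. This function calculates the KL Divergence of Q from P (DKL(P || Q)) using the natural logarithm, so the result is expressed in nats. The na.rm = TRUE argument makes sum() drop the NaN values that arise wherever P(x) = 0, because R evaluates 0 * log(0) as NaN; this matches the convention that such terms contribute zero. If Q(x) = 0 for some x where P(x) > 0, the function returns Inf.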

Let’s use an example to demonstrate how to use this function.

# Define two probability distributions P and Q
P <- c(0.1, 0.2, 0.7)
Q <- c(0.2, 0.3, 0.5)

# Calculate KL Divergence
kl_divergence(P, Q)

# Output:
# [1] 0.08512283
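The properties described earlier (asymmetry, and a value of zero for identical distributions) can be checked directly. The sketch below redefines kl_divergence() so that it runs on its own; the variable names forward, reverse, same, and forward_bits are illustrative.

```r
kl_divergence <- function(P, Q) {
  sum(P * log(P / Q), na.rm = TRUE)
}

P <- c(0.1, 0.2, 0.7)
Q <- c(0.2, 0.3, 0.5)

forward <- kl_divergence(P, Q)  # D_KL(P || Q)
reverse <- kl_divergence(Q, P)  # D_KL(Q || P), generally a different value
same <- kl_divergence(P, P)     # a distribution compared with itself gives 0

# Natural logs give the result in nats; dividing by log(2) converts to bits
forward_bits <- forward / log(2)
```

With these vectors, forward is about 0.0851 and reverse about 0.0920, so the order of the arguments matters.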

Dealing with Zeros

The basic formula for KL Divergence involves the logarithm of the ratio P(x) / Q(x), which is problematic when P or Q contains zero probabilities. A zero in P is harmless, since the corresponding term is taken to be zero, but a zero in Q at a point where P(x) > 0 makes the divergence infinite. A common practice to deal with this issue is to add a small constant to the probabilities to offset the zero values, a process known as “smoothing”. Below is a revised version of the kl_divergence() function that includes smoothing:

kl_divergence_smooth <- function(P, Q, epsilon = 1e-10) {
  # Apply smoothing
  P <- P + epsilon
  Q <- Q + epsilon

  # Calculate KL Divergence
  sum(P * log(P / Q), na.rm = TRUE)
}

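This version of the function adds a small value epsilon to every probability in P and Q, so that neither the division nor the logarithm ever encounters a zero. Keep in mind that the shifted vectors no longer sum exactly to 1, and that the result depends on the choice of epsilon, especially where Q(x) was zero and P(x) was not.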
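To see what smoothing changes in practice, here is a minimal sketch comparing the unsmoothed calculation with one possible smoothing variant that also rescales each vector to sum to 1. The helper smooth_probs() and the example vectors are hypothetical and are not part of the functions defined above.

```r
# Unsmoothed version, repeated here so the example runs on its own
kl_divergence <- function(P, Q) {
  sum(P * log(P / Q), na.rm = TRUE)
}

# Hypothetical helper: add epsilon, then rescale so the vector sums to 1
smooth_probs <- function(x, epsilon = 1e-10) {
  x <- x + epsilon
  x / sum(x)
}

# Illustrative distributions with zeros in different positions
P <- c(0.5, 0.5, 0.0)
Q <- c(0.5, 0.0, 0.5)

raw <- kl_divergence(P, Q)  # Inf, because Q is 0 where P is positive
smoothed <- kl_divergence(smooth_probs(P), smooth_probs(Q))

# A larger epsilon gives a smaller (but still finite) divergence
smoothed_coarse <- kl_divergence(smooth_probs(P, 1e-4), smooth_probs(Q, 1e-4))
```

The smoothed value is finite, but it grows as epsilon shrinks, so epsilon should be treated as a modeling choice rather than a purely numerical detail.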

Applications of KL Divergence

Kullback-Leibler Divergence has various practical applications in numerous fields:

  1. Machine Learning: In machine learning, KL Divergence is the objective minimized by t-Distributed Stochastic Neighbor Embedding (t-SNE) for dimensionality reduction, and it appears in the analysis of algorithms such as Expectation-Maximization. It’s also used as a term in the loss function of Variational Autoencoders (VAEs).
  2. Information Theory: In information theory, KL Divergence measures the loss of information when one distribution is used to approximate another.
  3. Natural Language Processing (NLP): In NLP, KL Divergence is used in techniques like Latent Dirichlet Allocation (LDA) for topic modeling, and in text summarization algorithms.
  4. Bioinformatics: In bioinformatics, KL Divergence can be used to measure the divergence between the observed nucleotide frequencies and the expected frequencies in a DNA sequence.
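As a small illustration of the bioinformatics use case, the sketch below compares observed nucleotide frequencies with a uniform expectation. The DNA fragment is made up for this example, and the assumption that all four bases are equally likely is only one possible choice of expected distribution.

```r
# Hypothetical DNA fragment, made up for this example
dna <- "ATGCGATATTATAAGCATATTA"

# Observed nucleotide frequencies; the factor levels keep all four bases
# in the table even if one of them never occurs in the sequence
bases <- c("A", "C", "G", "T")
counts <- table(factor(strsplit(dna, "")[[1]], levels = bases))
observed <- as.numeric(counts) / sum(counts)

# Expected frequencies under the assumption that every base is equally likely
expected <- rep(0.25, 4)

# D_KL(observed || expected); terms with observed == 0 are dropped by na.rm
kl_dna <- sum(observed * log(observed / expected), na.rm = TRUE)
```

A value of kl_dna close to zero would indicate base usage close to uniform, while larger values indicate a more skewed composition.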


Conclusion

While base R does not directly provide a built-in function to compute Kullback-Leibler Divergence, it offers all the tools necessary to calculate it effectively. This measure of divergence is a fundamental concept in statistics and information theory and finds application in various fields where comparison of probability distributions is vital.

Understanding KL Divergence and its computation in R equips us with a crucial statistical tool that helps us quantify the difference between probability distributions, facilitating efficient decision-making in data analysis, machine learning, and many other areas.
