A confusion matrix is a critical tool for evaluating the performance of a classification algorithm. It is essentially a table that describes the performance of a classification model on a set of data for which the true values are known. For a binary problem, the matrix summarizes four essential counts: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
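As a minimal sketch (the label vectors below are made up purely for illustration), these four counts can be computed directly from a pair of actual and predicted label vectors in R:

```r
# Hypothetical binary labels: "pos" is the positive class
actual    <- c("pos", "pos", "neg", "neg", "pos", "neg")
predicted <- c("pos", "neg", "neg", "pos", "pos", "neg")

TP <- sum(actual == "pos" & predicted == "pos")  # actual positives predicted positive
TN <- sum(actual == "neg" & predicted == "neg")  # actual negatives predicted negative
FP <- sum(actual == "neg" & predicted == "pos")  # negatives wrongly called positive
FN <- sum(actual == "pos" & predicted == "neg")  # positives wrongly called negative
```

Here TP = 2, TN = 2, FP = 1, and FN = 1; the rest of the article shows how to get the same counts as a single table.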
In the R programming environment, you can easily create, visualize, and analyze confusion matrices. In this comprehensive article, we’ll walk through how to create a confusion matrix in R using built-in functions, external libraries, and custom code. We will also delve into interpreting and visualizing the matrix to gain meaningful insights.
Table of Contents
- Pre-requisites
- Setting Up the Data
- Using Base R Functions to Create a Confusion Matrix
- Using the caret Package
- Using the confusionMatrix Function in caret
- Custom Confusion Matrix Function
- Visualizing the Confusion Matrix
- Interpreting the Confusion Matrix
- Conclusion
1. Pre-requisites
Before we dive in, make sure you have the following:
- R installed on your machine.
- RStudio or another preferred IDE.
- Basic understanding of R syntax and programming concepts.
- Familiarity with classification algorithms and machine learning concepts.
If you don’t have the required packages installed, you can install them using the install.packages() function.
2. Setting Up the Data
For demonstration, we’ll use the iris dataset that comes with R by default. This dataset is often used for classification problems. It contains measurements of the characteristics of three species of iris flowers.
Here’s how you can load the iris dataset:
data(iris)
head(iris)
3. Using Base R Functions to Create a Confusion Matrix
You can use the table() function in base R to create a confusion matrix easily:
# The iris data has three species, but a binomial glm handles only two
# classes, so we keep just versicolor and virginica for this example
iris2 <- droplevels(subset(iris, Species != "setosa"))

# Split the data
set.seed(42)
sample_index <- sample(seq_len(nrow(iris2)), size = 0.7 * nrow(iris2))
train_data <- iris2[sample_index, ]
test_data <- iris2[-sample_index, ]

# Train a logistic regression model
model <- glm(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
             data = train_data, family = "binomial")

# Make predictions: glm treats the second factor level ("virginica")
# as the positive class, so high probabilities mean "virginica"
probs <- predict(model, newdata = test_data, type = "response")
predictions <- ifelse(probs > 0.5, "virginica", "versicolor")

# Create the confusion matrix (rows = actual, columns = predicted)
conf_matrix <- table(Actual = test_data$Species, Predicted = predictions)
print(conf_matrix)
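Once you have a confusion matrix as a table, overall accuracy is the sum of the diagonal (correct predictions) divided by the total. A self-contained sketch with made-up counts (the same computation applies to any square confusion matrix whose rows and columns are in matching order):

```r
# Made-up 2x2 counts for illustration only
cm_demo <- as.table(matrix(c(8, 1, 2, 9), nrow = 2,
                           dimnames = list(Actual    = c("A", "B"),
                                           Predicted = c("A", "B"))))

# Accuracy = correctly classified (the diagonal) / total observations
accuracy_demo <- sum(diag(cm_demo)) / sum(cm_demo)  # 17 / 20 = 0.85
```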
4. Using the caret Package
Another powerful package for creating confusion matrices in R is caret. You can install it as follows:
install.packages("caret")
5. Using the confusionMatrix Function in caret
Here’s how you can create a confusion matrix using caret:
library(caret)
conf_matrix_caret <- confusionMatrix(as.factor(predictions), as.factor(test_data$Species))
print(conf_matrix_caret)
6. Custom Confusion Matrix Function
If you want more control over how the confusion matrix is created, you can write a custom function:
custom_conf_matrix <- function(true_labels, predicted_labels) {
table(true_labels, predicted_labels)
}
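Calling it with toy vectors looks like this (the animal labels are made up for illustration; the function definition is repeated so the snippet runs on its own):

```r
# Definition repeated from above so this sketch is self-contained
custom_conf_matrix <- function(true_labels, predicted_labels) {
  table(true_labels, predicted_labels)
}

# Toy labels for illustration
truth <- c("cat", "dog", "cat", "dog", "dog")
preds <- c("cat", "cat", "cat", "dog", "dog")

cm <- custom_conf_matrix(truth, preds)
print(cm)
```

Because it is just a wrapper around table(), you can extend it, for example to add named dimensions or to normalize the counts to proportions.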
7. Visualizing the Confusion Matrix
You can visualize the confusion matrix using packages like ggplot2 or corrplot for more insightful interpretation.
# Load the required package
library(ggplot2)

# Create a small confusion matrix for demonstration
# Replace this with your actual confusion matrix
conf_matrix <- as.table(matrix(c(5, 3, 2, 4), nrow = 2))

# Convert the confusion matrix into a tidy data frame
conf_melt <- as.data.frame(conf_matrix)
# Plot using ggplot2
ggplot(data = conf_melt, aes(x = Var1, y = Var2)) +
geom_tile(aes(fill = Freq), color = 'white') +
geom_text(aes(label = sprintf("%d", Freq)), vjust = 1) +
scale_fill_gradient(low = "white", high = "blue") +
theme_minimal() +
labs(fill = "Frequency")

In this example, we use the as.table and as.data.frame functions to get the confusion matrix into a tidy format. Then we plot it using ggplot2.
Note: The variable names Var1 and Var2 are the default names generated when you convert a table to a data frame. They represent the reference classes and predicted classes, respectively. Rename them as needed for better understanding.
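One way to rename them is to relabel the columns of the melted data frame before plotting; a sketch (variable names cm_tab and cm_df are illustrative, chosen so the snippet runs on its own):

```r
# Rebuild the demo matrix so this snippet is self-contained
cm_tab <- as.table(matrix(c(5, 3, 2, 4), nrow = 2))
cm_df  <- as.data.frame(cm_tab)

# Rename the default Var1/Var2 columns to descriptive names
names(cm_df)[names(cm_df) == "Var1"] <- "Actual"
names(cm_df)[names(cm_df) == "Var2"] <- "Predicted"

# The ggplot call can then use aes(x = Actual, y = Predicted)
```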
8. Interpreting the Confusion Matrix
Here’s a quick overview of what each cell in a confusion matrix represents:
- True Positives (TP): Actual positives correctly classified.
- True Negatives (TN): Actual negatives correctly classified.
- False Positives (FP): Actual negatives wrongly classified as positives.
- False Negatives (FN): Actual positives wrongly classified as negatives.
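These four counts are the building blocks for the standard performance metrics. A minimal sketch with hypothetical counts (the numbers are made up for illustration):

```r
# Hypothetical counts for illustration
TP <- 40; TN <- 45; FP <- 10; FN <- 5

accuracy  <- (TP + TN) / (TP + TN + FP + FN)        # overall correctness
precision <- TP / (TP + FP)                          # of predicted positives, how many were right
recall    <- TP / (TP + FN)                          # of actual positives, how many were found
f1        <- 2 * precision * recall / (precision + recall)  # harmonic mean of the two
```

With these counts, accuracy is 0.85, precision is 0.8, and recall is about 0.89; caret's confusionMatrix() reports these same quantities automatically.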
9. Conclusion
Creating and interpreting a confusion matrix in R can be done using base R functions or specialized packages like caret. The confusion matrix is a powerful tool for evaluating the performance of classification models and can be easily visualized for better understanding and presentation.
Understanding how to work with confusion matrices is crucial for anyone dealing with classification problems in machine learning, as it provides a foundation for calculating other performance metrics like precision, recall, and F1-score.