Predictive modeling is an integral part of data analysis and machine learning. Once a predictive model has been trained, the model can be used to predict outcomes based on new or existing data. These predicted values can then be plotted against the actual values, providing a visual representation of the model’s performance and the relationship between the observed and predicted outcomes.
This extensive guide will walk you through various ways of plotting predicted values in R. We will cover methods using both base R and popular packages like ggplot2, and we will also introduce the concept of regression models and demonstrate how to visualize their predictions.
1. Generating Predicted Values
Before we can plot predicted values, we need a model to generate these predictions. For this article, we’ll use a simple linear regression model as an example. Let’s generate some random data and fit a linear regression model:
# Set seed for reproducibility
set.seed(123)
# Generate random data
x <- rnorm(100)
y <- 2*x + rnorm(100)
# Fit a linear regression model
model <- lm(y ~ x)
In this example, rnorm(100) generates 100 random numbers from a standard normal distribution. We generate y as a function of x with some added noise. lm(y ~ x) then fits a linear regression model with y as the dependent variable and x as the independent variable.
We can now use the predict() function to generate predicted values from the model:
# Generate predicted values
predicted <- predict(model)
Called without any new data, predict(model) applies the fitted model to the original data, so the predictions are the model's fitted values.
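The same function can also generate predictions for observations the model has not seen. The following is a minimal sketch rather than part of the walkthrough above: new_data and its three x values are made up for illustration, and the 95% prediction interval is just one reasonable setting. The block refits the example model so that it runs on its own.

```r
# Refit the example model so this block is self-contained
set.seed(123)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
model <- lm(y ~ x)

# Hypothetical new observations; the column name must match the predictor "x"
new_data <- data.frame(x = c(-1, 0, 1))

# Point predictions for the new observations
new_predicted <- predict(model, newdata = new_data)

# The same predictions with 95% prediction intervals (columns: fit, lwr, upr)
new_intervals <- predict(model, newdata = new_data,
                         interval = "prediction", level = 0.95)
print(new_intervals)
```

The fit column holds the point predictions, while lwr and upr bound the range in which a single new observation is expected to fall under the model's assumptions.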
2. Plotting Predicted Values in Base R
In base R, you can use the plot() function to create a scatter plot of the actual versus predicted values:
# Create a scatter plot
plot(y, predicted,
     xlab = "Actual Values",
     ylab = "Predicted Values",
     main = "Actual vs. Predicted Values")
In this command, plot(y, predicted) creates a scatter plot with y (actual values) on the x-axis and predicted (predicted values) on the y-axis. The xlab, ylab, and main arguments add labels to the x-axis, y-axis, and the plot itself, respectively.
To better visualize the accuracy of the predictions, we can add a line of perfect prediction (i.e., a line where the predicted value is always equal to the actual value):
# Add a line of perfect prediction
abline(0, 1, col = "red")
The abline() function adds a straight line to the existing plot. The arguments 0 and 1 specify the intercept and slope of the line, respectively. The col argument sets the color of the line.
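A complementary view, sketched here as an optional extra, plots the predicted values against the predictor itself so the fitted line sits on top of the raw data. The variable ord is a helper introduced only for this example: it sorts the points by x so that lines() draws the fitted line from left to right. The block refits the example model so that it runs on its own.

```r
# Refit the example model so this block is self-contained
set.seed(123)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
model <- lm(y ~ x)
predicted <- predict(model)

# Order the observations by x so the line is drawn left to right
ord <- order(x)

# Scatter plot of the observed data
plot(x, y, xlab = "x", ylab = "y", main = "Observed Data with Fitted Line")

# Overlay the predicted values as a line
lines(x[ord], predicted[ord], col = "red", lwd = 2)
```

For a simple linear model, abline(model) would draw the same line. Sorting and using lines() is shown because that approach also works when the predictions do not fall on a straight line.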
3. Plotting Predicted Values with ggplot2
The ggplot2 package provides a more flexible and visually appealing alternative for plotting predicted values. First, install and load the package:
# Install and load ggplot2
install.packages("ggplot2")
library(ggplot2)
You can now create a scatter plot of the actual versus predicted values:
# Create a data frame
df <- data.frame(Actual = y, Predicted = predicted)
# Create a scatter plot
ggplot(df, aes(x = Actual, y = Predicted)) +
  geom_point() +
  labs(x = "Actual Values", y = "Predicted Values", title = "Actual vs. Predicted Values") +
  theme_minimal() +
  geom_abline(intercept = 0, slope = 1, color = "red")
In this code:
data.frame(Actual = y, Predicted = predicted) creates a data frame with the actual and predicted values.
ggplot(df, aes(x = Actual, y = Predicted)) initializes a ggplot object, specifying Actual and Predicted as the x and y variables, respectively.
geom_point() adds a layer of points to the plot.
labs(x = "Actual Values", y = "Predicted Values", title = "Actual vs. Predicted Values") sets the labels for the x-axis, y-axis, and the plot.
theme_minimal() applies a minimal theme to the plot.
geom_abline(intercept = 0, slope = 1, color = "red") adds a red line of perfect prediction to the plot.
4. Visualizing Model Fit with Residual Plots
Another useful way to visualize the performance of a predictive model is by plotting the residuals – the differences between the actual and predicted values.
In base R, you can create a residual plot with the plot() function:
# Calculate residuals
residuals <- y - predicted
# Create a residual plot
plot(predicted, residuals,
     xlab = "Predicted Values",
     ylab = "Residuals",
     main = "Residual Plot")
abline(h = 0, col = "red")
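This code calculates the residuals (y - predicted), then creates a scatter plot of the residuals against the predicted values. The line abline(h = 0, col = "red") adds a horizontal line at y = 0, where the residual is zero and the prediction equals the actual value. Points scattered evenly around this line, with no visible pattern, suggest a reasonable fit.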
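As a quick sanity check on the definition, the manually computed residuals can be compared with the ones R stores in the fitted model. This is a small sketch, with manual_resid and builtin_resid as names chosen only for this example. The block refits the example model so that it runs on its own.

```r
# Refit the example model so this block is self-contained
set.seed(123)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
model <- lm(y ~ x)
predicted <- predict(model)

# Residuals computed by hand: actual minus predicted
manual_resid <- y - predicted

# Residuals extracted from the fitted model object
builtin_resid <- residuals(model)

# The two agree up to floating-point tolerance
print(all.equal(unname(manual_resid), unname(builtin_resid)))

# For least squares with an intercept, residuals average to (numerically) zero
print(mean(manual_resid))
```

Because the residuals of a least-squares fit with an intercept always average to zero, the horizontal reference line in the plot passes through the middle of the point cloud. What matters when reading the plot is the shape of the scatter around the line, not its overall level.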
You can also create a residual plot with ggplot2:
# Add residuals to the data frame
df$Residuals <- residuals
# Create a residual plot
ggplot(df, aes(x = Predicted, y = Residuals)) +
  geom_point() +
  labs(x = "Predicted Values", y = "Residuals", title = "Residual Plot") +
  theme_minimal() +
  geom_hline(yintercept = 0, color = "red")
geom_hline(yintercept = 0, color = "red") adds a horizontal line at y = 0.
By plotting predicted values and analyzing residual plots, you can get a clear visual understanding of your model’s performance and how well it fits the data. Understanding these visualizations is critical in model evaluation and selection, helping to ensure that your analyses are valid and reliable. Whether you use base R or packages like ggplot2, R provides robust and versatile tools for plotting and visualizing predicted values.