# How to Plot Predicted Values in R

Predictive modeling is an integral part of data analysis and machine learning. Once a predictive model has been trained, the model can be used to predict outcomes based on new or existing data. These predicted values can then be plotted against the actual values, providing a visual representation of the model’s performance and the relationship between the observed and predicted outcomes.

This guide walks through several ways of plotting predicted values in R, using both base R and the popular ggplot2 package. A simple linear regression model serves as the running example for generating predictions and visualizing how well they match the observed data.

## 1. Generating Predicted Values

Before we can plot predicted values, we need a model to generate these predictions. For this article, we’ll use a simple linear regression model as an example. Let’s generate some random data and fit a linear regression model:

```r
# Set seed for reproducibility
set.seed(123)

# Generate random data
x <- rnorm(100)
y <- 2*x + rnorm(100)

# Fit a linear regression model
model <- lm(y ~ x)
```

In this example, rnorm(100) draws 100 random numbers from a standard normal distribution. The response y is generated as 2*x plus standard normal noise, so the true intercept is 0 and the true slope is 2. lm(y ~ x) then fits a linear regression model with y as the dependent variable and x as the independent variable.
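The data are simulated with a known intercept of 0 and slope of 2, so a quick sanity check is to compare the estimated coefficients with those values. The minimal sketch below repeats the setup so that it runs on its own. It uses only the standard stats functions coef() and confint(). The variable name ci is an illustrative choice.

```r
# Recreate the simulated data and model
set.seed(123)
x <- rnorm(100)
y <- 2*x + rnorm(100)
model <- lm(y ~ x)

# Estimated intercept and slope; they should land near the true values 0 and 2
coef(model)

# 95% confidence intervals for both coefficients (rows: intercept, slope)
ci <- confint(model, level = 0.95)
ci
```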

We can now use the predict() function to generate predicted values from the model:

```r
# Generate predicted values
predicted <- predict(model)
```

predict(model) applies the fitted model to the original data to generate predictions.
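Called without a newdata argument, predict() returns the predictions for the training data, and these match what fitted(model) returns. To predict outcomes for new inputs, pass a data frame whose column names match the model's predictors (here, x). The sketch below assumes three arbitrary x values chosen purely for illustration. It also checks that each prediction is simply intercept + slope * x.

```r
set.seed(123)
x <- rnorm(100)
y <- 2*x + rnorm(100)
model <- lm(y ~ x)

# Predictions on the training data match the fitted values (all.equal() returns TRUE)
predicted <- predict(model)
all.equal(predicted, fitted(model))

# Predictions for new, illustrative x values; the column must be named "x"
new_data <- data.frame(x = c(-1, 0, 1))
new_pred <- predict(model, newdata = new_data)
new_pred

# The same predictions computed by hand from the coefficients
b <- coef(model)
manual <- unname(b[1] + b[2] * new_data$x)
manual
```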

## 2. Plotting Predicted Values in Base R

In base R, you can use the plot() function to create a scatter plot of the actual versus predicted values:

```r
# Create a scatter plot
plot(y, predicted, xlab = "Actual Values", ylab = "Predicted Values", main = "Actual vs. Predicted Values")
```

In this command, plot(y, predicted) creates a scatter plot with y (actual values) on the x-axis and predicted (predicted values) on the y-axis. The xlab, ylab, and main arguments add labels to the x-axis, y-axis, and the plot itself, respectively.

To better visualize the accuracy of the predictions, we can add a line of perfect prediction (i.e., a line where the predicted value is always equal to the actual value):

```r
# Add a line of perfect prediction
abline(0, 1, col = "red")
```

The abline() function adds a straight line to the plot. The arguments 0 and 1 specify the intercept and slope of the line, respectively. The col argument sets the color of the line.
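An actual-versus-predicted plot does not show the predictor itself. With a single predictor, another common view plots the observed data against x and overlays the model's predictions as a line. The sketch below evaluates predict() on an evenly spaced grid of x values; the grid size of 100 is an arbitrary choice. It uses interval = "confidence" to draw a 95% confidence band for the mean prediction.

```r
set.seed(123)
x <- rnorm(100)
y <- 2*x + rnorm(100)
model <- lm(y ~ x)

# Evenly spaced x values spanning the observed range
grid <- data.frame(x = seq(min(x), max(x), length.out = 100))

# Matrix with columns fit, lwr, upr (95% confidence interval for the mean response)
pred <- predict(model, newdata = grid, interval = "confidence", level = 0.95)

# Observed data, fitted line (solid), and confidence band (dashed)
plot(x, y, xlab = "x", ylab = "y", main = "Observed Data and Fitted Line")
lines(grid$x, pred[, "fit"], col = "red", lwd = 2)
lines(grid$x, pred[, "lwr"], col = "red", lty = 2)
lines(grid$x, pred[, "upr"], col = "red", lty = 2)
```

Switching to interval = "prediction" produces a wider band. That band describes where individual new observations are expected to fall, rather than the mean response.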

## 3. Plotting Predicted Values with ggplot2

The ggplot2 package provides a more flexible and visually appealing alternative for plotting predicted values. First, install and load the package:

```r
# Install ggplot2 (only needed once per R installation)
install.packages("ggplot2")

# Load ggplot2 for the current session
library(ggplot2)
```

You can now create a scatter plot of the actual versus predicted values:

```r
# Create a data frame
df <- data.frame(Actual = y, Predicted = predicted)

# Create a scatter plot
ggplot(df, aes(x = Actual, y = Predicted)) +
  geom_point() +
  labs(x = "Actual Values", y = "Predicted Values", title = "Actual vs. Predicted Values") +
  theme_minimal() +
  geom_abline(intercept = 0, slope = 1, color = "red")
```

In this code:

1. data.frame(Actual = y, Predicted = predicted) creates a data frame with the actual and predicted values.
2. ggplot(df, aes(x = Actual, y = Predicted)) initializes a ggplot object, specifying Actual and Predicted as the x and y variables, respectively.
3. geom_point() adds a layer of points to the plot.
4. labs(x = "Actual Values", y = "Predicted Values", title = "Actual vs. Predicted Values") sets the x-axis label, the y-axis label, and the plot title.
5. theme_minimal() applies a minimal theme to the plot.
6. geom_abline(intercept = 0, slope = 1, color = "red") adds a red line of perfect prediction to the plot.
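The fitted-line view can also be drawn with ggplot2. This sketch computes the line and its 95% confidence band with predict() and then draws them with geom_line() and geom_ribbon(). The plot therefore shows exactly the model's own predictions. The names grid and band_df are illustrative. For a simple linear model, geom_smooth(method = "lm") would draw a similar line and band automatically.

```r
library(ggplot2)

set.seed(123)
x <- rnorm(100)
y <- 2*x + rnorm(100)
model <- lm(y ~ x)

# Predictions with a 95% confidence interval on a grid of x values
grid <- data.frame(x = seq(min(x), max(x), length.out = 100))
pred <- predict(model, newdata = grid, interval = "confidence")
band_df <- cbind(grid, as.data.frame(pred))  # columns: x, fit, lwr, upr

# Observed points, shaded confidence band, and red fitted line
p <- ggplot() +
  geom_point(data = data.frame(x = x, y = y), aes(x = x, y = y)) +
  geom_ribbon(data = band_df, aes(x = x, ymin = lwr, ymax = upr), alpha = 0.2) +
  geom_line(data = band_df, aes(x = x, y = fit), color = "red") +
  labs(x = "x", y = "y", title = "Observed Data and Fitted Line") +
  theme_minimal()
p
```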

## 4. Visualizing Model Fit with Residual Plots

Another useful way to visualize the performance of a predictive model is by plotting the residuals – the differences between the actual and predicted values.

In base R, you can create a residual plot with the plot() and abline() functions:

```r
# Calculate residuals
residuals <- y - predicted

# Create a residual plot
plot(predicted, residuals, xlab = "Predicted Values", ylab = "Residuals", main = "Residual Plot")
abline(h = 0, col = "red")
```

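This code calculates the residuals (y - predicted), then plots them against the predicted values. The line abline(h = 0, col = "red") adds a horizontal reference line at zero, where points with perfect predictions would fall. For a well-fitting linear model, the residuals should scatter randomly around this line, with no visible curve or funnel shape.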
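The hand-computed residuals agree with the ones stored in the model object. Base R can also draw a residual plot directly from an lm object: plot(model, which = 1) produces the "Residuals vs Fitted" diagnostic, which adds a smoothed trend line to help reveal curvature. The minimal sketch below checks both points. It uses the name res to avoid masking the residuals() function.

```r
set.seed(123)
x <- rnorm(100)
y <- 2*x + rnorm(100)
model <- lm(y ~ x)

# Hand-computed residuals agree with the stored ones (all.equal() returns TRUE)
res <- y - predict(model)
all.equal(res, residuals(model))

# With an intercept in the model, least-squares residuals average to (numerically) zero
mean(res)

# Built-in "Residuals vs Fitted" diagnostic plot for lm objects
plot(model, which = 1)
```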

You can also create a residual plot with ggplot2:

```r
# Add residuals to the data frame
df$Residuals <- residuals

# Create a residual plot
ggplot(df, aes(x = Predicted, y = Residuals)) +
  geom_point() +
  labs(x = "Predicted Values", y = "Residuals", title = "Residual Plot") +
  theme_minimal() +
  geom_hline(yintercept = 0, color = "red")
```

Here, geom_hline(yintercept = 0, color = "red") adds a horizontal line at y = 0.
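Subtle patterns in a residual plot are easier to see with a smoothed trend curve. This sketch adds a loess smoother with geom_smooth(); the choice of loess and the colors are assumptions, not requirements. A roughly flat curve near zero is consistent with no systematic pattern. A clear bend would suggest the linear form is missing some structure.

```r
library(ggplot2)

set.seed(123)
x <- rnorm(100)
y <- 2*x + rnorm(100)
model <- lm(y ~ x)

df <- data.frame(Predicted = predict(model), Residuals = residuals(model))

p <- ggplot(df, aes(x = Predicted, y = Residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, color = "red") +
  # Local (loess) trend through the residuals, without a confidence band
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, color = "blue") +
  labs(x = "Predicted Values", y = "Residuals", title = "Residual Plot with Trend") +
  theme_minimal()
p
```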

Plotting predicted values against actual values, together with residual plots, gives a quick visual check of how well a model fits the data. Points close to the line of perfect prediction and residuals without systematic patterns are signs of a good fit. Visible structure points to problems worth investigating before relying on the model for evaluation or selection. Whether you use base R or packages like ggplot2, R provides the tools needed to produce these plots.
