Regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables. In this article, we will focus on the basics of creating a scatterplot with a regression line in R, using both base R and the popular ggplot2 package.
Introduction to Scatterplots and Regression Lines
A scatterplot uses Cartesian coordinates to display the values of (typically) two variables for a set of data. Each observation (or point) in the plot corresponds to one row in the data set. Scatterplots are used to visualize the relationship between two quantitative variables, and they are especially useful for spotting trends in data.
The regression line, also known as the line of best fit, plots the predicted values of the dependent variable (Y) as a function of the independent variable (X). When added to a scatterplot, it summarizes any linear relationship between the two variables. Its slope gives the predicted change in Y for a one-unit increase in X. The strength of the relationship shows in how tightly the points cluster around the line, not in how steep the line is, because the slope depends on the units of X and Y.
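For readers who want the formula behind the line of best fit, here is a short sketch of the usual least-squares solution for a single predictor. The symbols are notation introduced here for illustration and do not appear elsewhere in the article: b_0 is the intercept, b_1 the slope, and x̄ and ȳ the sample means of X and Y.

```latex
\hat{y} = b_0 + b_1 x, \qquad
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
b_0 = \bar{y} - b_1 \bar{x}
```

The second expression is the sample covariance of X and Y divided by the sample variance of X, and the third forces the line through the point of means.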
Using Built-in Data in R
In this guide, we will use the built-in mtcars data set in R, which contains various car attributes for 32 models. You can take a look at the data using the head function:
head(mtcars)
Creating Scatterplots and Regression Lines in Base R
Let’s create a scatterplot for the mtcars data set, showing mpg (miles per gallon) as a function of hp (horsepower), and then add a regression line.
Scatterplot
To create a scatterplot in base R, we use the plot function:
plot(mpg ~ hp, data = mtcars, main = "Scatterplot of MPG vs HP", xlab = "Horsepower", ylab = "Miles Per Gallon")

Here, xlab and ylab are used to provide labels for the x-axis and y-axis, respectively, while main is used to provide a title for the plot.
Regression Line
Next, we create a linear model using the lm function and add a regression line to the scatterplot using the abline function:
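# Fit mpg as a linear function of hp
model <- lm(mpg ~ hp, data = mtcars)
# Draw the scatterplot first; abline() can only add to an existing plot
plot(mpg ~ hp, data = mtcars, main = "Scatterplot of MPG vs HP", xlab = "Horsepower", ylab = "Miles Per Gallon")
# Overlay the fitted line using the model's intercept and slope
abline(model, col = "red")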

In the lm function, the tilde character (~) signifies “as a function of”. The abline function then adds a line to the plot based on the coefficients in the linear model. The line is colored red for visibility.
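To see the numbers behind the red line, the fitted model can be inspected directly. The following is a minimal sketch, not part of the original walkthrough: the names coefs, new_cars, and predicted are introduced here for illustration, and the horsepower values 100 and 200 are arbitrary examples.

```r
# Refit the model so this block runs on its own
model <- lm(mpg ~ hp, data = mtcars)

# Intercept and slope of the line that abline() draws
coefs <- coef(model)
print(coefs)

# Predicted mpg for hypothetical cars with 100 and 200 horsepower
new_cars <- data.frame(hp = c(100, 200))
predicted <- predict(model, newdata = new_cars)
print(predicted)
```

The formula is written as mpg ~ hp with data = mtcars because predict matches the columns of newdata by variable name. A formula written with mtcars$hp would not pick up the new hp values.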
Creating Scatterplots and Regression Lines with ggplot2
While base R is sufficient for creating scatterplots and regression lines, the ggplot2 package allows for more flexibility and customization. To use ggplot2, you first need to install and load it into your R environment:
install.packages("ggplot2")
library(ggplot2)
Scatterplot
The syntax of ggplot2 involves initializing a ggplot object and adding layers to it. To create a scatterplot:
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(x = "Horsepower", y = "Miles Per Gallon", title = "Scatterplot of MPG vs HP")

Here, aes is used to assign the hp variable to the x-axis and mpg to the y-axis. geom_point then creates the scatterplot. labs is used to provide labels for the x-axis, y-axis, and the plot title.
Regression Line
To add a regression line, we use the geom_smooth function with the method argument set to "lm":
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(x = "Horsepower", y = "Miles Per Gallon", title = "Scatterplot of MPG vs HP with Regression Line")

In geom_smooth, method = "lm" indicates that a linear model should be used, se = FALSE removes the shaded confidence interval around the line, and color = "red" makes the line red for visibility.
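If you would rather keep the confidence band, a variation along the following lines should work. This is a sketch under a few assumptions: the variable name p, the explicit formula = y ~ x, and the 95% level are choices made here for illustration, not settings taken from the example above.

```r
library(ggplot2)

# Store the plot in a variable so it can be printed or modified later
p <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  # se = TRUE draws the shaded confidence band; level sets its coverage
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, level = 0.95, color = "red") +
  labs(x = "Horsepower", y = "Miles Per Gallon", title = "MPG vs HP with 95% Confidence Band")

print(p)
```

The band describes uncertainty in the fitted line itself, so it is narrowest near the middle of the horsepower range and widens toward the extremes.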
Interpreting the Plot
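With the scatterplot and regression line complete, you can start interpreting the plot. The scatterplot gives you a general idea of the relationship between the variables, and the regression line summarizes its direction and size. Here the line slopes downward, so cars with more horsepower tend to get fewer miles per gallon. The slope tells you how much mpg is predicted to change per additional unit of horsepower. It does not measure how strong the relationship is; for that, look at how closely the points follow the line, or at the correlation coefficient or R-squared. Also keep in mind that correlation does not imply causation, and additional analysis would be necessary to determine any causal relationships.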
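One caveat when reading the slope is that its size depends on the units of the variables, while the correlation does not. The sketch below illustrates this by re-expressing horsepower in kilowatts. The conversion factor (roughly 0.7457 kW per horsepower) and all variable names are assumptions introduced for this example.

```r
x <- mtcars$hp
y <- mtcars$mpg

# Same data, with horsepower re-expressed in kilowatts (1 hp is about 0.7457 kW)
x_kw <- x * 0.7457

# The slope changes with the units of x ...
slope_hp <- unname(coef(lm(y ~ x))[2])
slope_kw <- unname(coef(lm(y ~ x_kw))[2])
print(c(per_hp = slope_hp, per_kw = slope_kw))

# ... but the correlation, which measures strength, stays the same
r_hp <- cor(x, y)
r_kw <- cor(x_kw, y)
print(c(r_hp = r_hp, r_kw = r_kw))

# For a single predictor, R-squared is the squared correlation
r_squared <- summary(lm(y ~ x))$r.squared
print(r_squared)
```

The line in kilowatts is steeper, yet the points sit exactly as close to it as before, which is why the correlation or R-squared is the better gauge of strength.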
Conclusion
Scatterplots with regression lines are powerful tools for exploring the relationship between two quantitative variables. Both base R and ggplot2 offer robust functionalities for creating these plots, with ggplot2 offering additional flexibility and customization options.