One of the most common and useful types of graphics in statistical analysis is the scatter plot. In this article, we’ll discuss how to create a scatter plot in R using both base R and ggplot2, one of the most popular packages in R for data visualization.

## What is a Scatter Plot?

A scatter plot uses Cartesian coordinates to display the values of two variables. With one variable on each axis, you can see whether a relationship or correlation exists between them. A scatter plot shows how the two variables vary together; on its own, it does not establish that one variable causes changes in the other.

Scatter plots are particularly useful for investigating the strength and direction of relationships between variables, detecting outliers, and identifying trends.
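The strength and direction of a linear relationship can also be summarized numerically. As a minimal sketch, assuming the built-in `mtcars` dataset used throughout this article, base R's `cor()` returns the Pearson correlation coefficient between two variables:

```r
# Pearson correlation between horsepower and fuel economy in mtcars.
# The sign gives the direction of the linear relationship,
# and the magnitude (between 0 and 1) gives its strength.
r <- cor(mtcars$hp, mtcars$mpg)
print(r)
```

A negative value here means that cars with more horsepower tend to have lower miles per gallon.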

## Creating a Simple Scatter Plot in R

The most basic way to create a scatter plot in R is by using the `plot()` function. Let’s use the built-in `mtcars` dataset in R, which contains various car attributes, to create a scatter plot showing the relationship between horsepower (`hp`) and miles per gallon (`mpg`).

```
# Load the data
data(mtcars)
# Create a scatter plot
plot(mtcars$hp, mtcars$mpg,
     main = "Horsepower vs. Miles per Gallon",
     xlab = "Horsepower",
     ylab = "Miles per Gallon")
```

Here, the first two arguments to `plot()` are the x and y variables. The `main`, `xlab`, and `ylab` parameters add a title to the plot and labels to the x and y axes, respectively.
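`plot()` also accepts a formula, which can read more naturally when both variables live in the same data frame. The sketch below should draw the same scatter plot as the call above:

```r
# Formula interface: y ~ x, with both columns looked up in `data`
plot(mpg ~ hp, data = mtcars,
     main = "Horsepower vs. Miles per Gallon",
     xlab = "Horsepower",
     ylab = "Miles per Gallon")
```

Note the order: in a formula the y variable comes first, whereas the two-argument form takes x first.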

## Changing the Color and Shape of Points

You can customize your scatter plot in various ways. For instance, you can change the color and shape of points in your scatter plot. The `col` parameter allows you to change the color of the points, while the `pch` parameter can be used to change the shape.

```
plot(mtcars$hp, mtcars$mpg,
     main = "Horsepower vs. Miles per Gallon",
     xlab = "Horsepower",
     ylab = "Miles per Gallon",
     col = "blue",
     pch = 19)
```

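Integer values of `pch` from 0 to 25 each select a different built-in symbol. Symbols 21 to 25 also have a fill color, set with `bg`. `pch` also accepts a single character, such as `"+"`, to use as the symbol.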
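As a sketch of how these options combine, the example below uses symbol 21 (a circle with a separate border and fill) and colors the fill by number of cylinders. The color choices and the names `cyl_groups`, `palette_by_cyl`, and `fill_colors` are illustrative assumptions, not part of the original example:

```r
# Group cars by number of cylinders (4, 6, or 8 in mtcars)
cyl_groups <- factor(mtcars$cyl)

# One fill color per group; indexing by a factor uses its integer codes
palette_by_cyl <- c("steelblue", "darkorange", "firebrick")
fill_colors <- palette_by_cyl[cyl_groups]

plot(mtcars$hp, mtcars$mpg,
     main = "Horsepower vs. Miles per Gallon",
     xlab = "Horsepower",
     ylab = "Miles per Gallon",
     pch = 21,          # circle with separate border and fill
     col = "black",     # border color
     bg = fill_colors)  # fill color, used by symbols 21 to 25

# Legend mapping each fill color to its cylinder count
legend("topright",
       legend = levels(cyl_groups),
       pch = 21,
       pt.bg = palette_by_cyl,
       title = "Cylinders")
```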

## Adding a Regression Line

To visualize the relationship between the two variables more clearly, you can add a regression line (or a line of best fit) to your scatter plot. You can do this using the `abline()` function in conjunction with the `lm()` function, which fits a linear model to the data.

```
# Create a scatter plot
plot(mtcars$hp, mtcars$mpg,
     main = "Horsepower vs. Miles per Gallon",
     xlab = "Horsepower",
     ylab = "Miles per Gallon",
     col = "blue",
     pch = 19)
# Add a regression line
abline(lm(mpg ~ hp, data = mtcars), col = "red")
```

The `lm()` function here fits a linear model to the data, and the resulting model is passed to the `abline()` function, which draws a straight line on the plot using the model's intercept and slope.
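The fitted model can also be inspected directly. A minimal sketch, where `fit` is just an illustrative variable name:

```r
# Fit mpg as a linear function of hp, looking both columns up in mtcars
fit <- lm(mpg ~ hp, data = mtcars)

# Intercept and slope of the line that abline() draws
print(coef(fit))

# Proportion of the variance in mpg explained by hp
print(summary(fit)$r.squared)
```

Checking the slope and R-squared alongside the plot helps confirm that the drawn line matches what you expect from the data.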

## Creating Scatter Plots with ggplot2

First, you need to install and load the ggplot2 package:

```
# Install ggplot2
install.packages("ggplot2")
# Load ggplot2
library(ggplot2)
```

Once you have ggplot2 installed and loaded, you can create a scatter plot using the `geom_point()` function.

```
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(title = "Horsepower vs. Miles per Gallon",
       x = "Horsepower",
       y = "Miles per Gallon")
```

In the `ggplot()` function, the first argument specifies the dataset, and the `aes()` function maps variables in that dataset to aesthetics of the plot, such as the x and y positions. The `geom_point()` function adds the layer of points that makes up the scatter plot.
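Because ggplot2 builds plots by adding layers, a plot can be stored in a variable and extended later. The following sketch assumes ggplot2 is installed; `p` is an illustrative name:

```r
library(ggplot2)

# Start with the data and aesthetic mappings only; no points are drawn yet
p <- ggplot(mtcars, aes(x = hp, y = mpg))

# Add the point layer and labels, then print to render the plot
p <- p +
  geom_point() +
  labs(title = "Horsepower vs. Miles per Gallon",
       x = "Horsepower",
       y = "Miles per Gallon")
print(p)
```

At the interactive console, typing `p` on its own line renders the plot as well; inside scripts and functions, calling `print(p)` explicitly is the safer habit.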

## Customizing Scatter Plots in ggplot2

You can also customize the color, shape, and size of points in ggplot2:

```
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(color = "blue",
             shape = 19,
             size = 3) +
  labs(title = "Horsepower vs. Miles per Gallon",
       x = "Horsepower",
       y = "Miles per Gallon")
```
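The arguments above set one fixed value for every point. To vary an aesthetic by a variable instead, map it inside `aes()`. As a sketch, assuming you want to color points by number of cylinders (`cyl`), the column is converted to a factor so that ggplot2 treats it as discrete; `p_by_cyl` is an illustrative name:

```r
library(ggplot2)

# Mapping color inside aes() colors each point by its group
# and adds a legend automatically
p_by_cyl <- ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl))) +
  geom_point(size = 3) +
  labs(title = "Horsepower vs. Miles per Gallon",
       x = "Horsepower",
       y = "Miles per Gallon",
       color = "Cylinders")
print(p_by_cyl)
```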

## Adding a Regression Line in ggplot2

Just like in base R, you can add a regression line to your scatter plot in ggplot2. You can do this using the `geom_smooth()` function, which adds a smoothed conditional mean:

```
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(color = "blue",
             shape = 19,
             size = 3) +
  geom_smooth(method = lm,
              se = FALSE,
              color = "red") +
  labs(title = "Horsepower vs. Miles per Gallon",
       x = "Horsepower",
       y = "Miles per Gallon")
```

Here, the `method = lm` argument specifies that a linear model should be fitted to the data, and `se = FALSE` means that the confidence band around the line, which `geom_smooth()` draws by default, should not be plotted.
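For comparison, here is a sketch that keeps the shaded band around the fitted line. It assumes ggplot2 is installed; `p_band` is an illustrative name. The `level` argument sets the confidence level and is written out at its default of 0.95, and `formula = y ~ x` is stated explicitly so that ggplot2 does not print a message about the formula it chose:

```r
library(ggplot2)

p_band <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm",
              formula = y ~ x,  # simple linear fit of y on x
              se = TRUE,        # draw the confidence band
              level = 0.95,     # confidence level for the band
              color = "red")
print(p_band)
```

A wider band indicates more uncertainty about the fitted line in that range of horsepower.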

## Conclusion

Scatter plots are essential tools in statistical and data analysis, as they can illustrate the relationship between two variables. This article has shown you how to create scatter plots in R using both base R and ggplot2. With these tools in your data analysis toolkit, you can start exploring the relationships within your own datasets.