Scatter plots are one of the most effective ways to visualize the relationship between two numeric variables. They allow us to observe trends, patterns, and potential outliers. In many instances, we also have categorical variables that divide data into groups. Being able to create scatter plots by group adds an additional dimension to our analysis and can provide valuable insights.

This article walks through creating scatter plots by group in R, using both base R functions and the popular data visualization package `ggplot2`.

## Understanding Scatter Plots

A scatter plot is a diagram in which each observation in the data set is represented by a dot. The dot's position along the x-axis and y-axis gives that observation's values for the two variables. Scatter plots can reveal several things, including:

- How one variable changes as the other changes.
- The direction of the relationship between variables.
- The strength of the relationship between variables.
- Outlier points.
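To complement the visual impression, direction and strength can also be summarized numerically. The following is a minimal sketch using R's built-in `mtcars` data set and the Pearson correlation from `cor`; the variable names `r` and `direction` are illustrative choices, and a single coefficient only captures linear association.

```r
# Pearson correlation between horsepower and fuel economy in mtcars.
r <- cor(mtcars$hp, mtcars$mpg)

# The sign gives the direction; the absolute value gives the strength.
direction <- if (r < 0) "negative" else "positive"
cat("Correlation:", round(r, 2), "(", direction, ")\n")
```

A coefficient near -1 or 1 suggests a strong linear relationship, while a value near 0 suggests a weak one. Outliers can distort it, which is one reason to look at the plot as well.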

## Using Built-in Data in R

For simplicity, this guide uses the built-in `mtcars` data set in R. It contains fuel consumption data (`mpg`, miles per gallon) and ten aspects of automobile design and performance for 32 automobiles.

You can take a peek at the data using the `head` function:

`head(mtcars)`
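Beyond the first few rows, it can help to check the shape of the data and how many cars fall into each group before plotting. This is a small sketch using base R only; the names `dims` and `am_counts` are illustrative.

```r
# Dimensions: rows are cars, columns are variables.
dims <- dim(mtcars)
print(dims)

# Column types; every column in mtcars is stored as numeric.
str(mtcars)

# Count cars per transmission code (0 = automatic, 1 = manual).
am_counts <- table(mtcars$am)
print(am_counts)
```

Because grouping variables such as `am` are stored as numbers, they usually need to be converted to colors or factors before they can drive a grouped plot.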

## Creating Scatter Plots in Base R

Let’s start by making a simple scatter plot in base R, without grouping. Suppose we want to plot `mpg` (miles per gallon) against `hp` (horsepower). We would use the `plot` function as follows:

`plot(mtcars$mpg ~ mtcars$hp, xlab = "Horsepower", ylab = "Miles Per Gallon", main = "Scatterplot of MPG vs HP")`

Here, the formula `mtcars$mpg ~ mtcars$hp` places `mpg` on the y-axis and `hp` on the x-axis, while `xlab`, `ylab`, and `main` provide labels for the x-axis, y-axis, and the plot title, respectively.

## Grouping Scatter Plots in Base R

Now, let’s suppose we want to distinguish between cars with automatic and manual transmissions (represented by the `am` variable in the data, where 0 is automatic and 1 is manual). Here, we can use the `ifelse` function in R, which takes the following form: `ifelse(test, yes, no)`. It works element by element: wherever `test` is `TRUE`, the corresponding value of `yes` is returned; wherever `test` is `FALSE`, the corresponding value of `no` is returned.
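As a quick illustration of how `ifelse` behaves on a vector, here is a minimal sketch on a made-up input; `am_example` and `labels_example` are hypothetical names used only for this demonstration.

```r
# Hypothetical transmission codes: 0 = automatic, 1 = manual.
am_example <- c(0, 1, 1, 0)

# ifelse() tests each element and picks from `yes` or `no` accordingly.
labels_example <- ifelse(am_example == 0, "automatic", "manual")
print(labels_example)
```

The result has one entry per input element. Applied to `mtcars$am`, the same pattern yields one color per car: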

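```
# Map each car to a color: red for automatic (am == 0), blue for manual
colors <- ifelse(mtcars$am == 0, "red", "blue")
# pch = 19 draws solid circles so the colors are easy to see
plot(mtcars$mpg ~ mtcars$hp, col = colors, pch = 19, xlab = "Horsepower", ylab = "Miles Per Gallon", main = "Scatterplot of MPG vs HP by Transmission")
# Legend entries must be listed in the same order as their colors
legend("topright", legend = c("Automatic", "Manual"), col = c("red", "blue"), pch = 19)
```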

In this code, we’ve assigned a different color to each group, with “red” for automatic cars and “blue” for manual cars. The `legend` function is used to add a legend to the plot.
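The `ifelse` approach suits a variable with two groups. For a variable with more levels, one option is to convert it to a factor and use its integer codes to index a vector of colors. The sketch below groups by `cyl` (number of cylinders) instead of `am`; the palette and the names `cyl_groups`, `palette_cols`, and `point_colors` are arbitrary choices for illustration.

```r
# Treat the number of cylinders as a categorical grouping variable.
cyl_groups <- factor(mtcars$cyl)

# One color per factor level; this palette is an arbitrary choice.
palette_cols <- c("red", "darkgreen", "blue")

# as.integer() on a factor returns level positions (1, 2, 3),
# which are used here to index the palette.
point_colors <- palette_cols[as.integer(cyl_groups)]

plot(mtcars$hp, mtcars$mpg, col = point_colors, pch = 19,
     xlab = "Horsepower", ylab = "Miles Per Gallon",
     main = "Scatterplot of MPG vs HP by Cylinders")

# levels() and the palette share the same order, so the legend matches the points.
legend("topright", legend = levels(cyl_groups), col = palette_cols,
       pch = 19, title = "Cylinders")
```

Building the legend from `levels(cyl_groups)` rather than typing the labels by hand keeps the legend and the point colors in sync if the grouping variable changes.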

## Creating Scatter Plots with ggplot2

While base R offers a decent amount of plotting capabilities, `ggplot2` is a widely used package that provides advanced and aesthetically pleasing graphics.

First, ensure that the `ggplot2` package is installed and loaded into your workspace:

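```
# Install once; skip this line if ggplot2 is already installed
install.packages("ggplot2")
# Load the package in every new R session
library(ggplot2)
```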

The syntax for `ggplot2` can be somewhat complex, but it’s incredibly flexible once you get the hang of it. Here’s how you can create the same scatter plot as above, but with `ggplot2`:

```
ggplot(mtcars, aes(x = hp, y = mpg, color = factor(am))) +
  geom_point() +
  labs(x = "Horsepower", y = "Miles Per Gallon", color = "Transmission") +
  ggtitle("Scatterplot of MPG vs HP by Transmission") +
  scale_color_manual(values = c("red", "blue"), labels = c("Automatic", "Manual"))
```

The `aes` function is used to map variables to visual properties (aesthetics) of the graph. Here, we’ve mapped `hp` to the x-axis, `mpg` to the y-axis, and `am` to the color of the points. The `factor` function is used to treat `am` as a categorical variable.

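`geom_point` is the layer that actually creates the scatter plot. `labs` is used for labels, and `ggtitle` for the title. `scale_color_manual` is used to manually specify the colors and labels for the different groups. Its values and labels are matched to the factor levels in order, so `0` is drawn in red and labeled "Automatic", and `1` is drawn in blue and labeled "Manual".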
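Because the grouping is expressed as an aesthetic mapping, further layers can pick it up automatically. As one possible extension, the sketch below adds a per-group linear trend line with `geom_smooth`. It assumes `ggplot2` is installed; the copy `cars` and the column `transmission` are hypothetical names introduced here, and a straight-line fit is only an illustrative choice.

```r
library(ggplot2)

# Label the transmission codes up front so the legend needs no manual relabeling.
cars <- mtcars
cars$transmission <- factor(cars$am, levels = c(0, 1),
                            labels = c("Automatic", "Manual"))

p <- ggplot(cars, aes(x = hp, y = mpg, color = transmission)) +
  geom_point() +
  # color is mapped in ggplot(), so each group gets its own fitted line.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Horsepower", y = "Miles Per Gallon", color = "Transmission",
       title = "MPG vs HP by Transmission with Linear Fits")

print(p)
```

Storing the labels in the factor itself, rather than in `scale_color_manual`, means any other plot built from `cars` shows the same group names.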

## Conclusion

Scatter plots are powerful tools for visualizing the relationship between two numeric variables. Creating scatter plots by group in R, whether using base R or the `ggplot2` package, allows you to add another dimension to your plots, potentially revealing more complex patterns and insights in your data.