Scatterplots are among the most popular and useful ways to visualize the relationship between two numerical variables. However, points can sometimes be so tightly packed or overlapped that it is difficult to see how they are distributed. This is especially a problem with discrete data, where many data points can have exactly the same values.
To solve this problem, we can use a technique known as “jittering.” Jittering involves adding a small amount of random noise to the data to avoid overplotting. In R, we can use the jitter() function to achieve this.
In this article, we will learn how to create jittered scatterplots in R, first with the jitter() function in base R and then with geom_jitter() from the ggplot2 package.
The Jitter Function in R
The jitter() function in R adds a small amount of random variation to the data. The function has two main arguments:
x: A numeric vector that you want to jitter.
factor: The amount of jittering. The default value is 1. This factor is multiplied by the amount of jittering calculated by the function.
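To make the role of factor concrete, here is a minimal sketch on a made-up vector (x_demo is hypothetical and not part of any dataset used in this article). According to the documentation for jitter(), when no explicit amount is supplied, the noise is drawn uniformly from plus or minus factor / 5 * d, where d is roughly the smallest gap between distinct values of x. The example checks that bound rather than relying on any particular random draw.

```r
# Hypothetical discrete data with many repeated values
x_demo <- c(4, 6, 8, 4, 6, 8, 4, 4)

set.seed(1)                 # arbitrary seed so the noise is repeatable
x_jit <- jitter(x_demo)     # factor = 1 by default

# The smallest gap between distinct values is 2, so the documented
# default bound is factor / 5 * 2 = 0.4 in either direction.
max_shift <- max(abs(x_jit - x_demo))

print(round(x_jit, 2))      # jittered values, close to the originals
print(max_shift <= 0.4)     # checks the bound described above
```

Doubling factor would double that bound, which is why larger factors spread the points further apart.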
Let’s first create a scatterplot without jittering. We’ll use the mtcars dataset, which is included with R.
data(mtcars)
plot(mtcars$mpg, mtcars$cyl, main = "Scatterplot without Jittering", xlab = "Miles Per Gallon", ylab = "Number of Cylinders")
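
A quick numeric check can complement the plot. The sketch below counts the rows that share exactly the same (mpg, cyl) pair and shows how the cars are spread over the distinct cylinder values, which is why the points line up in a few horizontal rows. The variable names are illustrative.

```r
# Rows whose (mpg, cyl) pair already appeared earlier in the data;
# these are drawn exactly on top of another point.
n_exact <- sum(duplicated(mtcars[, c("mpg", "cyl")]))
print(n_exact)

# Number of cars at each distinct cylinder value
rows_per_cyl <- table(mtcars$cyl)
print(rows_per_cyl)
```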

Here, we can see that many of the points overlap, which makes it difficult to see the distribution of points.
Now let’s apply the jitter() function to the data:
plot(jitter(mtcars$mpg), jitter(mtcars$cyl), main = "Scatterplot with Jittering", xlab = "Miles Per Gallon", ylab = "Number of Cylinders")

After jittering, the points are spread out a bit, which makes it easier to see the distribution of points.
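Because jitter() draws random noise, the plot looks slightly different on every run. One way to make it repeatable, sketched below, is to call set.seed() before jittering. The seed value 42 is arbitrary, and the object names are illustrative.

```r
# Same seed before each call gives the same noise
set.seed(42)
cyl_a <- jitter(mtcars$cyl)

set.seed(42)
cyl_b <- jitter(mtcars$cyl)

# A further call without resetting the seed draws fresh noise
cyl_c <- jitter(mtcars$cyl)

print(identical(cyl_a, cyl_b))
print(identical(cyl_a, cyl_c))
```

Storing the jittered vector in a variable and passing it to plot() also keeps the positions fixed if the plot is redrawn.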
Customizing the Amount of Jittering
You can control the amount of jittering by setting the factor argument. A larger factor will result in more jittering, and a smaller factor will result in less jittering.
Here’s how to create a scatterplot with a large amount of jittering:
plot(jitter(mtcars$mpg, factor = 2), jitter(mtcars$cyl, factor = 2), main = "Scatterplot with Large Jittering", xlab = "Miles Per Gallon", ylab = "Number of Cylinders")

And here’s how to create a scatterplot with a small amount of jittering:
plot(jitter(mtcars$mpg, factor = 0.5), jitter(mtcars$cyl, factor = 0.5), main = "Scatterplot with Small Jittering", xlab = "Miles Per Gallon", ylab = "Number of Cylinders")
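
Besides factor, jitter() also accepts an amount argument (not used in the examples above) that sets the maximum shift directly in data units instead of scaling a data-derived default. The sketch below assumes, purely for illustration, that a shift of at most 0.25 cylinders is acceptable, and it jitters only cyl, since mpg is already close to continuous.

```r
set.seed(123)   # arbitrary seed for repeatability

# Shift each cyl value by at most 0.25 in either direction
cyl_jit <- jitter(mtcars$cyl, amount = 0.25)
shift <- cyl_jit - mtcars$cyl

print(range(shift))               # smallest and largest shift applied
print(all(abs(shift) <= 0.25))    # every shift stays within the bound
```

The resulting cyl_jit can be passed to plot() in place of jitter(mtcars$cyl, factor = ...), with mtcars$mpg left unjittered on the x-axis.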

Using the Jitter Function with ggplot2
The ggplot2 package provides a more flexible and powerful way to create scatterplots. You can add jittering to a scatterplot in ggplot2 by using the geom_jitter() function.
Here’s how to create a jittered scatterplot with ggplot2:
library(ggplot2)
ggplot(mtcars, aes(x = mpg, y = cyl)) +
  geom_jitter() +
  labs(title = "Scatterplot with Jittering", x = "Miles Per Gallon", y = "Number of Cylinders")

Customizing the Amount of Jittering in ggplot2
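You can control the amount of jittering in ggplot2 by setting the width and height arguments in the geom_jitter() function. The width argument controls the amount of horizontal jittering, and the height argument controls the amount of vertical jittering. Both are given in the units of the data and are applied in both the positive and negative directions, so the total spread is twice the value you specify. If you omit them, each defaults to 40% of the resolution of the data.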
Here’s how to create a scatterplot with a large amount of jittering:
ggplot(mtcars, aes(x = mpg, y = cyl)) +
  geom_jitter(width = 0.5, height = 0.5) +
  labs(title = "Scatterplot with Large Jittering", x = "Miles Per Gallon", y = "Number of Cylinders")

And here’s how to create a scatterplot with a small amount of jittering:
ggplot(mtcars, aes(x = mpg, y = cyl)) +
  geom_jitter(width = 0.1, height = 0.1) +
  labs(title = "Scatterplot with Small Jittering", x = "Miles Per Gallon", y = "Number of Cylinders")
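
Since only cyl is discrete here, it can be enough to jitter vertically and leave mpg untouched. The sketch below does this with position_jitter(), which also takes a seed argument so the plot is reproducible. It uses geom_point(position = ...), which the ggplot2 documentation describes as the long form of geom_jitter(). The values 0.2 and 1 are arbitrary choices for illustration.

```r
library(ggplot2)

# Jitter only along the y-axis; width = 0 leaves mpg unchanged.
# seed = 1 fixes the random noise so the plot is the same every time.
p <- ggplot(mtcars, aes(x = mpg, y = cyl)) +
  geom_point(position = position_jitter(width = 0, height = 0.2, seed = 1)) +
  labs(title = "Scatterplot with Vertical Jittering Only",
       x = "Miles Per Gallon", y = "Number of Cylinders")

p
```

Keeping the x positions exact means the miles-per-gallon values can still be read off the plot accurately.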

Combining Jittering with Other Geoms in ggplot2
One of the advantages of ggplot2 is that you can easily combine multiple types of geoms in the same plot. For example, you can combine jittering with a smooth line to show the trend in the data:
ggplot(mtcars, aes(x = mpg, y = cyl)) +
  geom_jitter() +
  geom_smooth(method = "lm", se = FALSE, col = "red") +
  labs(title = "Scatterplot with Jittering and Trend Line", x = "Miles Per Gallon", y = "Number of Cylinders")

In this case, geom_smooth(method = "lm", se = FALSE) adds a linear regression line (i.e., trend line) to the plot, and col = "red" sets the color of the line to red.
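It may help to note that the jitter applies only to the point layer. The geom_smooth() layer keeps its own default position, so the line is fitted to the original mpg and cyl values rather than the jittered ones. As a rough cross-check in base R, the same straight line can be obtained with lm(); the object names below are illustrative.

```r
# Fit the same model that geom_smooth(method = "lm") uses: y ~ x
fit <- lm(cyl ~ mpg, data = mtcars)
fit_coefs <- coef(fit)
print(fit_coefs)   # intercept and slope of the trend line

# Predicted number of cylinders at the lowest and highest mpg,
# i.e. the two ends of the red line
ends <- predict(fit, newdata = data.frame(mpg = range(mtcars$mpg)))
print(ends)
```

A negative slope here corresponds to the downward trend in the plot: cars with more cylinders tend to get fewer miles per gallon.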
Conclusion
Jittering is a useful technique for avoiding overplotting in scatterplots. The jitter() function in R provides an easy way to add jittering to a scatterplot. The ggplot2 package also provides the geom_jitter() function, which offers more flexibility and customization options.
Remember, though, that while jittering can help visualize the distribution of points, it also moves each point away from its true value, so the plot no longer shows the exact data. Therefore, you should always clearly indicate when you have jittered the data, and you should avoid jittering when the exact values are important.
Finally, it’s always a good idea to experiment with different amounts of jittering to see what works best for your data.