Pairs plots, also known as scatterplot matrices, are incredibly useful tools for exploratory data analysis. They allow us to visualize pairwise relationships and distributions in a dataset, making it easier to spot trends, outliers, patterns, and correlations.
In this guide, we walk through creating pairs plots in R using two methods: base R’s pairs() function and the ggpairs() function from GGally, an extension of the popular ggplot2 package.
1. Pairs Plots with base R
Base R comes with a simple pairs() function that creates a matrix of scatter plots. Let’s use the built-in mtcars dataset to demonstrate:
# Create a pairs plot
pairs(mtcars)
Running this code produces a scatterplot matrix of every variable in the mtcars dataset against every other variable.
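Before trimming the matrix, it can help to check what the data frame actually contains. The following is a minimal sketch using only base R functions on the built-in mtcars data:

```r
# Inspect the data frame that pairs() will plot
str(mtcars)                                     # 32 observations of 11 variables

# Check whether every column is numeric
all_numeric <- all(sapply(mtcars, is.numeric))
all_numeric

# The car names live in the row names, not in a column
head(rownames(mtcars), 3)
```

Because the car names are row names rather than a column, pairs() does not draw panels for them.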
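While this plot is informative, mtcars has 11 numeric variables, so the matrix contains 11 × 11 panels and quickly becomes hard to read. Several of those variables, such as vs, am, gear, and carb, take only a few distinct values, so their scatter plots are rarely meaningful. (The car names are stored as row names rather than as a column, so they are not plotted.) To focus on a subset of variables, you can select the columns of interest: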
# Create a pairs plot with selected variables
pairs(mtcars[, c("mpg", "disp", "hp", "wt")])
Here, mtcars[, c("mpg", "disp", "hp", "wt")] selects only the miles per gallon (mpg), displacement (disp), horsepower (hp), and weight (wt) columns from the mtcars dataset.
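The base pairs() function also accepts custom panel functions. As a rough sketch, the example below draws a smoothed trend line in the lower triangle with the built-in panel.smooth and prints the correlation in the upper triangle. The helper panel_cor is a hypothetical name introduced here for illustration; it is not part of base R.

```r
# Hypothetical helper: print the Pearson correlation in the middle of a panel
panel_cor <- function(x, y, ...) {
  r <- cor(x, y)
  # Temporarily switch the panel to unit coordinates, restoring them on exit
  usr <- par("usr")
  on.exit(par(usr = usr))
  par(usr = c(0, 1, 0, 1))
  text(0.5, 0.5, sprintf("%.2f", r), cex = 1.5)
}

vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Lower triangle: points plus a lowess curve; upper triangle: correlations
pairs(vars, lower.panel = panel.smooth, upper.panel = panel_cor)
```

This keeps everything in base graphics, which may be convenient when installing extra packages is not an option.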
2. Enhancing Pairs Plots with GGally
While the base R pairs() function is straightforward, it lacks the flexibility and aesthetic appeal of the ggplot2 package. GGally is an extension of ggplot2 that includes the ggpairs() function for creating enhanced pairs plots.
First, install and load GGally:
# Install GGally (only needed once), then load it
install.packages("GGally")
library(GGally)
library(ggplot2)  # provides aes() and theme_bw(), used in later examples
Creating a pairs plot with GGally is as simple as calling the ggpairs() function:
# Create a pairs plot with GGally
ggpairs(mtcars[, c("mpg", "disp", "hp", "wt")])
In addition to scatter plots in the lower triangle, ggpairs() by default draws density plots along the diagonal to show the distribution of each variable, and prints correlation coefficients in the upper triangle to quantify the relationships.
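The type of plot on the diagonal can also be set explicitly. The following is a minimal sketch, assuming GGally is installed; the bin count of 10 is an arbitrary choice for this small dataset, and p_hist is just an illustrative variable name.

```r
library(GGally)  # assumes GGally and ggplot2 are installed

vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# "barDiag" requests histograms on the diagonal ("densityDiag" gives density curves);
# wrap() passes extra arguments, here the number of bins, to the panel function
p_hist <- ggpairs(vars, diag = list(continuous = wrap("barDiag", bins = 10)))
p_hist
```

Histograms can be easier to read than density curves when a variable has only a handful of distinct values.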
3. Customizing Pairs Plots with GGally
The ggpairs() function offers a range of customization options to enhance the visualization and make it easier to interpret.
For example, you can change the overall look of the plot by adding a ggplot2 theme:
# Create a pairs plot with a custom theme
ggpairs(mtcars[, c("mpg", "disp", "hp", "wt")]) + theme_bw()
theme_bw() applies a theme with a white background, light grey grid lines, and a dark border around each panel.
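Beyond themes, the upper and lower arguments of ggpairs() control what each triangle of the matrix shows. The sketch below is one possible configuration, not a recommended default: it swaps in Spearman correlations and adds a fitted line to the scatter plots. The title text and the name p_custom are illustrative choices.

```r
library(GGally)   # assumes GGally is installed
library(ggplot2)  # for theme_bw()

vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

p_custom <- ggpairs(
  vars,
  title = "Selected mtcars variables",
  # Upper triangle: correlation text, computed with Spearman's method
  upper = list(continuous = wrap("cor", method = "spearman")),
  # Lower triangle: scatter plots with a fitted trend line
  lower = list(continuous = "smooth")
)

p_custom + theme_bw()
```

Spearman's rank correlation may be a reasonable choice here because several of these relationships (for example mpg against hp) look curved rather than strictly linear.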
You can also map a categorical variable to color to distinguish different groups in the scatter plots. Let’s add the cyl variable, which represents the number of cylinders, as a grouping variable:
# Create a pairs plot with color mapping
ggpairs(mtcars, columns = c("mpg", "disp", "hp", "wt"), mapping = aes(color = as.factor(cyl)))
Here, columns = c("mpg", "disp", "hp", "wt") specifies the variables to include in the pairs plot, and mapping = aes(color = as.factor(cyl)) maps the cyl variable to color.
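When groups overlap, it can help to convert the grouping variable to a factor up front and make the points and density curves semi-transparent. This is a sketch under the assumption that GGally is installed; mtcars_f and p_grouped are illustrative names, and the alpha values are arbitrary.

```r
library(GGally)   # assumes GGally is installed
library(ggplot2)  # for aes()

# Work on a copy so the built-in dataset is left untouched
mtcars_f <- mtcars
mtcars_f$cyl <- factor(mtcars_f$cyl)  # levels "4", "6", "8"

p_grouped <- ggpairs(
  mtcars_f,
  columns = c("mpg", "disp", "hp", "wt"),
  mapping = aes(color = cyl),
  # Semi-transparent points and densities keep overlapping groups visible
  lower = list(continuous = wrap("points", alpha = 0.6)),
  diag  = list(continuous = wrap("densityDiag", alpha = 0.5))
)
p_grouped
```

With a color mapping in place, the correlation panels report a coefficient per group alongside the overall value.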
4. Interpreting Pairs Plots
Interpreting a pairs plot involves examining the scatter plots, the distribution plots on the diagonal, and the correlation coefficients to understand the pairwise relationships and distributions in your data.
- Scatter Plots: Each scatter plot represents the relationship between two variables. You can look for trends (e.g., positive or negative relationships), patterns (e.g., linear or non-linear relationships), and outliers.
- Diagonal Plots: Each diagonal panel in a ggpairs() plot shows the distribution of a single variable, as a density curve by default or as a histogram if requested (base pairs() shows only the variable name there). You can assess the shape (e.g., symmetric or skewed), center, and spread of the distribution.
- Correlation Coefficients: Each correlation coefficient quantifies the strength and direction of a linear relationship between two variables. The coefficient ranges from -1 to 1, with -1 indicating a perfect negative relationship, 1 indicating a perfect positive relationship, and 0 indicating no linear relationship. Because it measures only linear association, a value near 0 does not rule out a non-linear relationship, so check the corresponding scatter plot.
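The coefficients shown in a pairs plot can be cross-checked numerically with base R's cor() function. The following is a small sketch on the same four mtcars columns; comparing Pearson and Spearman values is one rough way to spot relationships that are monotonic but not linear.

```r
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Pearson correlation matrix (linear association)
cor_pearson <- cor(vars)

# Spearman rank correlation matrix (monotonic association)
cor_spearman <- cor(vars, method = "spearman")

round(cor_pearson, 2)
round(cor_spearman, 2)
```

For example, the Pearson correlation between mpg and wt is about -0.87, a strong negative linear relationship: heavier cars tend to have lower fuel efficiency.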
Pairs plots are valuable tools for exploratory data analysis in R, offering a quick overview of the relationships and distributions in a dataset. Whether you use the base R pairs() function for simplicity or the ggpairs() function from GGally for more customization and polish, knowing how to create and read pairs plots will help you explore a new dataset before modeling it.