Pivot tables, a feature well-known to many Excel users, are a powerful tool for summarizing, analyzing, exploring, and presenting data in a tabular format. In R, similar functionality can be achieved using various packages and functions. This article provides an in-depth look into creating pivot tables in R.
Understanding Pivot Tables
A pivot table takes simple column-wise data as input, and groups the entries into a two-dimensional table that provides multi-dimensional analysis. A typical pivot table involves one or more categorical columns for grouping data and a numerical column for applying aggregating functions such as sum, count, or average.
Required Packages in R for Pivot Tables
Several packages in R allow us to create pivot tables. Some of the most commonly used ones include dplyr
, tidyverse
, reshape2
, and rpivotTable
.
Before we can use them, we have to install them using the install.packages()
function:
install.packages("dplyr")
install.packages("tidyverse")
install.packages("reshape2")
install.packages("rpivotTable")
After successful installation, we can load the packages:
library(dplyr)
library(tidyverse)
library(reshape2)
library(rpivotTable)
Preparing Data for Pivot Tables
For this article, we will use the built-in dataset mtcars
. This dataset was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles.
You can view the data as follows:
head(mtcars)
Pivot Tables with dplyr and tidyr
The dplyr
package offers a set of tools for efficiently manipulating datasets in R. tidyr
provides a set of functions that help in tidying data. Let’s create a pivot table showing the average mpg
(Miles Per Gallon) for cars, grouped by the number of cylinders (cyl
) and gears (gear
).
mtcars %>%
group_by(cyl, gear) %>%
summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
spread(key = gear, value = mean_mpg)
Here, group_by(cyl, gear)
groups the data by the ‘cyl’ and ‘gear’ columns. summarise(mean_mpg = mean(mpg, na.rm = TRUE))
calculates the average mpg for each group, ignoring any NA values. spread(key = gear, value = mean_mpg)
reshapes the data so that the unique values in the ‘gear’ column become separate columns, and the values in these columns are the mean mpg values.
Pivot Tables with reshape2
The reshape2
package is another way to create pivot tables in R. Let’s replicate the same pivot table using reshape2
.
mtcars %>%
mutate(cyl = as.factor(cyl), gear = as.factor(gear)) %>%
dcast(cyl ~ gear, fun.aggregate = mean, value.var = "mpg")
In this code, mutate(cyl = as.factor(cyl), gear = as.factor(gear))
converts ‘cyl’ and ‘gear’ into factor variables. dcast(cyl ~ gear, fun.aggregate = mean, value.var = "mpg")
reshapes the data, performing the mean aggregation on ‘mpg’ for each combination of ‘cyl’ and ‘gear’.
Interactive Pivot Tables with rpivotTable
The rpivotTable
package allows us to create interactive pivot tables, similar to those found in spreadsheet programs like Excel. Here’s how we can create an interactive pivot table:
rpivotTable(mtcars, rows = "cyl", cols = "gear", vals = "mpg", aggregatorName = "Average")
In this code, rows = "cyl"
and cols = "gear"
define the rows and columns of the pivot table, vals = "mpg"
defines the values to be aggregated, and aggregatorName = "Average"
defines the aggregation function to use.
The result is an interactive pivot table that you can manipulate directly from your R console or RStudio viewer.
Conclusion
Pivot tables are powerful data summarization tools that can turn lengthy datasets into a comprehensible table. In R, we can achieve the same functionality using packages like dplyr
, tidyverse
, reshape2
, and rpivotTable
. By following the steps and examples outlined in this article, you should be able to transform your own data into pivot tables, helping you to perform more efficient data analysis.