# How to Create a Stacked Bar Plot in R

A stacked bar plot provides a way to compare different categories while also showing the composition of each category. It is often used in exploratory data analysis.

This article will guide you through the process of creating a stacked bar plot in R. We’ll first discuss the basics of a stacked bar plot and the data required, then move on to creating the plot in R. Finally, we’ll cover some advanced customization options to make your plot stand out. Let’s get started.

## I. Introduction to Stacked Bar Plots

Stacked bar plots are a type of bar plot where the segments of each bar are stacked on top of each other. Each bar represents a category, and the segments represent sub-groups within that category. The length (or height, depending on orientation) of each segment corresponds to the value of that sub-group for the given category.

Stacked bar plots are especially useful when you want to show the total size of groups, and how groups are divided into sub-groups. They provide a visual representation of data that can be much more intuitive than looking at raw numbers.

## II. Understanding the Data for Stacked Bar Plots

To create a stacked bar plot, you need a categorical variable that defines the bars and a numeric value for each segment within a bar. ggplot2 works most naturally when this is stored in long format: one categorical variable for the bars, a second categorical variable for the segments, and a single numeric variable that sets the size of each segment.

For instance, consider a company’s sales data with ‘Region’ as the categorical variable for the bars (e.g., North, South, East, West), ‘Product’ as the categorical variable for the segments (e.g., Electronics, Apparel, Home Goods), and ‘Revenue’ as the numeric variable holding the sales revenue for each combination of region and product. Here, a stacked bar plot can provide a visual summary of sales by region and product category.
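
If the sales figures arrive in wide format instead, with one column per product, they can be reshaped before plotting. The following is a minimal base R sketch, not the only way to do it; the `wide_sales` and `long_sales` names are illustrative, and the figures are the same ones used in the example data later in this article.

```r
# Wide format: one row per region, one column per product
wide_sales <- data.frame(
  Region = c("North", "South", "East", "West"),
  Electronics = c(10000, 7000, 6000, 5000),
  Apparel = c(5000, 4000, 8000, 3000),
  `Home Goods` = c(8000, 9000, 7000, 9000),
  check.names = FALSE  # keep the space in "Home Goods"
)

# Every column except Region holds revenue for one product
product_cols <- setdiff(names(wide_sales), "Region")

# stack() puts the product columns end to end:
# 'values' holds the numbers, 'ind' holds the source column name
stacked <- stack(wide_sales[product_cols])

# Long format: one row per region and product combination
long_sales <- data.frame(
  Region = rep(wide_sales$Region, times = length(product_cols)),
  Product = as.character(stacked$ind),
  Revenue = stacked$values
)

print(long_sales)
```

The resulting data frame has one row for each of the 12 region and product combinations, which is the shape the plotting code expects.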

## III. Creating a Basic Stacked Bar Plot in R

First, let’s load the necessary packages and set up some data. We will use the ggplot2 package, which provides a powerful system for creating graphics in R. If you haven’t already installed it, you can do so with install.packages("ggplot2").

Here’s a simple example:

    # Load necessary package
    library(ggplot2)

    # Set up data
    sales_data <- data.frame(
      Region = rep(c("North", "South", "East", "West"), each = 3),
      Product = rep(c("Electronics", "Apparel", "Home Goods"), times = 4),
      Revenue = c(10000, 5000, 8000, 7000, 4000, 9000, 6000, 8000, 7000, 5000, 3000, 9000)
    )

    # View the data
    print(sales_data)
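
Before plotting, it can help to check the numbers the chart will show. The sketch below uses base R to compute the total revenue per region (the full height of each bar) and each product's share of its region (the relative size of each segment). The data frame is repeated so the snippet runs on its own; `region_totals` and the `Share` column are names chosen here for illustration.

```r
# Same example data as above
sales_data <- data.frame(
  Region = rep(c("North", "South", "East", "West"), each = 3),
  Product = rep(c("Electronics", "Apparel", "Home Goods"), times = 4),
  Revenue = c(10000, 5000, 8000, 7000, 4000, 9000, 6000, 8000, 7000, 5000, 3000, 9000)
)

# Total revenue per region: the full height of each stacked bar
region_totals <- aggregate(Revenue ~ Region, data = sales_data, FUN = sum)
print(region_totals)
#   Region Revenue
# 1   East   21000
# 2  North   23000
# 3  South   20000
# 4   West   17000

# Share of each product within its region: the relative segment size
sales_data$Share <- sales_data$Revenue /
  ave(sales_data$Revenue, sales_data$Region, FUN = sum)
print(sales_data)
```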

To create a stacked bar plot of this data, we can use the ggplot() function along with geom_bar(). The aes() function is used to specify the aesthetics of the plot – here, x = Region, y = Revenue, and fill = Product.

    # Create stacked bar plot
    ggplot(sales_data, aes(x = Region, y = Revenue, fill = Product)) +
      geom_bar(stat = "identity")

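In the geom_bar() function, the stat argument is set to "identity" so that the bar heights are taken directly from the Revenue column. By default, geom_bar() uses stat = "count", which counts the rows in each group instead of using precomputed y-values.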
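
ggplot2 also provides geom_col(), which uses the y-values as they are and so needs no stat argument. The sketch below should produce the same stacked plot as the geom_bar(stat = "identity") call; the data frame is repeated so the snippet is self-contained, and `p_col` is just an illustrative name.

```r
library(ggplot2)

# Same example data as above
sales_data <- data.frame(
  Region = rep(c("North", "South", "East", "West"), each = 3),
  Product = rep(c("Electronics", "Apparel", "Home Goods"), times = 4),
  Revenue = c(10000, 5000, 8000, 7000, 4000, 9000, 6000, 8000, 7000, 5000, 3000, 9000)
)

# geom_col() plots the Revenue values directly; segments are stacked by default
p_col <- ggplot(sales_data, aes(x = Region, y = Revenue, fill = Product)) +
  geom_col()

print(p_col)
```

Which form to use is mostly a matter of taste; geom_col() is shorter, while geom_bar(stat = "identity") makes the choice of statistic explicit.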

## IV. Customizing the Stacked Bar Plot

R and ggplot2 offer numerous ways to customize your plot, from changing the colors to adding labels or adjusting the legend.

### 1. Changing Colors

By default, ggplot2 picks the fill colors for you. You can set the colors of the segments yourself using the scale_fill_manual() function:

    # Create stacked bar plot with custom colors
    ggplot(sales_data, aes(x = Region, y = Revenue, fill = Product)) +
      geom_bar(stat = "identity") +
      scale_fill_manual(values = c("Electronics" = "blue", "Apparel" = "red", "Home Goods" = "green"))

### 2. Adding Labels

You can add a title, axis labels, and a legend title to your plot using the labs() function:

    # Create stacked bar plot with labels
    ggplot(sales_data, aes(x = Region, y = Revenue, fill = Product)) +
      geom_bar(stat = "identity") +
      labs(title = "Sales by Region and Product Category", x = "Region", y = "Revenue", fill = "Product")

### 3. Adjusting the Legend

The theme() function and its arguments can be used to adjust the legend. Here, the legend is moved below the plot and its title is removed:

    # Create stacked bar plot with adjusted legend
    ggplot(sales_data, aes(x = Region, y = Revenue, fill = Product)) +
      geom_bar(stat = "identity") +
      theme(legend.position = "bottom", legend.title = element_blank())

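### 4. Adding Data Labels

To add a data label in the middle of each segment, add geom_text() to the plot, map the label aesthetic to Revenue, and use position_stack(vjust = 0.5) so the labels follow the stacked segments:

    # Create stacked bar plot with data labels
    ggplot(sales_data, aes(x = Region, y = Revenue, fill = Product)) +
      geom_bar(stat = "identity") +
      geom_text(aes(label = Revenue), size = 3, position = position_stack(vjust = 0.5))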
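
The customizations in this section can also be combined in a single call. The following is one possible sketch rather than a definitive recipe. It repeats the example data so it runs on its own, and it adds one step that is an assumption beyond the examples above: converting Region to a factor, because ggplot2 otherwise sorts character values alphabetically on the x-axis. The names `product_colors` and `final_plot` are illustrative.

```r
library(ggplot2)

# Same example data as above
sales_data <- data.frame(
  Region = rep(c("North", "South", "East", "West"), each = 3),
  Product = rep(c("Electronics", "Apparel", "Home Goods"), times = 4),
  Revenue = c(10000, 5000, 8000, 7000, 4000, 9000, 6000, 8000, 7000, 5000, 3000, 9000)
)

# Optional (assumption): fix the bar order instead of the alphabetical default
sales_data$Region <- factor(
  sales_data$Region,
  levels = c("North", "South", "East", "West")
)

# Named vector so each product always gets the same color
product_colors <- c("Electronics" = "blue", "Apparel" = "red", "Home Goods" = "green")

final_plot <- ggplot(sales_data, aes(x = Region, y = Revenue, fill = Product)) +
  geom_bar(stat = "identity") +
  # Labels centered inside each stacked segment
  geom_text(aes(label = Revenue), size = 3, position = position_stack(vjust = 0.5)) +
  scale_fill_manual(values = product_colors) +
  labs(title = "Sales by Region and Product Category", x = "Region", y = "Revenue") +
  # Legend below the plot, without a title
  theme(legend.position = "bottom", legend.title = element_blank())

print(final_plot)
```

Because legend.title is set to element_blank(), a fill label in labs() would have no visible effect, so it is left out here.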