A population pyramid, also known as an age-sex pyramid, is a graphical representation that illustrates the distribution of age groups in a population. It typically takes the form of a back-to-back histogram and provides a clear view of the population’s age and sex composition. In this guide, we’ll walk through creating a population pyramid in R, first with base R and then with the ggplot2 package.
1. Understanding Population Pyramids
A population pyramid provides a snapshot of the age and sex structure of a population. It’s divided into two back-to-back histograms, which represent the male and female populations. Each histogram is further divided into age groups, with the youngest age groups at the bottom and the oldest at the top.
2. Preliminaries
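Before creating the population pyramid, we need to install and load the necessary libraries and prepare the dataset. Only ggplot2 is loaded here; the tidyr package is called later to reshape the data, so it must be installed as well.
# Install the packages once if they are missing
# install.packages(c("ggplot2", "tidyr"))
# Load necessary libraries
library(ggplot2)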
# Sample data
population_data <- data.frame(
  AgeGroup = c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34",
               "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69",
               "70-74", "75-79", "80-84", "85+"),
  Males = c(1200, 1400, 1600, 1700, 1800, 1900, 2000, 2100, 2000, 1800,
            1700, 1600, 1500, 1400, 1200, 1100, 900, 700),
  Females = c(1100, 1300, 1500, 1550, 1600, 1700, 1800, 1900, 1850, 1750,
              1650, 1550, 1450, 1350, 1250, 1150, 1000, 850)
)
In this example, we are using a made-up dataset that contains the number of males and females in different age groups.
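Counts like these usually come from individual-level records that have been grouped into age bands. The following is a minimal sketch of that step using base R's cut() and table(); the people data frame and its ten records are hypothetical and are not part of the dataset above.

```r
# Hypothetical individual-level records (illustrative only)
people <- data.frame(
  age = c(3, 7, 12, 12, 34, 36, 61, 88, 90, 4),
  sex = c("Male", "Female", "Female", "Male", "Male",
          "Female", "Female", "Female", "Male", "Female")
)

# Five-year bins [0,5), [5,10), ..., [80,85), plus an open-ended 85+ group
breaks <- c(seq(0, 85, by = 5), Inf)
labels <- c(paste(seq(0, 80, by = 5), seq(4, 84, by = 5), sep = "-"), "85+")

# right = FALSE makes each bin include its lower bound and exclude its upper bound
people$AgeGroup <- cut(people$age, breaks = breaks, labels = labels,
                       right = FALSE)

# Cross-tabulate age group by sex to get one count per bar of the pyramid
counts <- table(people$AgeGroup, people$sex)
print(counts)
```

Because cut() returns a factor whose levels follow the order of the breaks, the age groups stay in youngest-to-oldest order even when some groups are empty.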
3. Creating a Basic Population Pyramid with Base R
The simplest way to create a population pyramid with base R is by using the barplot() function. With horiz = TRUE, barplot() draws negative values to the left of zero and positive values to the right, so we multiply the male population by -1 to display it on the left side of the pyramid.
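# Multiply male population by -1 so the male bars extend to the left of zero
population_data$Males <- -population_data$Males
# Draw the male bars first, one horizontal bar per age group
barplot(height = population_data$Males,
        names.arg = population_data$AgeGroup,
        horiz = TRUE,
        col = "blue",
        main = "Population Pyramid with Base R",
        xlab = "Population",
        ylab = "Age Group",
        xlim = range(population_data$Males, population_data$Females))
# Overlay the female bars on the same plot
barplot(height = population_data$Females,
        horiz = TRUE, col = "pink",
        add = TRUE, axes = FALSE)
# Add a legend
legend("topright", legend = c("Males", "Females"), fill = c("blue", "pink"))

In this example, height specifies the bar lengths, names.arg specifies the label for each age group, horiz = TRUE creates horizontal bars, col specifies the color, and xlim makes room for both sides of the pyramid. The second barplot() call uses add = TRUE to draw the female bars on the existing plot, so each age group gets one male bar and one female bar on the same row. Passing both sexes as a single vector would instead stack 36 separate bars on top of each other, and beside = TRUE has no effect on a vector.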
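Because the male counts are stored as negative numbers, the default x-axis labels the left half of the plot with negative values. One possible way to show magnitudes instead is to suppress the default axis and draw a custom one. The sketch below uses a hypothetical three-group dataset (age_groups, males, females) so that it runs on its own; the same idea should carry over to the full data.

```r
# Hypothetical three-group example; the counts are illustrative only
age_groups <- c("0-4", "5-9", "10-14")
males      <- c(1200, 1400, 1600)
females    <- c(1100, 1300, 1500)

# Symmetric limits keep zero in the centre of the plot
x_max <- max(males, females)
ticks <- pretty(c(-x_max, x_max))

# Male bars to the left, with the default x-axis suppressed
mids <- barplot(-males, names.arg = age_groups, horiz = TRUE,
                col = "blue", xlim = range(ticks), axes = FALSE,
                xlab = "Population")

# Female bars to the right, drawn on the same plot
barplot(females, horiz = TRUE, col = "pink", add = TRUE, axes = FALSE)

# Custom axis: tick positions keep their sign, labels show magnitudes
axis(side = 1, at = ticks, labels = abs(ticks))
```

barplot() returns the bar midpoints (stored in mids here), which can be useful for placing extra annotations next to each age group.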
4. Creating a Population Pyramid using ggplot2
The ggplot2 package provides more control over the aesthetics of the pyramid. The geom_bar() function is used to create the bars, and coord_flip() is used to make the bars horizontal.
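# Create a population pyramid with ggplot2
# Keep the male counts negative (no change if Section 3 has already been run)
population_data$Males <- -abs(population_data$Males)
# Fix the age-group order; character labels would be sorted alphabetically
population_data$AgeGroup <- factor(population_data$AgeGroup,
                                   levels = unique(population_data$AgeGroup))
# Reshape from wide to long format: one row per age group and sex
population_data_long <- tidyr::pivot_longer(population_data, c("Males", "Females"),
                                            names_to = "Sex", values_to = "Population")
ggplot(population_data_long, aes(x = AgeGroup, y = Population, fill = Sex)) +
  geom_bar(stat = "identity", position = "identity") +
  coord_flip() +
  scale_y_continuous(labels = abs) +
  labs(title = "Population Pyramid with ggplot2",
       x = "Age Group",
       y = "Population") +
  theme_minimal() +
  # Name the colors so they do not depend on the order of the Sex values
  scale_fill_manual(values = c(Males = "blue", Females = "pink"))

In this example, we first make sure the male counts are negative and convert AgeGroup to a factor so that the age groups keep their youngest-to-oldest order. We then reshape the data from wide to long format using the pivot_longer() function from the tidyr package. After that, aes() maps the aesthetics (x, y, and fill), geom_bar(stat = "identity") creates the bars, coord_flip() makes the bars horizontal, scale_y_continuous(labels = abs) makes the y-axis labels positive, and scale_fill_manual() specifies the colors. The colors are given as a named vector because unnamed values are assigned in alphabetical order of Sex, which would color the female bars blue and the male bars pink.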
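One detail worth checking in any ggplot2 pyramid is the order of the age groups along the axis. ggplot2 orders a character column alphabetically, which misplaces labels such as "5-9". The short base R sketch below illustrates the difference between default and explicit factor levels; the four-element age_groups vector is a hypothetical subset chosen for brevity.

```r
# Age-group labels in their intended youngest-to-oldest order
age_groups <- c("0-4", "5-9", "10-14", "15-19")

# Default factor levels are sorted as text, so "10-14" lands before "5-9"
default_levels <- levels(factor(age_groups))

# Explicit levels preserve the order in which the labels were supplied
ordered_groups <- factor(age_groups, levels = unique(age_groups))
ordered_levels <- levels(ordered_groups)

print(default_levels)
print(ordered_levels)
```

Using unique() for the levels relies on the labels already appearing in the intended order in the data, which is the case for the sample dataset in this guide.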
5. Conclusion
Creating a population pyramid in R can be achieved in various ways depending on your specific needs and the complexity of your data. While base R offers a straightforward solution for creating a basic pyramid, the ggplot2 package provides finer control over the pyramid’s appearance.