# How to Create a Population Pyramid in R

A population pyramid, also known as an age-sex pyramid, is a graphical representation that illustrates the distribution of age groups in a population by sex. It typically takes the form of a back-to-back histogram and provides a clear view of the population’s age and sex composition. In this guide, we’ll walk through creating a population pyramid in R using two approaches: base R graphics and the ggplot2 package.

## 1. Understanding Population Pyramids

A population pyramid provides a snapshot of the age and sex structure of a population. It’s divided into two back-to-back histograms, which represent the male and female populations. Each histogram is further divided into age groups, with the youngest age groups at the bottom and the oldest at the top.

## 2. Preliminaries

Before creating the population pyramid, we first need to install and load the necessary libraries, and also prepare the dataset.


    # Install the packages once if needed (tidyr is used for reshaping in section 4):
    # install.packages(c("ggplot2", "tidyr"))
    library(ggplot2)

    # Sample data
    population_data <- data.frame(
      AgeGroup = c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34",
                   "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69",
                   "70-74", "75-79", "80-84", "85+"),
      Males = c(1200, 1400, 1600, 1700, 1800, 1900, 2000, 2100, 2000, 1800,
                1700, 1600, 1500, 1400, 1200, 1100, 900, 700),
      Females = c(1100, 1300, 1500, 1550, 1600, 1700, 1800, 1900, 1850, 1750,
                  1650, 1550, 1450, 1350, 1250, 1150, 1000, 850)
    )

In this example, we are using a made-up dataset that contains the number of males and females in different age groups.
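One detail worth checking before plotting is the order of the age groups. They are stored as text, and text sorts alphabetically rather than by age. The following is a minimal sketch, using a shortened copy of the age labels (the names `age_groups`, `sorted_as_text`, and `age_factor` are illustrative), that shows the difference and one way to preserve the intended order with `factor()`.

```r
# Shortened copy of the age labels from the sample data
age_groups <- c("0-4", "5-9", "10-14", "15-19")

# Sorting as text places "10-14" and "15-19" before "5-9"
sorted_as_text <- sort(age_groups)
print(sorted_as_text)

# A factor with explicit levels keeps the youngest-to-oldest order
age_factor <- factor(age_groups, levels = age_groups)
print(levels(age_factor))
```

Plotting functions that treat text labels as categories may fall back to the alphabetical order, so storing the labels as a factor with explicit levels is a reasonable precaution.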

## 3. Creating a Basic Population Pyramid with Base R

The simplest way to create a population pyramid with base R is by using the barplot() function. Because barplot() draws every bar outward from zero, we multiply the male population by -1 so that the male bars extend to the left and the female bars to the right.

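    # Multiply male population by -1 so the male bars extend to the left
    population_data$Males <- -population_data$Males

    # Symmetric x-axis limits shared by both sexes
    x_limits <- c(-1, 1) * max(abs(population_data$Males), population_data$Females)

    # Draw the male bars, then overlay the female bars on the same axes
    barplot(height = population_data$Males, names.arg = population_data$AgeGroup,
            horiz = TRUE, las = 1, cex.names = 0.8, col = "blue", xlim = x_limits,
            axes = FALSE, main = "Population Pyramid with Base R",
            xlab = "Population", ylab = "Age Group")
    barplot(height = population_data$Females, horiz = TRUE, col = "pink",
            add = TRUE, axes = FALSE)
    axis(1, at = pretty(x_limits), labels = abs(pretty(x_limits)))  # positive tick labels

    legend("topright", legend = c("Males", "Females"), fill = c("blue", "pink"))

In this example, the first barplot() call draws the male bars: height specifies the bar lengths, names.arg specifies the label for each age group, horiz = TRUE creates horizontal bars, col specifies the color, and xlim sets the same limits on both sides of zero. The second call uses add = TRUE to overlay the female bars on the same plot. Both calls use axes = FALSE so that axis() can label the x-axis with absolute values instead of negative numbers.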

## 4. Creating a Population Pyramid using ggplot2

The ggplot2 package provides more control over the aesthetics of the pyramid. The geom_bar() function is used to create the bars, and coord_flip() is used to make the bars horizontal.

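    # The male counts must be negative so their bars extend to the left
    population_data$Males <- -abs(population_data$Males)

    # Keep the age groups in youngest-to-oldest order instead of alphabetical order
    population_data$AgeGroup <- factor(population_data$AgeGroup,
                                       levels = population_data$AgeGroup)

    # Reshape from wide to long format: one row per age group and sex
    population_data_long <- tidyr::pivot_longer(population_data, c("Males", "Females"),
                                                names_to = "Sex", values_to = "Population")

    # Create a population pyramid with ggplot2
    ggplot(population_data_long, aes(x = AgeGroup, y = Population, fill = Sex)) +
      geom_bar(stat = "identity", position = "identity") +
      coord_flip() +
      scale_y_continuous(labels = abs) +
      labs(title = "Population Pyramid with ggplot2",
           x = "Age Group",
           y = "Population") +
      theme_minimal() +
      scale_fill_manual(values = c(Males = "blue", Females = "pink"))

In this example, we first make sure the male counts are negative and convert AgeGroup to a factor so that the age groups are plotted from youngest to oldest rather than alphabetically. We then reshape the data from wide to long format using the pivot_longer() function from the tidyr package. Next, aes() maps the aesthetics (x, y, and fill), geom_bar(stat = "identity") creates the bars, coord_flip() makes the bars horizontal, scale_y_continuous(labels = abs) makes the y-axis labels positive, and scale_fill_manual() assigns a color to each sex by name.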
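To make the wide-to-long reshaping step more concrete, here is a small sketch on a two-row excerpt of the sample data. The data frame names `wide` and `long` are illustrative, and the excerpt assumes the male counts have already been negated as in the previous section.

```r
library(tidyr)

# Two-row excerpt of the sample data, with the male counts already negated
wide <- data.frame(
  AgeGroup = c("0-4", "5-9"),
  Males    = c(-1200, -1400),
  Females  = c(1100, 1300)
)

# After reshaping there is one row per age group and sex
long <- pivot_longer(wide, c("Males", "Females"),
                     names_to = "Sex", values_to = "Population")
print(long)
```

Each age group now appears twice, once per sex, which is the layout that lets a single fill aesthetic distinguish the two sides of the pyramid.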

## 5. Conclusion

Creating a population pyramid in R can be achieved in various ways depending on your specific needs and the complexity of your data. While base R offers a straightforward way to create a basic pyramid, ggplot2 provides more control over the plot’s appearance.
