A Pareto chart combines bars and a line graph: the bars show individual values in descending order, and the line shows the cumulative total. Pareto charts are based on the Pareto principle, also known as the 80/20 rule, which states that roughly 80% of the effects come from 20% of the causes. This type of chart is often used in quality control to identify the most critical issues or causes.
R, being a versatile language for statistical analysis, provides numerous ways to create a Pareto chart. In this article, we will explore two approaches: creating a Pareto chart using base R functions, and using the ggplot2 package.
Creating a Pareto Chart Using Base R
In this section, we will create a Pareto chart using only the base R functions. Here are the steps involved:
1. Creating a Dataset: First, we create a simple dataset.
# Create a dataset
set.seed(123)
category <- LETTERS[1:10]
frequency <- sample(100:200, 10)
df <- data.frame(category, frequency)
In this dataset, category represents different causes or issues, and frequency represents the number of occurrences of each cause.
2. Sorting and Cumulative Frequency Calculation: The next step is to sort the data in descending order of frequency and calculate the cumulative frequency.
# Sort the data and calculate the cumulative frequency
df <- df[order(-df$frequency), ]
df$cumulative_frequency <- cumsum(df$frequency)
3. Calculating the Cumulative Percentage: Now, we need to calculate the cumulative percentage of the frequencies.
# Calculate the cumulative percentage
df$cumulative_percentage <- df$cumulative_frequency / sum(df$frequency) * 100
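4. Creating the Pareto Chart: Finally, we create the Pareto chart using the barplot() and plot() functions.
# Create the Pareto chart
# Widen the right margin so the secondary axis label is not cut off
par(mar = c(5, 4, 4, 5))
# barplot() returns the bar midpoints, which are needed to align the line with the bars
bp <- barplot(df$frequency, names.arg = df$category, las = 2, col = "skyblue",
              main = "Pareto Chart", xlab = "Category", ylab = "Frequency")
# Remember the x-range of the bar plot before overlaying the second plot
usr <- par("usr")
par(new = TRUE)
plot(bp, df$cumulative_percentage, type = "o", col = "red",
     xlim = usr[1:2], xaxs = "i", ylim = c(0, 100), axes = FALSE, ann = FALSE)
axis(side = 4)
mtext(side = 4, line = 3, "Cumulative Percentage")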
In this code, barplot() creates a bar plot of the frequencies, par(new=TRUE) allows us to draw another plot on top of the current one, plot() creates a line plot of the cumulative percentage, axis() adds a secondary y-axis for the cumulative percentage, and mtext() labels that axis.
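As an alternative to overlaying two plots with par(new=TRUE), the cumulative line can be drawn in the bar plot's own coordinate system. The sketch below is a variation rather than the method used above: it stretches the frequency axis up to the total count, draws the cumulative frequency with lines() at the bar midpoints returned by barplot(), and labels the right-hand axis in percent. The function name draw_pareto() and the dashed 80% reference line are additions made for illustration.

```r
# Rebuild the example data so this block runs on its own
set.seed(123)
df <- data.frame(category = LETTERS[1:10], frequency = sample(100:200, 10))
df <- df[order(-df$frequency), ]
df$cumulative_frequency <- cumsum(df$frequency)
total <- sum(df$frequency)

# Hypothetical helper: draws bars and cumulative line on one shared y-axis
draw_pareto <- function(df, total) {
  # Widen the right margin for the percentage axis; restore settings on exit
  old_par <- par(mar = c(5, 4, 4, 5))
  on.exit(par(old_par))

  # The y-axis runs up to the total so the cumulative line fits in the plot
  mids <- barplot(df$frequency, names.arg = df$category, col = "skyblue",
                  ylim = c(0, total), main = "Pareto Chart",
                  xlab = "Category", ylab = "Frequency")

  # Cumulative counts, drawn at the bar midpoints
  lines(mids, df$cumulative_frequency, type = "o", col = "red")

  # Right-hand axis: the same positions expressed as percentages of the total
  pct <- seq(0, 100, by = 20)
  axis(side = 4, at = total * pct / 100, labels = paste0(pct, "%"))
  mtext("Cumulative Percentage", side = 4, line = 3)

  # Dashed reference line at 80% of the total (an illustrative choice)
  abline(h = 0.8 * total, lty = 2)

  invisible(as.numeric(mids))
}

mids <- draw_pareto(df, total)
```

Because both series share one coordinate system, the line and the bars cannot drift out of alignment, at the cost of shorter bars relative to the plot height.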
Creating a Pareto Chart Using ggplot2
While the base R functions provide a straightforward way to create a Pareto chart, the ggplot2 package offers a more flexible and powerful way to create and customize the chart. Here are the steps involved:
1. Installing and Loading the ggplot2 Package: The first step is to install and load the ggplot2 package.
# Install (only needed once)
install.packages("ggplot2")
# Load
library(ggplot2)
2. Creating a Dataset, Sorting and Cumulative Frequency Calculation: The steps are similar to the base R approach.
# Create a dataset
set.seed(123)
category <- LETTERS[1:10]
frequency <- sample(100:200, 10)
df <- data.frame(category, frequency)
# Sort the data and calculate the cumulative frequency
df <- df[order(-df$frequency), ]
df$cumulative_frequency <- cumsum(df$frequency)
# Calculate the cumulative percentage
df$cumulative_percentage <- df$cumulative_frequency / sum(df$frequency) * 100
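3. Creating the Pareto Chart: Finally, we create the Pareto chart using geom_bar() for the bar plot, and geom_line() and geom_point() for the line plot.
# Create the Pareto chart
# Fix the factor levels so the bars keep their descending order
# (ggplot2 would otherwise arrange the categories alphabetically)
df$category <- factor(df$category, levels = as.character(df$category))
# Factor that maps percentages (0-100) onto the frequency axis
scale_factor <- max(df$frequency) / 100
ggplot(df, aes(x = category)) +
  geom_bar(aes(y = frequency), stat = "identity", fill = "skyblue") +
  geom_line(aes(y = cumulative_percentage * scale_factor, group = 1), colour = "red") +
  geom_point(aes(y = cumulative_percentage * scale_factor), colour = "red") +
  # The secondary axis reverses the scaling so that it reads in percent
  scale_y_continuous(sec.axis = sec_axis(~ . / scale_factor, name = "Cumulative Percentage")) +
  labs(title = "Pareto Chart", x = "Category", y = "Frequency") +
  theme_minimal()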
In this code, geom_bar() creates the bar plot, geom_line() and geom_point() draw the cumulative percentage line and its markers, scale_y_continuous() adds a secondary y-axis for the cumulative percentage, labs() adds the title and axis labels, and theme_minimal() sets the theme of the plot.
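Both approaches repeat the same preparation steps, so they could be wrapped in a small helper. The function name pareto_data() and the defect counts below are hypothetical, introduced only for this sketch; it uses base R alone.

```r
# Hypothetical helper: sorts by descending frequency and adds cumulative columns
pareto_data <- function(category, frequency) {
  stopifnot(length(category) == length(frequency))
  df <- data.frame(category = category, frequency = frequency)
  df <- df[order(-df$frequency), ]
  rownames(df) <- NULL  # renumber rows after sorting
  df$cumulative_frequency <- cumsum(df$frequency)
  df$cumulative_percentage <- df$cumulative_frequency / sum(df$frequency) * 100
  df
}

# Made-up defect counts, used only to exercise the helper
example <- pareto_data(
  category  = c("Scratch", "Dent", "Misprint", "Other"),
  frequency = c(12, 45, 30, 13)
)
print(example)
```

The returned data frame has the same columns as df in the steps above, so it can be passed to either plotting approach.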
A Pareto chart is a helpful tool in quality control and business decision-making, allowing us to focus on the most critical issues. This article demonstrated two methods of creating a Pareto chart in R, one using base R functions and the other using the ggplot2 package. Each method has its advantages: the base R approach is straightforward and requires no additional packages, while the ggplot2 approach provides more control over the appearance of the chart. Choose the method that suits your needs best. Remember, the essence of the Pareto chart is its principle: to prioritize the few significant over the many insignificant.
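To make that prioritization concrete, the following sketch lists the categories needed to reach a chosen cumulative percentage. The threshold of 80 and the names threshold, cutoff_row, and vital_few are assumptions introduced here for illustration; the data are rebuilt so that the block runs on its own.

```r
# Rebuild the example data so this block runs on its own
set.seed(123)
df <- data.frame(category = LETTERS[1:10], frequency = sample(100:200, 10))
df <- df[order(-df$frequency), ]
df$cumulative_percentage <- cumsum(df$frequency) / sum(df$frequency) * 100

# Illustrative cut-off; 80 mirrors the 80/20 rule but is a choice, not a requirement
threshold <- 80

# First row at which the cumulative percentage reaches the threshold
cutoff_row <- which(df$cumulative_percentage >= threshold)[1]

# All categories up to and including that row
vital_few <- df[seq_len(cutoff_row), ]
print(vital_few)
```

Because every frequency in the sample data lies between 100 and 200, the counts are close to uniform, and at least seven of the ten categories are needed to reach 80%. Real defect or complaint data are often far more skewed, which is where a Pareto chart is most informative.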