# How to Create a Frequency Polygon in R

A frequency polygon is a line graph that shows the shape of a distribution by joining the counts plotted at the midpoints of histogram bins. It serves the same purpose as a histogram, but it is especially helpful for comparing several sets of data, because lines can be overlaid on one plot where bars would hide one another. Frequency polygons are also a good choice for displaying cumulative frequency distributions.

This article shows how to create frequency polygons in R, both with base R commands and with the ‘ggplot2’ package, a flexible package for creating plots in R.

## Creating Frequency Polygons Using Base R

In base R, you can draw a frequency polygon with the plot() function, or add one to an existing plot with lines(). Let’s first look at a simple example.

# Draw 1000 values from a standard normal distribution
set.seed(123)
data <- rnorm(1000)

# Compute the histogram bins without drawing the histogram
hist_info <- hist(data, plot = FALSE)

# Plot the bin counts against the bin midpoints and join them with lines
plot(hist_info$mids, hist_info$counts, type = "l", main = "Frequency Polygon", xlab = "Values", ylab = "Frequency")

In this case, we first draw 1000 values from a standard normal distribution using rnorm(). We then call hist() with plot = FALSE, so that R computes the bins without displaying the histogram. The hist() function returns a list that includes the bin midpoints (mids) and the bin counts (counts). Plotting the counts against the midpoints with type = "l" joins them with lines, which gives the frequency polygon.
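Textbook frequency polygons are usually closed by adding an empty bin at each end, so that the line starts and ends at zero, and lines() can draw the polygon on top of the histogram it came from. The following is a minimal sketch of both ideas, assuming the equal-width bins that hist() produces by default. The names bin_width, x_closed, and y_closed are introduced here for illustration only.

```r
# Same simulated sample as in the example above
set.seed(123)
data <- rnorm(1000)
hist_info <- hist(data, plot = FALSE)

# Width of one bin (assumes equally spaced breaks, the hist() default)
bin_width <- diff(hist_info$breaks)[1]

# Add one empty bin on each side so the polygon starts and ends at zero
x_closed <- c(min(hist_info$mids) - bin_width,
              hist_info$mids,
              max(hist_info$mids) + bin_width)
y_closed <- c(0, hist_info$counts, 0)

# Draw the histogram, then overlay the closed polygon with lines()
plot(hist_info, main = "Histogram with Frequency Polygon",
     xlab = "Values", ylab = "Frequency",
     xlim = range(x_closed), border = "grey60")
lines(x_closed, y_closed, col = "blue", lwd = 2)
```

Overlaying the two makes it easy to check that each vertex of the polygon sits at the top centre of a bar.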

## Customizing Frequency Polygons in Base R

You can customize the appearance of the frequency polygon using various arguments in the plot() function. For example, you can change the line color and line type as follows:

plot(hist_info$mids, hist_info$counts, type = "l", col = "blue", lty = 2, main = "Frequency Polygon", xlab = "Values", ylab = "Frequency")

You can also add points to the frequency polygon:

plot(hist_info$mids, hist_info$counts, type = "b", pch = 19, col = "blue", main = "Frequency Polygon", xlab = "Values", ylab = "Frequency")

In this case, type = "b" tells R to plot both points and lines, and pch = 19 selects a solid circle as the plotting symbol.
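Frequency polygons can also show cumulative frequencies, as mentioned in the introduction. Here is a minimal sketch, assuming the same simulated data as above. The running total from cumsum() is plotted against the bin edges (breaks) rather than the midpoints, because each cumulative count is only reached at the upper edge of its bin. The name cum_counts is illustrative.

```r
# Same simulated sample as in the examples above
set.seed(123)
data <- rnorm(1000)
hist_info <- hist(data, plot = FALSE)

# Running total of the bin counts, starting from zero at the first break
cum_counts <- c(0, cumsum(hist_info$counts))

# Plot the cumulative counts against the bin edges
plot(hist_info$breaks, cum_counts, type = "b", pch = 19, col = "blue",
     main = "Cumulative Frequency Polygon",
     xlab = "Values", ylab = "Cumulative frequency")
```

The curve never decreases and ends at the sample size, here 1000.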

## Creating Frequency Polygons Using ggplot2

The ggplot2 package provides the geom_freqpoly() function for creating frequency polygons. Let’s see how to create a simple frequency polygon using ggplot2.

# Convert the data to a data frame
data_df <- data.frame(Values = data)

library(ggplot2)

# Create a frequency polygon
ggplot(data_df, aes(x = Values)) +
  geom_freqpoly(binwidth = 0.1) +
  labs(title = "Frequency Polygon", x = "Values", y = "Frequency")

In this example, we first convert the data to a data frame, which is the format that ggplot2 requires. We then use the geom_freqpoly() function to create the frequency polygon. The binwidth argument sets the width of each bin in the units of the data (here 0.1). It plays a role similar to the breaks argument of hist(), which also determines how the data are divided into bins.
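The choice of bins changes how smooth or jagged the polygon looks, so it can help to try more than one setting and to look at the counts ggplot2 actually computed. The sketch below assumes the same simulated data as above. It uses the bins argument as an alternative to binwidth, and layer_data() to retrieve the computed values. The names p_width, p_bins, and binned are illustrative.

```r
library(ggplot2)

# Same simulated sample as in the examples above
set.seed(123)
data_df <- data.frame(Values = rnorm(1000))

# Two ways of choosing the bins: by width, or by number of bins
p_width <- ggplot(data_df, aes(x = Values)) +
  geom_freqpoly(binwidth = 0.5)
p_bins <- ggplot(data_df, aes(x = Values)) +
  geom_freqpoly(bins = 20)

# layer_data() returns the values ggplot2 computed for a layer,
# including the bin centres (x) and the bin counts (count)
binned <- layer_data(p_width)
head(binned[, c("x", "count")])
```

Printing p_width or p_bins draws the corresponding plot. Wider bins give a smoother line at the cost of detail.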

## Customizing Frequency Polygons in ggplot2

You can customize the appearance of the frequency polygon using various arguments in the geom_freqpoly() function. For example, you can change the line color and line type as follows:

ggplot(data_df, aes(x = Values)) +
  geom_freqpoly(binwidth = 0.1, col = "blue", linetype = "dashed") +
  labs(title = "Frequency Polygon", x = "Values", y = "Frequency")

You can also add points to the frequency polygon using the geom_point() function:

ggplot(data_df, aes(x = Values)) +
  geom_freqpoly(binwidth = 0.1, col = "blue") +
  geom_point(stat = "bin", binwidth = 0.1, size = 1.5, col = "red") +
  labs(title = "Frequency Polygon", x = "Values", y = "Frequency")

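In this case, stat = "bin" tells geom_point() to bin the data and draw one point per bin at the height of its count. Using the same binwidth as in geom_freqpoly() makes the points fall on the vertices of the polygon. The size = 1.5 argument controls the size of the points.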

## Comparing Distributions with Frequency Polygons

Frequency polygons are especially useful for comparing multiple distributions. In ggplot2, you can do this by mapping a categorical variable to the colour aesthetic. For example:

# Draw two samples of 1000 values each from normal distributions with different means
set.seed(123)
group_A <- rnorm(1000, mean = 0)
group_B <- rnorm(1000, mean = 1)

# Combine the samples into a data frame
# (named group_df so that it does not overwrite the earlier data vector)
group_df <- data.frame(
  Value = c(group_A, group_B),
  Group = rep(c("A", "B"), each = 1000)
)

# Create one frequency polygon per group
ggplot(group_df, aes(x = Value, colour = Group)) +
  geom_freqpoly(binwidth = 0.1) +
  labs(title = "Comparing Distributions with Frequency Polygons", x = "Values", y = "Frequency")

In this example, we draw two samples from normal distributions with different means and combine them into a data frame. Mapping Group to the colour aesthetic draws one frequency polygon per group on the same axes, so the shift between the two distributions is easy to see.
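Raw counts are easy to compare when the groups have the same size, as they do above. When the sizes differ, the larger group dominates the plot, and plotting densities instead of counts is a common remedy. The sketch below is an assumption-laden illustration rather than part of the original example: the unequal group sizes (1000 and 200), the bin width of 0.25, and the names unequal_df and p_density are all made up here. It relies on after_stat(), which is available in ggplot2 3.3.0 and later.

```r
library(ggplot2)

# Hypothetical data: two groups of different sizes
set.seed(123)
unequal_df <- data.frame(
  Value = c(rnorm(1000, mean = 0), rnorm(200, mean = 1)),
  Group = rep(c("A", "B"), times = c(1000, 200))
)

# Map y to the computed density so that each polygon encloses an area of 1
p_density <- ggplot(unequal_df,
                    aes(x = Value, y = after_stat(density), colour = Group)) +
  geom_freqpoly(binwidth = 0.25) +
  labs(title = "Comparing Groups of Different Sizes",
       x = "Values", y = "Density")

p_density
```

With densities on the y-axis, the two polygons have comparable heights, so differences in location and spread are easier to judge than with raw counts.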

## Conclusion

Frequency polygons are a useful tool for understanding the distribution of data and comparing multiple distributions. While the base R functions can create basic frequency polygons, the ggplot2 package offers more flexibility and customization options.
