How to Create Relative Frequency Tables in R


Understanding the distribution of categorical variables within a dataset is a key component of statistical analysis and data exploration. One of the primary ways to do this is with frequency tables. While a standard frequency table gives a count for each category, a relative frequency table goes a step further and gives the proportion or percentage of the total observations that each category represents. This article shows how to create relative frequency tables in R.

The Concept of Relative Frequency Tables

A relative frequency table is a statistical table that displays the relative frequencies of data elements in a dataset. It consists of the number of observations in each category of a categorical variable divided by the total number of observations. The relative frequencies can be expressed as proportions (fractions) or as percentages.

The relative frequency table is particularly useful for comparing the distributions of different categories or groups within your dataset.
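The definition above can be sketched directly in R before turning to the dedicated helper functions. This is a minimal illustration; the `answers` vector is hypothetical example data, not part of the article's dataset.

```r
# Hypothetical example data (an assumption for illustration only)
answers <- c("yes", "no", "yes", "yes", "maybe", "no")

# Count the observations in each category
counts <- table(answers)

# Relative frequency = category count / total number of observations
rel_freq <- counts / sum(counts)

print(rel_freq)

# The proportions of all categories add up to 1
total <- sum(rel_freq)
```

Multiplying `rel_freq` by 100 would express the same information as percentages.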

Creating Relative Frequency Tables in R

Creating relative frequency tables in R relies on two functions: table(), which generates a frequency table, and prop.table(), which converts a frequency table into relative frequencies. Both are part of base R, so no additional packages are required.

Let’s now delve into the specifics of creating relative frequency tables.

Using the table() and prop.table() Functions

The most straightforward way to create a relative frequency table in R is by first using the table() function to create a frequency table, and then the prop.table() function to convert these frequencies into proportions.

Here’s an example:

# Create a vector
my_vector <- c('apple', 'banana', 'apple', 'orange', 'banana', 'banana')

# Create a frequency table
frequency_table <- table(my_vector)

# Print the frequency table
print(frequency_table)

# Create a relative frequency table
relative_frequency_table <- prop.table(frequency_table)

# Print the relative frequency table
print(relative_frequency_table)

In this example, my_vector is a vector of fruit names. The table() function is used to generate a frequency table, which is then converted into a relative frequency table using prop.table(). The output of print(relative_frequency_table) would be:

my_vector
    apple    banana    orange 
0.3333333 0.5000000 0.1666667 

This indicates that ‘apple’ represents about 33.33% of the data, ‘banana’ about 50%, and ‘orange’ about 16.67%.
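When comparing categories, it can help to nest the two calls and sort the result. The following is a small sketch along those lines; the names `sorted_table` and `banana_share` are introduced here for illustration.

```r
my_vector <- c("apple", "banana", "apple", "orange", "banana", "banana")

# Nest the two calls to go from raw values to proportions in one step
relative_frequency_table <- prop.table(table(my_vector))

# Order the categories from most to least common
sorted_table <- sort(relative_frequency_table, decreasing = TRUE)
print(sorted_table)

# Look up the share of a single category by name
banana_share <- as.numeric(sorted_table["banana"])
```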

Converting Proportions to Percentages

The prop.table() function provides proportions as fractions of 1. To convert these fractions into more readily understandable percentages, you can multiply the proportions by 100. Here’s how:

# Convert the relative frequency table to percentages
percentage_table <- relative_frequency_table * 100

# Print the percentage table
print(percentage_table)

The output would be:

my_vector
   apple   banana   orange 
33.33333 50.00000 16.66667 

This table now shows the percentage of each fruit in the vector.
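Percentages are usually easier to read once rounded. One possible way to do that is sketched below, using round() for numeric output and sprintf() for text labels; the variable names `rounded_percentages` and `percentage_labels` are assumptions made for this example.

```r
my_vector <- c("apple", "banana", "apple", "orange", "banana", "banana")
percentage_table <- prop.table(table(my_vector)) * 100

# Round to two decimal places; the result is still a table
rounded_percentages <- round(percentage_table, 2)
print(rounded_percentages)

# Build text labels with a percent sign. sprintf() returns a plain
# character vector, so the category names are re-attached with setNames()
percentage_labels <- setNames(
  sprintf("%.2f%%", as.numeric(percentage_table)),
  names(percentage_table)
)
print(percentage_labels)
```

Rounding is best left to the display step, since rounded values may no longer sum to exactly 100.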

Relative Frequency Tables with Multiple Variables

When a dataset contains several categorical variables, you can cross-tabulate two of them and convert the result to relative frequencies. Pass both variables to table(), then apply prop.table() to the resulting cross-table:

# Create a data frame
my_data <- data.frame(
  "Fruit" = c('apple', 'banana', 'apple', 'orange', 'banana', 'banana'),
  "Color" = c('red', 'yellow', 'red', 'orange', 'yellow', 'yellow')
)

# Create a cross-table
cross_table <- table(my_data$Fruit, my_data$Color)

# Create a relative frequency table
relative_frequency_table <- prop.table(cross_table)

# Print the relative frequency table
print(relative_frequency_table)

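This code creates a two-dimensional frequency table (a cross-table) using table(), and then converts it into relative frequencies using prop.table(). Because no margin is specified, each cell is divided by the grand total, so all cells in the output sum to 1. For example, the banana/yellow cell is 0.5 because three of the six observations are yellow bananas.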
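The same cross-table can also be normalised by row or by column through the margin argument of prop.table(). The sketch below reuses the fruit data; `overall`, `by_row`, and `by_column` are names chosen for this example.

```r
my_data <- data.frame(
  Fruit = c("apple", "banana", "apple", "orange", "banana", "banana"),
  Color = c("red", "yellow", "red", "orange", "yellow", "yellow")
)
cross_table <- table(my_data$Fruit, my_data$Color)

# Default: each cell divided by the grand total (all cells sum to 1)
overall <- prop.table(cross_table)

# margin = 1: each cell divided by its row total (each row sums to 1)
by_row <- prop.table(cross_table, margin = 1)

# margin = 2: each cell divided by its column total (each column sums to 1)
by_column <- prop.table(cross_table, margin = 2)

print(by_row)
```

Row proportions answer "what share of each fruit has a given colour", while column proportions answer "what share of each colour is a given fruit".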

Using the dplyr Package for Relative Frequency Tables

The dplyr package in R is another powerful tool for generating relative frequency tables. It offers a more flexible and intuitive syntax for data manipulation. If it isn’t already installed, you can install it using install.packages("dplyr"), and load it into your R environment with library(dplyr).

Here’s how to create a relative frequency table using dplyr:

# Load the dplyr package
library(dplyr)

# Create a data frame
my_data <- data.frame(
  "Fruit" = c('apple', 'banana', 'apple', 'orange', 'banana', 'banana'),
  "Color" = c('red', 'yellow', 'red', 'orange', 'yellow', 'yellow')
)

# Create a relative frequency table
relative_frequency_table <- my_data %>%
  group_by(Fruit) %>%
  summarise(Count = n()) %>%
  mutate(Frequency = Count / sum(Count) * 100)

# Print the relative frequency table
print(relative_frequency_table)

In this example, group_by() groups the data by ‘Fruit’, summarise() counts the rows in each group, and mutate() adds a Frequency column. Because Count / sum(Count) is multiplied by 100, Frequency holds percentages rather than proportions; drop the * 100 to get fractions of 1.
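To cross-check the dplyr result, or when dplyr is not available, a data frame with the same three columns can be assembled in base R. This is only a sketch of one way to do it; the name `fruit_counts` is an assumption made for this example.

```r
my_data <- data.frame(
  Fruit = c("apple", "banana", "apple", "orange", "banana", "banana"),
  Color = c("red", "yellow", "red", "orange", "yellow", "yellow")
)

# Turn the frequency table into a data frame with Fruit and Count columns
fruit_counts <- as.data.frame(
  table(Fruit = my_data$Fruit),
  responseName = "Count"
)

# Add the percentage column, mirroring the mutate() step
fruit_counts$Frequency <- fruit_counts$Count / sum(fruit_counts$Count) * 100

print(fruit_counts)
```

Unlike the dplyr version, the Fruit column here is a factor rather than a character vector.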

Conclusion

A relative frequency table provides an insightful snapshot of a dataset by presenting the proportion of total observations each category represents. This article has detailed the steps involved in creating relative frequency tables in R, using both base R functions like table() and prop.table(), and the dplyr package.
