Understanding the distribution of categorical variables within a dataset is a key component of statistical analysis and data exploration. One of the primary methods to achieve this is by creating frequency tables. While a standard frequency table provides a count of each category, a relative frequency table goes a step further by providing the proportion or percentage of the total observations that each category represents. This article will guide you on how to create relative frequency tables in R.

## The Concept of Relative Frequency Tables

A relative frequency table is a statistical table that displays the relative frequencies of data elements in a dataset. It consists of the number of observations in each category of a categorical variable divided by the total number of observations. The relative frequencies can be expressed as proportions (fractions) or as percentages.
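The definition above can be checked by hand by dividing each category's count by the total number of observations. The following is a minimal sketch; the vector `sizes` and its values are made up purely for illustration.

```r
# Hypothetical example data (not from a real dataset)
sizes <- c("small", "large", "small", "medium", "small")

# Count the observations in each category
counts <- table(sizes)

# Relative frequency = category count / total number of observations
rel_freq <- counts / sum(counts)

# Print the proportions and confirm that they sum to 1
print(rel_freq)
print(sum(rel_freq))
```

Because every observation falls into exactly one category, the proportions add up to 1 (or 100 when expressed as percentages).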

The relative frequency table is particularly useful for comparing the distributions of different categories or groups within your dataset.

## Creating Relative Frequency Tables in R

Creating relative frequency tables in R requires the use of several functions, primarily `table()` for generating frequency tables and `prop.table()` for converting frequency tables into relative frequencies. These functions form part of R’s base functionality, requiring no additional packages.

Let’s now delve into the specifics of creating relative frequency tables.

### Using the table() and prop.table() Functions

The most straightforward way to create a relative frequency table in R is by first using the `table()` function to create a frequency table, and then the `prop.table()` function to convert these frequencies into proportions.

Here’s an example:

```
# Create a vector
my_vector <- c('apple', 'banana', 'apple', 'orange', 'banana', 'banana')
# Create a frequency table
frequency_table <- table(my_vector)
# Print the frequency table
print(frequency_table)
# Create a relative frequency table
relative_frequency_table <- prop.table(frequency_table)
# Print the relative frequency table
print(relative_frequency_table)
```

In this example, `my_vector` is a vector of fruit names. The `table()` function is used to generate a frequency table, which is then converted into a relative frequency table using `prop.table()`. The output of `print(relative_frequency_table)` would be:

```
my_vector
    apple    banana    orange
0.3333333 0.5000000 0.1666667
```

This indicates that ‘apple’ represents about 33.33% of the data, ‘banana’ about 50%, and ‘orange’ about 16.67%.
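One detail worth knowing when reading these proportions is how missing values are treated. By default, `table()` drops `NA` values, so the proportions are computed over the non-missing observations only. The sketch below uses a hypothetical vector, `fruits_with_na`, to contrast the default with `useNA = "ifany"`, which gives `NA` its own category.

```r
# Hypothetical vector with one missing value
fruits_with_na <- c("apple", "banana", NA, "apple", "banana", "banana")

# Default: NA is dropped, so proportions are out of the 5 non-missing values
prop_default <- prop.table(table(fruits_with_na))

# useNA = "ifany": NA becomes a category, so proportions are out of all 6 values
prop_with_na <- prop.table(table(fruits_with_na, useNA = "ifany"))

print(prop_default)
print(prop_with_na)
```

Which denominator is appropriate depends on whether missing values should count as part of the total in your analysis.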

### Converting Proportions to Percentages

The `prop.table()` function provides proportions as fractions of 1. To convert these fractions into more readily understandable percentages, you can multiply the proportions by 100. Here’s how:

```
# Convert the relative frequency table to percentages,
# rounded to two decimal places for display
percentage_table <- round(relative_frequency_table * 100, 2)
# Print the percentage table
print(percentage_table)
```

The output would be:

```
my_vector
 apple banana orange
 33.33  50.00  16.67
```

This table now shows the percentage of each fruit in the vector.
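If the percentages are meant for a report or a plot label, it can help to format them as text with a percent sign. The sketch below is one possible approach using base R's `sprintf()`; the variable names `fruits`, `proportions_tbl`, and `labels` are illustrative.

```r
# Same fruit data as in the earlier example
fruits <- c("apple", "banana", "apple", "orange", "banana", "banana")
proportions_tbl <- prop.table(table(fruits))

# sprintf() formats each value to one decimal place;
# "%%" in the format string produces a literal percent sign
labels <- sprintf("%.1f%%", 100 * as.numeric(proportions_tbl))

# Carry the category names over to the formatted labels
names(labels) <- names(proportions_tbl)

print(labels)
```

Note that the result is a character vector, so it is suitable for display but not for further arithmetic.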

### Relative Frequency Tables with Multiple Variables

When working with datasets with multiple variables, you can still create a relative frequency table. This is achieved by applying the `table()` and `prop.table()` functions in a slightly different way:

```
# Create a data frame
my_data <- data.frame(
  "Fruit" = c('apple', 'banana', 'apple', 'orange', 'banana', 'banana'),
  "Color" = c('red', 'yellow', 'red', 'orange', 'yellow', 'yellow')
)
# Create a cross-table
cross_table <- table(my_data$Fruit, my_data$Color)
# Create a relative frequency table
relative_frequency_table <- prop.table(cross_table)
# Print the relative frequency table
print(relative_frequency_table)
```

This code creates a two-dimensional frequency table (a cross-table) using `table()`, and then converts it into relative frequencies using `prop.table()`. Because no margin is specified, each cell is divided by the grand total, so the output is a two-dimensional relative frequency table whose entries sum to 1.
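For cross-tables it is often more informative to compute proportions within each row or within each column rather than over the whole table. `prop.table()` supports this through its `margin` argument. The sketch below reuses the fruit and color values from the example; the variable names `overall`, `by_row`, and `by_col` are illustrative.

```r
# Same values as in the data frame example
fruit <- c("apple", "banana", "apple", "orange", "banana", "banana")
color <- c("red", "yellow", "red", "orange", "yellow", "yellow")

# Naming the arguments labels the dimensions of the cross-table
cross_table <- table(Fruit = fruit, Color = color)

# No margin: each cell divided by the grand total (all cells sum to 1)
overall <- prop.table(cross_table)

# margin = 1: each cell divided by its row total (every row sums to 1)
by_row <- prop.table(cross_table, margin = 1)

# margin = 2: each cell divided by its column total (every column sums to 1)
by_col <- prop.table(cross_table, margin = 2)

print(by_row)
print(by_col)
```

Row proportions answer questions such as "what share of bananas are yellow?", while column proportions answer "what share of yellow fruit are bananas?".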

### Using the dplyr Package for Relative Frequency Tables

The `dplyr` package in R is another powerful tool for generating relative frequency tables. It offers a more flexible and intuitive syntax for data manipulation. If it isn’t already installed, you can install it using `install.packages("dplyr")`, and load it into your R environment with `library(dplyr)`.

Here’s how to create a relative frequency table using `dplyr`:

```
# Load the dplyr package
library(dplyr)
# Create a data frame
my_data <- data.frame(
  "Fruit" = c('apple', 'banana', 'apple', 'orange', 'banana', 'banana'),
  "Color" = c('red', 'yellow', 'red', 'orange', 'yellow', 'yellow')
)
# Create a relative frequency table
relative_frequency_table <- my_data %>%
  group_by(Fruit) %>%
  summarise(Count = n()) %>%
  mutate(Frequency = Count / sum(Count) * 100)
# Print the relative frequency table
print(relative_frequency_table)
```

In this example, `group_by()` is used to group the data by ‘Fruit’, `summarise()` is used to count the number of rows for each fruit, and `mutate()` is used to add a new column, `Frequency`, containing the relative frequency of each fruit expressed as a percentage.
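The `group_by()` and `summarise()` steps can also be condensed with `dplyr`'s `count()` function. The following sketch shows one way to do this while keeping both the proportion and the percentage; the column names `Count`, `Proportion`, and `Percent` and the variable `fruit_summary` are illustrative choices.

```r
# Load the dplyr package
library(dplyr)

# Same data frame as in the example above
my_data <- data.frame(
  Fruit = c("apple", "banana", "apple", "orange", "banana", "banana"),
  Color = c("red", "yellow", "red", "orange", "yellow", "yellow")
)

# count() groups by Fruit and tallies the rows in a single step;
# the name argument sets the name of the resulting count column
fruit_summary <- my_data %>%
  count(Fruit, name = "Count") %>%
  mutate(
    Proportion = Count / sum(Count),        # fraction of all rows
    Percent = round(Proportion * 100, 2)    # same value as a percentage
  )

print(fruit_summary)
```

Keeping the unrounded `Proportion` column alongside `Percent` avoids rounding errors if the values are used in later calculations.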

## Conclusion

A relative frequency table provides an insightful snapshot of a dataset by presenting the proportion of total observations each category represents. This article has detailed the steps involved in creating relative frequency tables in R, using both base R functions like `table()` and `prop.table()`, and the `dplyr` package.