# How to Perform Label Encoding in R

Label encoding is a technique used to convert categorical data into a numerical format that machine learning algorithms can better understand. While some algorithms can work with categorical data directly, many algorithms and statistical methods require numerical input. In R, a popular programming language for data analysis and machine learning, you can perform label encoding in multiple ways.

1. What is Label Encoding?
2. When to Use Label Encoding
3. Built-in Methods in Base R
• Factor Levels
4. The car Package Method
5. The dplyr Package Method
6. The data.table Package Method
7. The caret Package Method
8. Custom Functions for Label Encoding
9. Multi-Level Label Encoding
11. Best Practices
12. Conclusion

## 1. What is Label Encoding?

Label encoding involves converting each unique category within a variable to a numerical value. For example, the variable “Color” with categories “Red,” “Green,” and “Blue” could be encoded as 1, 2, and 3, respectively.
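At its core, label encoding is a lookup table from category to integer. A minimal sketch of the exact mapping described above, written as a named vector in base R (the variable names are illustrative):

```r
# The mapping from the example: each category gets one integer
color_codes <- c(Red = 1L, Green = 2L, Blue = 3L)

colors <- c("Red", "Green", "Blue", "Green", "Red")

# Index the named vector by category name, then drop the names
encoded <- unname(color_codes[colors])
print(encoded)
# [1] 1 2 3 2 1
```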

## 2. When to Use Label Encoding

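Label encoding is suitable when:

• The categorical variable is ordinal, so the integer codes can reflect a real order (for example, Small < Medium < Large).
• The model you intend to use does not accept categorical input but can cope with arbitrary integer codes, as tree-based models generally can. For nominal variables fed to linear or distance-based models, the codes imply an order that does not exist, and one-hot encoding is usually the safer choice.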
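For ordinal variables, the order of the codes matters, and R's default alphabetical ordering rarely matches it. A small sketch using a hypothetical size variable shows the difference between default and explicitly supplied levels:

```r
sizes <- c("Medium", "Small", "Large", "Small", "Medium")

# Default: levels sorted alphabetically (Large = 1, Medium = 2, Small = 3),
# which does not match the natural ranking
alphabetical <- as.integer(factor(sizes))

# Supplying the levels in rank order gives Small = 1, Medium = 2, Large = 3
ranked <- as.integer(factor(sizes, levels = c("Small", "Medium", "Large")))

print(alphabetical)
# [1] 2 3 1 3 2
print(ranked)
# [1] 2 1 3 1 2
```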

## 3. Built-in Methods in Base R – Factor Levels

The simplest way to perform label encoding in R is to convert a character vector to a factor and then to an integer vector. By default, factor levels are sorted alphabetically, so Blue becomes 1, Green 2, and Red 3.

```r
data_vector <- c("Red", "Green", "Blue", "Green", "Red")
data_factor <- as.factor(data_vector)    # levels: Blue, Green, Red
data_encoded <- as.integer(data_factor)  # position of each value's level

print(data_encoded)
# Output: [1] 3 2 1 2 3
```
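The factor itself keeps the lookup table, so the codes can be turned back into labels. A short sketch of decoding with base R's levels():

```r
data_vector <- c("Red", "Green", "Blue", "Green", "Red")
data_factor <- as.factor(data_vector)
data_encoded <- as.integer(data_factor)

# levels() is the lookup table: position i holds the label for code i
print(levels(data_factor))
# [1] "Blue"  "Green" "Red"

# Indexing the levels by the codes recovers the original labels
decoded <- levels(data_factor)[data_encoded]
print(identical(decoded, data_vector))
# [1] TRUE
```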

## 4. The car Package Method

The car package's recode() function lets you spell out the mapping explicitly, so you decide which code each category receives instead of relying on alphabetical order.

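```r
install.packages("car")  # run once
library(car)

data_vector <- c("Red", "Green", "Blue", "Green", "Red")

# Recode the character vector directly: given a factor, recode()
# returns a factor by default rather than numbers.
# car:: avoids a clash with dplyr::recode() if both are loaded.
data_encoded <- car::recode(data_vector, "'Red'=1; 'Green'=2; 'Blue'=3")

print(data_encoded)
# Output: [1] 1 2 3 2 1
```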

## 5. The dplyr Package Method

dplyr's mutate() adds the encoded column to a data frame using the same factor-to-integer conversion as base R. First install and load the package.

```r
install.packages("dplyr")  # run once
library(dplyr)

df <- data.frame(Color = c("Red", "Green", "Blue", "Green", "Red"))

df <- df %>%
  mutate(Color_encoded = as.integer(as.factor(Color)))

print(df)
```
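The same mutate() call can encode several columns at once with across(). A sketch, assuming a hypothetical second column `Shape`; the `.names` template keeps the original columns and adds `_encoded` copies, with codes again in alphabetical order:

```r
library(dplyr)

df <- data.frame(
  Color = c("Red", "Green", "Blue", "Green", "Red"),
  Shape = c("Circle", "Square", "Circle", "Triangle", "Square")  # illustrative
)

# Apply the factor-to-integer conversion to each listed column
df <- df %>%
  mutate(across(c(Color, Shape), ~ as.integer(as.factor(.x)),
                .names = "{.col}_encoded"))

print(df)
```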

## 6. The data.table Package Method

The data.table package is designed for large data sets; its := operator adds the encoded column by reference, without copying the table.

```r
install.packages("data.table")  # run once
library(data.table)

dt <- data.table(Color = c("Red", "Green", "Blue", "Green", "Red"))
dt[, Color_encoded := as.integer(as.factor(Color))]

print(dt)
```
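data.table also offers a grouping-based alternative. The special symbol `.GRP` is a counter for the current group, and grouping with `by` (rather than `keyby`) processes groups in order of first appearance. A sketch of encoding by first appearance rather than alphabetically:

```r
library(data.table)

dt <- data.table(Color = c("Red", "Green", "Blue", "Green", "Red"))

# Each group gets its counter value: Red = 1, Green = 2, Blue = 3
dt[, Color_encoded := .GRP, by = Color]

print(dt)
```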

## 7. The caret Package Method

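The caret package has no dedicated label encoder: its pre-processing tools, such as preProcess() for centering and scaling and dummyVars() for one-hot encoding, do not produce integer codes. In a caret workflow, label encoding is therefore done with base R before the data reaches caret's modelling functions: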

```r
install.packages("caret")  # run once
library(caret)  # needed for modelling; the encoding below is base R

# Create a data frame with a categorical column
df <- data.frame(Color = c("Red", "Green", "Blue", "Green", "Red"))

# Convert the categorical column into a factor
df$Color <- as.factor(df$Color)

# Perform label encoding using as.integer()
df$Color_encoded <- as.integer(df$Color)

# Display the data frame
print(df)
```

## 8. Custom Functions for Label Encoding

You can also write your own encoding function. The version below assigns codes in order of first appearance rather than alphabetically.

```r
custom_encoder <- function(vector) {
  # Codes follow the order in which categories first appear
  lvls <- unique(vector)
  dict <- setNames(seq_along(lvls), lvls)
  # Vectorized lookup; unname() drops the category names
  unname(dict[vector])
}

data_vector <- c("Red", "Green", "Blue", "Green", "Red")
data_encoded <- custom_encoder(data_vector)

print(data_encoded)
# Output: [1] 1 2 3 2 1
```
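A custom encoder is most useful when the mapping is learned once and reused, so that new data receives the same codes as the training data. A minimal sketch with two hypothetical helpers, `fit_label_encoder()` and `apply_label_encoder()` (not from any package); categories unseen during fitting come back as NA:

```r
# Hypothetical helper: the stored mapping is the vector of unique labels,
# where position i is the label for code i
fit_label_encoder <- function(x) {
  unique(x)
}

# Hypothetical helper: match() returns each value's position in the
# mapping, or NA for a category that was not seen during fitting
apply_label_encoder <- function(x, mapping) {
  match(x, mapping)
}

train <- c("Red", "Green", "Blue", "Green", "Red")
new_data <- c("Blue", "Purple", "Red")

mapping <- fit_label_encoder(train)
print(apply_label_encoder(train, mapping))
# [1] 1 2 3 2 1
print(apply_label_encoder(new_data, mapping))
# [1]  3 NA  1
```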

## 9. Multi-Level Label Encoding

When you have multiple columns to encode, you can use lapply() or sapply() to loop through each one.
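A sketch with lapply(), assuming a data frame with a hypothetical second column `Shape`. lapply() returns one encoded vector per column, and assigning the list back into `df[cols]` replaces those columns in place:

```r
df <- data.frame(
  Color = c("Red", "Green", "Blue", "Green", "Red"),
  Shape = c("Circle", "Square", "Circle", "Triangle", "Square")  # illustrative
)

cols <- c("Color", "Shape")

# Encode each selected column independently (alphabetical codes per column)
df[cols] <- lapply(df[cols], function(x) as.integer(factor(x)))

print(df)
```

sapply() would also work, but it simplifies the result to a matrix, so lapply() is the more natural fit when writing back into a data frame.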

• Simple to implement
• Does not increase data dimensionality