The case_when() function in R is part of the dplyr package. It is an elegant way to express a chain of if-else conditions inside mutate() or summarise(), or on its own. This article covers how to write and read case_when() calls, including syntax, examples, and best practices.
Table of Contents
- Prerequisites
- Introduction to case_when()
- Basic Syntax
- Practical Examples
- 4.1 Basic Usage
- 4.2 Using with mutate()
- 4.3 Using with summarise()
- 4.4 Stand-alone Use
- 4.5 Nested case_when()
- Advanced Tips
- Common Pitfalls and How to Avoid Them
- Conclusion
1. Prerequisites
Before diving into the case_when() function, you should have the dplyr package installed, which can be done with the following code:
install.packages("dplyr")
Load the package using:
library(dplyr)
2. Introduction to case_when()
The case_when() function allows you to vectorize multiple if_else() conditions. It works much like SQL’s CASE WHEN statement and helps you avoid a lengthy series of nested ifelse() statements.
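To make the comparison concrete, here is a minimal sketch that writes the same three-way classification both ways. The score vector and its labels are made up for illustration.

```r
library(dplyr)

score <- c(35, 62, 88)  # hypothetical scores, for illustration only

# Nested ifelse(): the logic has to be read inside-out.
nested <- ifelse(score < 50, "fail",
                 ifelse(score < 75, "pass", "distinction"))

# case_when(): one condition per line, read top to bottom.
flat <- case_when(
  score < 50 ~ "fail",
  score < 75 ~ "pass",
  TRUE       ~ "distinction"
)

identical(nested, flat)
# [1] TRUE
```

Both versions return the same character vector; the case_when() form simply keeps each condition next to its result.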
3. Basic Syntax
The basic syntax of case_when() is as follows:
case_when(
  condition1 ~ value1,
  condition2 ~ value2,
  ...,
  TRUE ~ value_default
)
- condition1, condition2, …: These are logical conditions that will be checked.
- value1, value2, …: These are the values that will be returned if the condition is TRUE.
- TRUE ~ value_default: A default value if none of the conditions are met.
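As a small sketch of how the two sides of each formula behave, the example below prints a condition on its own and then uses a vector on the right-hand side. The vector x is hypothetical, and the absolute-value calculation is only there to show that a returned value can itself be a vector of matching length.

```r
library(dplyr)

x <- c(-2, 0, 3)  # hypothetical input

# Each condition is a logical vector with one entry per element of x.
is_negative <- x < 0
is_negative
# [1]  TRUE FALSE FALSE

# The value can be a single constant or a vector of matching length.
abs_x <- case_when(
  x < 0 ~ -x,  # vector value, picked element by element
  TRUE  ~ x    # default: keep the element unchanged
)
# abs_x holds 2, 0, 3
```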
4. Practical Examples
4.1 Basic Usage
Here’s a basic example using a vector of numbers:
nums <- c(1, 2, 3, 4, 5)
result <- case_when(
  nums <= 2 ~ "small",
  nums <= 4 ~ "medium",
  TRUE ~ "large"
)
print(result)
Output:
[1] "small" "small" "medium" "medium" "large"
4.2 Using with mutate()
You can use case_when() within the mutate() function to create a new column based on some conditions.
data <- tibble(
  age = c(22, 45, 67, 34, 29)
)
data <- data %>% mutate(
  age_group = case_when(
    age < 30 ~ "young",
    age < 50 ~ "middle-aged",
    TRUE ~ "old"
  )
)
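Conditions inside mutate() can also combine several columns. The sketch below assumes a hypothetical employed column that is not part of the example above; it only illustrates the pattern.

```r
library(dplyr)

# Hypothetical data: `employed` is an illustrative column.
people <- tibble(
  age      = c(22, 45, 67, 34, 29),
  employed = c(TRUE, TRUE, FALSE, FALSE, TRUE)
)

people <- people %>%
  mutate(
    status = case_when(
      age >= 65 & !employed ~ "retired",     # most specific rule first
      employed              ~ "working",
      TRUE                  ~ "not working"
    )
  )

people$status
# working, working, retired, not working, working
```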
4.3 Using with summarise()
Similarly, you can use case_when() within summarise() to aggregate data conditionally. This example reuses the data tibble from section 4.2 and counts the rows where age is below 30.
data %>%
  summarise(
    # 1 for each row with age < 30, 0 otherwise; sum() then counts the 1s
    num_young = sum(case_when(
      age < 30 ~ 1,
      TRUE ~ 0
    ))
  )
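For a single yes/no condition, summing the logical vector directly gives the same count, so case_when() pays off mainly when there are several categories. The sketch below shows both, and then uses count() with the age groups from section 4.2 as one possible way to tally every category at once; the column names are chosen for illustration.

```r
library(dplyr)

data <- tibble(age = c(22, 45, 67, 34, 29))

# Two equivalent ways to count the rows with age < 30.
counts <- data %>%
  summarise(
    num_young_case = sum(case_when(age < 30 ~ 1, TRUE ~ 0)),
    num_young_lgl  = sum(age < 30)  # TRUE counts as 1, FALSE as 0
  )
# both columns equal 2 (ages 22 and 29)

# Tally all categories in one step.
by_group <- data %>%
  count(age_group = case_when(
    age < 30 ~ "young",
    age < 50 ~ "middle-aged",
    TRUE     ~ "old"
  ))
# young: 2, middle-aged: 2, old: 1
```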
4.4 Stand-alone Use
You can also call case_when() on its own, outside any data-frame verb, to classify the elements of a vector. This example reuses nums from section 4.1.
result <- case_when(
  nums %% 2 == 0 ~ "even",
  TRUE ~ "odd"
)
4.5 Nested case_when()
You can also nest case_when() functions for more complex logic.
result <- case_when(
  nums %% 2 == 0 ~ case_when(
    nums > 3 ~ "even and large",
    TRUE ~ "even and small"
  ),
  TRUE ~ "odd"
)
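Nesting is not always necessary. As a sketch, the same logic can often be flattened by combining conditions with &, which some readers find easier to scan. The comparison below checks that both forms agree for this input.

```r
library(dplyr)

nums <- c(1, 2, 3, 4, 5)

# Nested form, as in the example above.
nested <- case_when(
  nums %% 2 == 0 ~ case_when(
    nums > 3 ~ "even and large",
    TRUE     ~ "even and small"
  ),
  TRUE ~ "odd"
)

# Flat form: combined conditions, most specific first.
flat <- case_when(
  nums %% 2 == 0 & nums > 3 ~ "even and large",
  nums %% 2 == 0            ~ "even and small",
  TRUE                      ~ "odd"
)

identical(nested, flat)
# [1] TRUE
```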
5. Advanced Tips
- Use case_when() in combination with other dplyr functions for cleaner and more efficient code.
- You can use other R functions within case_when() for complex calculations.
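As an illustration of the second tip, the sketch below calls is.na() and dplyr's between() inside the conditions. The temperature readings and labels are hypothetical.

```r
library(dplyr)

# Hypothetical readings, including a missing value.
temp_c <- c(-5, 12, NA, 31, 22)

label <- case_when(
  is.na(temp_c)          ~ "unknown",   # handle NA explicitly, first
  temp_c < 0             ~ "freezing",
  between(temp_c, 0, 25) ~ "mild",      # between() includes both bounds
  TRUE                   ~ "hot"
)
# freezing, mild, unknown, hot, mild
```

Putting the is.na() check first matters here: a comparison against NA yields NA rather than TRUE, so without that line the missing reading would fall through to the default and be labelled "hot".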
6. Common Pitfalls and How to Avoid Them
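- Order Matters: Conditions are checked in order. For each element, the first condition that is TRUE determines the result and later conditions are ignored for that element, so list specific conditions before general ones.
- Missing Default Case: Always include a default case (TRUE ~ value_default) to catch cases that don’t meet any condition. Without it, unmatched elements become NA.
- Type Consistency: Make sure all the return values are of the same type. Mixing, say, strings and numbers raises an error.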
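The three pitfalls can be reproduced in a few lines. This is a minimal sketch using the nums vector from the earlier examples; the tryCatch() wrapper and its message string are only there so the script keeps running after the type error.

```r
library(dplyr)

nums <- c(1, 2, 3, 4, 5)

# Order matters: a broad condition listed first shadows a narrower one.
wrong_order <- case_when(
  nums <= 4 ~ "medium",
  nums <= 2 ~ "small",   # never used: every value <= 2 is also <= 4
  TRUE      ~ "large"
)
# medium, medium, medium, medium, large

# Missing default: elements that match no condition become NA.
no_default <- case_when(
  nums <= 2 ~ "small",
  nums <= 4 ~ "medium"
)
# small, small, medium, medium, NA

# Type consistency: a string and a number cannot be combined.
mixed <- tryCatch(
  case_when(nums <= 2 ~ "small", TRUE ~ 0),
  error = function(e) "error: incompatible types"  # illustrative message
)
# mixed holds the illustrative error message
```

Recent dplyr releases (1.1.0 and later) also document a .default argument as an alternative to the TRUE ~ value_default line; check the documentation for your installed version before relying on it.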
7. Conclusion
The case_when() function in R, a part of the dplyr package, provides a clean and efficient way to perform multiple conditional statements. It can be used on its own or embedded in other dplyr functions like mutate() and summarise(). When using case_when(), remember that the order of conditions matters, include a default case, and keep the return types consistent.
With this guide, you should now be able to employ case_when() effectively in your R programming.