# How to Delete Rows in R?

Deleting rows is a common operation in data manipulation and analysis in R. You may need to remove rows for various reasons, such as duplicates, outliers, missing values, or other criteria specific to your analysis. This article explores several ways to delete rows in R, using both base R and contributed packages such as dplyr and tidyr, and illustrates each method with an example.

### 1. Using Row Indexes with Square Brackets

In R, you can remove rows by subsetting the dataframe with square brackets and a negative row index.

```r
# Sample DataFrame
df <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Value = c(10, 20, 30, 40, 50)
)

# Deleting the 2nd row
df <- df[-2, ]

# Output DataFrame
print(df)
```

Output:

```
  ID Value
1  1    10
3  3    30
4  4    40
5  5    50
```
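A negative index also accepts a vector of positions, so several rows can be deleted in one step. The sketch below reuses this section's sample data to show that, plus two details worth knowing: deleted rows leave gaps in the row names, which rownames(df) <- NULL resets, and pairing a negative index with which() misbehaves when nothing matches. The object names (df_multi, df_empty, df_all) and the threshold of 100 are illustrative assumptions, not part of the original example.

```r
# Sample DataFrame (same as above)
df <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Value = c(10, 20, 30, 40, 50)
)

# Deleting the 2nd and 4th rows in one step
df_multi <- df[-c(2, 4), ]

# The remaining row names are 1, 3, 5; reset them to 1, 2, 3
rownames(df_multi) <- NULL
print(df_multi)

# Pitfall: no Value exceeds 100, so which() returns integer(0),
# and df[-integer(0), ] selects zero rows instead of keeping all of them
df_empty <- df[-which(df$Value > 100), ]
nrow(df_empty)

# A negated logical index removes nothing when nothing matches
df_all <- df[!(df$Value > 100), ]
nrow(df_all)
```

When the rows to delete are chosen by a condition rather than a fixed position, the negated logical form is the safer habit.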

### 2. Using Logical Conditions

Logical conditions can be used with square brackets to subset a dataframe and remove rows meeting certain criteria.

```r
# Removing rows where Value is less than 30
df <- df[df$Value >= 30, ]

# Output DataFrame
print(df)
```

Output:

```
  ID Value
3  3    30
4  4    40
5  5    50
```

### 3. Using the subset() Function

The subset() function in base R keeps the rows for which a condition is TRUE, so it can be used to filter out the rows that fail it.

```r
# Removing the row where ID is 3
df <- subset(df, ID != 3)

# Output DataFrame
print(df)
```

Output:

```
  ID Value
4  4    40
5  5    50
```

### 4. Using the dplyr Package

The filter() function from dplyr keeps the rows that satisfy a condition, which makes it an intuitive way to remove the rows that do not.

```r
library(dplyr)

# Sample DataFrame
df <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Value = c(10, 20, 30, 40, 50)
)

# Removing rows where ID is 1
df <- df %>% filter(ID != 1)

# Output DataFrame
print(df)
```

Output:

```
  ID Value
1  2    20
2  3    30
3  4    40
4  5    50
```

Note that, unlike in the base R examples, the row names in the result are renumbered from 1.

### 5. Using the slice() Function

The slice() function from the dplyr package selects rows by position, and negative positions remove them.

```r
# Removing the 1st row
df <- df %>% slice(-1)

# Output DataFrame
print(df)
```

Output:

```
  ID Value
1  3    30
2  4    40
3  5    50
```

### 6. Using the na.omit() Function

The na.omit() function removes every row that contains at least one NA value. The example below creates a dataframe with some NA values and then applies na.omit() to it.

```r
# Creating a sample DataFrame with NA values
df <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("John", "Sara", NA, "Anna", "Mike"),
  Age = c(21, 35, 30, 25, NA)
)

# Displaying original DataFrame
print("Original DataFrame:")
print(df)

# Applying na.omit() to remove rows containing NA values
df_no_na <- na.omit(df)

# Displaying DataFrame after removing rows with NA values
print("DataFrame after omitting NA values:")
print(df_no_na)
```

Output:

```
[1] "Original DataFrame:"
  ID Name Age
1  1 John  21
2  2 Sara  35
3  3 <NA>  30
4  4 Anna  25
5  5 Mike  NA
[1] "DataFrame after omitting NA values:"
  ID Name Age
1  1 John  21
2  2 Sara  35
4  4 Anna  25
```

Here, rows 3 and 5 of the original dataframe, which had NA values in the Name and Age columns respectively, have been omitted from df_no_na.

### In-Depth Examples

#### A. Combining Conditions to Remove Rows

```r
# Sample DataFrame
df <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Value = c(10, 20, 30, 40, 50)
)

# Removing rows where ID is less than 3 or Value is greater than 40
df <- df[!(df$ID < 3 | df$Value > 40), ]
print(df)
```

Output:

  ID Value
3  3    30
4  4    40

This will remove rows where the ID is less than 3 or the Value is greater than 40.
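One caveat applies to the logical-condition examples in this article: if the tested column contains NA, the comparison returns NA for that row, and square-bracket subsetting turns it into a row filled with NA values instead of dropping it. subset() treats an NA condition as FALSE, as does dplyr's filter(), and wrapping the condition in which() gives the same behavior with square brackets. The small dataframe below, with a missing Value in row 2, is made up for illustration.

```r
# Sample DataFrame with a missing Value
df <- data.frame(
  ID = c(1, 2, 3, 4),
  Value = c(10, NA, 30, 40)
)

# Square brackets: the NA comparison for row 2 yields an extra all-NA row
kept_brackets <- df[df$Value >= 30, ]
print(kept_brackets)

# which() returns only the positions where the condition is TRUE
kept_which <- df[which(df$Value >= 30), ]

# subset() also drops rows where the condition evaluates to NA
kept_subset <- subset(df, Value >= 30)
print(kept_subset)
```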

#### B. Removing Duplicate Rows

```r
# Creating a sample DataFrame with duplicate rows
df <- data.frame(
  ID = c(1, 2, 2, 3, 4, 4, 5),
  Name = c("John", "Sara", "Sara", "Anna", "Mike", "Mike", "Eva"),
  Age = c(21, 35, 35, 25, 30, 30, 22)
)

# Displaying the original DataFrame
print("Original DataFrame:")
print(df)

# Removing duplicate rows
df_no_duplicates <- df[!duplicated(df), ]

# Displaying the DataFrame after removing duplicate rows
print("DataFrame after removing duplicate rows:")
print(df_no_duplicates)
```

Output:

```
[1] "Original DataFrame:"
  ID Name Age
1  1 John  21
2  2 Sara  35
3  2 Sara  35
4  3 Anna  25
5  4 Mike  30
6  4 Mike  30
7  5  Eva  22
[1] "DataFrame after removing duplicate rows:"
  ID Name Age
1  1 John  21
2  2 Sara  35
4  3 Anna  25
5  4 Mike  30
7  5  Eva  22
```

Here, the rows 3 and 6 from the original dataframe, which were duplicates of rows 2 and 5 respectively, have been removed in the df_no_duplicates dataframe.
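duplicated() flags a row only when it matches an earlier row in every column. To treat rows as duplicates based on some columns only, pass just those columns to duplicated(); its fromLast = TRUE argument keeps the last occurrence instead of the first. The dataframe below is a hypothetical variation of the one above in which the two Sara rows differ in Age.

```r
# Sample DataFrame: the two "Sara" rows differ only in Age
df <- data.frame(
  ID = c(1, 2, 2, 3),
  Name = c("John", "Sara", "Sara", "Anna"),
  Age = c(21, 35, 36, 25)
)

# No row matches an earlier row in every column, so nothing is removed
df_full <- df[!duplicated(df), ]

# Deduplicate on ID and Name only, keeping the first occurrence
df_first <- df[!duplicated(df[, c("ID", "Name")]), ]
print(df_first)

# Same key, but keep the last occurrence instead
df_last <- df[!duplicated(df[, c("ID", "Name")], fromLast = TRUE), ]
print(df_last)
```

With dplyr loaded, distinct(df, ID, Name, .keep_all = TRUE) is a tidyverse alternative that keeps the first occurrence of each ID and Name pair.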

#### C. Using filter() with Multiple Conditions

```r
library(dplyr)

# Sample DataFrame
df <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Value = c(10, 20, 30, 40, 50)
)

# Removing the row where ID is 4 and Value is 40
df <- df %>% filter(!(ID == 4 & Value == 40))
print(df)
```

Output:

```
  ID Value
1  1    10
2  2    20
3  3    30
4  5    50
```
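The negated AND condition above can equivalently be written, by De Morgan's law, as ID != 4 | Value != 40; on this NA-free data both forms keep the same rows. When filter() receives several comma-separated conditions, it combines them with AND, so a row is kept only if it passes all of them. The sketch below checks both points on a fresh copy of the sample data; the object names are illustrative.

```r
library(dplyr)

# Fresh copy of the sample DataFrame
df <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Value = c(10, 20, 30, 40, 50)
)

# Negated AND, as in the example above
kept_not_and <- df %>% filter(!(ID == 4 & Value == 40))

# De Morgan's law: NOT (A AND B) is the same as (NOT A) OR (NOT B)
kept_or <- df %>% filter(ID != 4 | Value != 40)

# Comma-separated conditions are combined with AND:
# keeps rows where ID is greater than 1 and Value is less than 50
kept_both <- df %>% filter(ID > 1, Value < 50)
print(kept_both)
```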

#### D. Combining slice() and n() Functions

```r
library(dplyr)

# Removing the last row of the dataframe from example C
df <- df %>% slice(1:(n()-1))
print(df)
```

Output:

```
  ID Value
1  1    10
2  2    20
3  3    30
```

Because n() returns the number of rows, 1:(n()-1) selects every row except the last one, so the last row is removed.
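Since n() is the row count, slice(-n()) is a shorter way to drop only the last row, and dplyr's slice_head() accepts a negative n to keep all rows except the last |n|. In base R, head() with a negative n does the same. The sketch below compares these forms on a copy of the four-row dataframe left by example C; the object names are illustrative.

```r
library(dplyr)

# Four-row DataFrame, as left by example C
df <- data.frame(
  ID = c(1, 2, 3, 5),
  Value = c(10, 20, 30, 50)
)

# Approach used in this section: keep rows 1 to n() - 1
drop_range <- df %>% slice(1:(n() - 1))

# Negative position: drop only the last row
drop_neg <- df %>% slice(-n())

# Negative n: keep all rows except the last one
drop_head <- df %>% slice_head(n = -1)

# Base R equivalent
drop_base <- head(df, -1)
```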

#### E. Using drop_na() to Remove Rows with NA Values

```r
library(tidyr)

# Creating a sample DataFrame with NA values
df <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("John", "Sara", NA, "Anna", "Mike"),
  Age = c(21, 35, 30, 25, NA)
)

# Removing rows with NA values in any column
df <- drop_na(df)
print(df)
```

Output:

```
  ID Name Age
1  1 John  21
2  2 Sara  35
3  4 Anna  25
```

This will remove any rows with NA values in any of the columns of the dataframe.
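drop_na() also accepts column names, in which case only NAs in those columns count. In base R, complete.cases() returns TRUE for rows without any NA, and is.na() on a single column gives a per-column check. The sketch below applies these to the sample dataframe from this section; the object names are illustrative.

```r
library(tidyr)

# Same sample DataFrame with NA values
df <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("John", "Sara", NA, "Anna", "Mike"),
  Age = c(21, 35, 30, 25, NA)
)

# Only NAs in Age count: row 5 is removed, row 3 (NA Name) is kept
df_age <- drop_na(df, Age)
print(df_age)

# Base R: keep rows with no NA in any column (same rows as drop_na(df))
df_complete <- df[complete.cases(df), ]

# Base R: keep rows whose Age is not NA
df_age_base <- df[!is.na(df$Age), ]
```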

### Conclusion

Deleting rows is a core part of data manipulation and preprocessing in R. Whether you are removing duplicates, filtering out irrelevant data, or handling missing values, you need to know how to delete rows reliably.

R provides a variety of functions and operators in base R and in contributed packages like dplyr and tidyr, which make it easy and intuitive to delete rows from a dataframe based on a wide range of criteria. By understanding these different approaches, you can choose the one that best suits your needs and efficiently manage your data to prepare it for further analysis.
