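I want to remove rows from a data frame that match a particular pattern, and I'd like to use tidyverse syntax if possible.
I want to remove rows where column 1 contains "cat" and any of col2:col4 contains any of the following words: dog, fox, or cow. For this example, that would remove rows 1 and 4 from the original data.
Here is an example dataset: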
df <- data.frame(col1 = c("cat", "fox", "dog", "cat", "pig"),
                 col2 = c("lion", "tiger", "elephant", "dog", "cow"),
                 col3 = c("bird", "cow", "sheep", "fox", "dog"),
                 col4 = c("dog", "cat", "cat", "cow", "fox"))
I've tried several different variations but keep running into problems. Here is my latest attempt:
filtered_df <- df %>%
  filter(!(animal1 == "cat" & !any(cowfoxdog <- across(animal2:animal4, ~ . %in% c("cow", "fox", "dog")))))
This returns the following error:
Error in `filter()`:
! Problem while computing `..1 = !...`.
Caused by error in `FUN()`:
! only defined on a data frame with all numeric variables
You can use if_any(). To make the test more robust, I first added a row where col1 == "cat" but "dog", "fox", or "cow" do not appear in columns 2-4.
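library(dplyr)

# Add a row where col1 is "cat" but none of col2:col4 is "dog", "fox", or "cow"
df <- df %>%
  add_row(col1 = "cat", col2 = "sheep", col3 = "lion", col4 = "tiger")

# Drop rows where col1 is "cat" and any of col2:col4 matches one of the words
df %>%
  filter(!(col1 == "cat" & if_any(col2:col4, \(x) x %in% c("dog", "fox", "cow"))))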
  col1     col2  col3  col4
1  fox    tiger   cow   cat
2  dog elephant sheep   cat
3  pig      cow   dog   fox
4  cat    sheep  lion tiger
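
For comparison, the row-wise test that if_any() performs can be sketched in base R. This is only an illustration, not part of the answer above: it rebuilds the original five-row df from the question (without the extra row added for testing), and the names targets, hit, and keep are made up for this sketch.

```r
# Rebuild the example data from the question so the block is self-contained.
df <- data.frame(col1 = c("cat", "fox", "dog", "cat", "pig"),
                 col2 = c("lion", "tiger", "elephant", "dog", "cow"),
                 col3 = c("bird", "cow", "sheep", "fox", "dog"),
                 col4 = c("dog", "cat", "cat", "cow", "fox"))

# Hypothetical name for the words to look for in col2..col4.
targets <- c("dog", "fox", "cow")

# One logical per row: TRUE if any of col2..col4 holds a target word.
# sapply() tests each column with %in%, giving a logical matrix;
# rowSums(...) > 0 then combines the columns row by row.
hit <- rowSums(sapply(df[c("col2", "col3", "col4")], `%in%`, targets)) > 0

# Keep every row except those where col1 is "cat" AND a target word was found.
keep <- !(df$col1 == "cat" & hit)
df[keep, ]
```

On the original data this keeps rows 2, 3, and 5, which matches the requirement in the question that rows 1 and 4 be removed. The point of the sketch is that the condition has to yield one TRUE/FALSE per row, which is what if_any() provides inside filter().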