Converting filter_all(any_vars()) to filter(across())

Question

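While updating my own answer to another thread, I could not come up with a good replacement for the last example (see below). The idea is to get all rows in which any column contains a given string, in my example 'V'.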

library(tidyverse)

#get all rows where any column contains 'V'
diamonds %>%
  filter_all(any_vars(grepl('V',.))) %>%
  head
#> # A tibble: 6 x 10
#>   carat cut       color clarity depth table price     x     y     z
#>   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
#> 2 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
#> 3 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
#> 4 0.24  Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
#> 5 0.26  Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
#> 6 0.22  Fair      E     VS2      65.1    61   337  3.87  3.78  2.49


# naturally, this does not give the desired output!
diamonds %>%
  filter(across(everything(), ~ grepl('V', .))) %>%
  head
#> # A tibble: 0 x 10


I found a thread where the poster was pondering something similar, but applying the same logic to grepl does not work.

### don't run, this is ugly and does not work
diamonds %>%
  rowwise %>%
  filter(any(grepl("V", across(everything())))) %>%
  head


Answer 1

小智 6

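This is tricky, because the example shows that you want to keep a row when any one of the columns meets the condition (i.e. you want a union). That is what filter_all() together with any_vars() does.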

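filter(across(everything(), ...)), by contrast, keeps a row only when all columns meet the condition (i.e. it is an intersection, the exact opposite of the above). That is why the attempt in the question returns zero rows.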

To turn the intersection back into a union (i.e. to get the rows where any column meets the condition), you can check the row sums:

diamonds %>% filter(rowSums(across(everything(), ~grepl("V", .x))) > 0)
It sums all the TRUEs that appear in a row, so if at least one value meets the condition, the row sum is > 0 and the row is kept.
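To make the counting step visible, here is a minimal sketch on a small made-up table. The tibble `toy` and its columns `id`, `grade` and `note` are hypothetical and only serve as an illustration; the technique is the same rowSums() idea as above, just split into steps.

```r
library(dplyr)

# Hypothetical toy data (not from the question): three character columns.
toy <- tibble(
  id    = c("a1", "b2", "c3", "V4"),
  grade = c("VS1", "SI1", "SI2", "VVS2"),
  note  = c("Very Good", "Very Good", "Fair", "Very Good")
)

# One logical column per original column: TRUE where the value contains "V".
hits <- toy %>% mutate(across(everything(), ~ grepl("V", .x)))

# Number of matching columns per row (TRUE counts as 1, FALSE as 0).
n_hits <- rowSums(hits)

# Union: keep rows where at least one column matches.
any_match <- toy %>% filter(n_hits > 0)

# Intersection: keep rows where every column matches.
all_match <- toy %>% filter(n_hits == ncol(toy))
```

Comparing the row sum with 0 gives the "any column" behaviour of any_vars(), while comparing it with the number of columns gives the "all columns" behaviour.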

I am sorry that across() is not the first child of filter(), but this should at least give you an idea of how to do it. :-)
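For readers on a newer dplyr: as far as I know, releases from 1.0.4 onwards ship if_any() and if_all(), which express the union and the intersection directly inside filter(). The sketch below assumes such a version is installed and uses a hypothetical tibble `toy` (the data and column names are made up) to compare if_any() with the rowSums() approach.

```r
library(dplyr)

# Hypothetical toy data (not from the question).
toy <- tibble(
  id    = c("a1", "b2", "c3", "V4"),
  grade = c("VS1", "SI1", "SI2", "VVS2"),
  note  = c("Very Good", "Very Good", "Fair", "Very Good")
)

# Union via row sums, as in this answer.
via_row_sums <- toy %>%
  filter(rowSums(across(everything(), ~ grepl("V", .x))) > 0)

# Union via if_any(): TRUE for a row if any selected column matches.
# Assumes dplyr >= 1.0.4.
via_if_any <- toy %>%
  filter(if_any(everything(), ~ grepl("V", .x)))

# Intersection via if_all(): TRUE for a row only if every column matches.
via_if_all <- toy %>%
  filter(if_all(everything(), ~ grepl("V", .x)))
```

On this toy data both union variants keep the same rows. Whether if_any() is preferable in a given project depends on the dplyr version you can rely on.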

Evaluation:

Using @TimTeaFan's approach to check the result:

identical(
  {diamonds %>% filter_all(any_vars(grepl('V',.)))},
  {diamonds %>% filter(rowSums(across(everything(), ~grepl("V", .x))) > 0)}
)
#> [1] TRUE
Benchmark:

Following our discussion under TimTeaFan's answer, here is a comparison. Surprisingly, all solutions have similar timings:

diamonds %>% filter(rowSums(across(everything(), ~grepl("V", .x))) > 0)
Created on 2020-07-14 by the reprex package (v0.3.0)
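If you want to repeat such a comparison yourself, the following is only a rough sketch and not the benchmark that was originally run. It times the filter_all() and rowSums() variants once each with base R's system.time() on synthetic data; the tibble `big`, its columns and its size are assumptions made for illustration, and single timings like these are noisy.

```r
library(dplyr)

# Hypothetical synthetic data, chosen only for illustration.
set.seed(1)
n <- 5000
big <- tibble(
  a = sample(c("VS1", "SI1", "I1"), n, replace = TRUE),
  b = sample(c("Very Good", "Fair", "Ideal"), n, replace = TRUE),
  c = as.character(round(runif(n), 2))
)

# Time the superseded filter_all()/any_vars() variant.
time_old <- system.time(
  res_old <- big %>% filter_all(any_vars(grepl("V", .)))
)

# Time the rowSums() over across() variant.
time_new <- system.time(
  res_new <- big %>%
    filter(rowSums(across(everything(), ~ grepl("V", .x))) > 0)
)

# Elapsed seconds for each variant; values depend on the machine.
elapsed <- c(filter_all = time_old[["elapsed"]],
             row_sums   = time_new[["elapsed"]])
print(elapsed)

# Both variants should keep the same number of rows.
nrow(res_old) == nrow(res_new)
```

For more reliable numbers, repeat each call many times or use a dedicated benchmarking package.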
