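I am trying to group_by multiple columns in a data frame. I can't write out every column name inside the group_by call, so I want to pass the column names as a vector, like this: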
cols <- colnames(mtcars)[grep("[a-z]{3,}$", colnames(mtcars))]
mtcars %>% filter(disp < 160) %>% group_by(cols) %>% summarise(n = n())
This returns an error:
Error in mutate_impl(.data, dots) :
Column `mtcars[colnames(mtcars)[grep("[a-z]{3,}$", colnames(mtcars))]]` must be length 12 (the number of rows) or one, not 7
I definitely want to do this with dplyr functions, but I can't figure out how.
Psi*_*dom 20
You can use group_by_at, which accepts a character vector of column names as the grouping variables:
mtcars %>%
  filter(disp < 160) %>%
  group_by_at(cols) %>%
  summarise(n = n())
# A tibble: 12 x 8
# Groups: mpg, cyl, disp, drat, qsec, gear [?]
# mpg cyl disp drat qsec gear carb n
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# 1 19.7 6 145.0 3.62 15.50 5 6 1
# 2 21.4 4 121.0 4.11 18.60 4 2 1
# 3 21.5 4 120.1 3.70 20.01 3 1 1
# 4 22.8 4 108.0 3.85 18.61 4 1 1
# ...
Alternatively, you can select the columns inside group_by_at using vars together with the column selection helper functions:
mtcars %>%
  filter(disp < 160) %>%
  group_by_at(vars(matches('[a-z]{3,}$'))) %>%
  summarise(n = n())
# A tibble: 12 x 8
# Groups: mpg, cyl, disp, drat, qsec, gear [?]
# mpg cyl disp drat qsec gear carb n
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# 1 19.7 6 145.0 3.62 15.50 5 6 1
# 2 21.4 4 121.0 4.11 18.60 4 2 1
# 3 21.5 4 120.1 3.70 20.01 3 1 1
# 4 22.8 4 108.0 3.85 18.61 4 1 1
# ...
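As a quick sanity check on the two approaches above, the following sketch compares the columns picked by the grep-based cols vector with those picked by the matches helper. It assumes dplyr is installed and uses select purely for illustration. One caveat: matches ignores case by default, whereas grep is case-sensitive. The two agree for mtcars only because all of its column names are lowercase.

```r
library(dplyr)

# Columns chosen by the character-vector approach (same regex as the question)
cols <- grep("[a-z]{3,}$", colnames(mtcars), value = TRUE)

# Columns chosen by the selection-helper approach
helper_cols <- mtcars %>%
  select(matches("[a-z]{3,}$")) %>%
  colnames()

# Both approaches should name the same grouping columns, in the same order
identical(cols, helper_cols)
```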
Har*_*nes 13
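I believe group_by_at has now been superseded by the combination of group_by and across. summarise also has an experimental .groups argument, which lets you choose how the grouping is handled after the summary is created. Here is an alternative to consider: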
cols <- colnames(mtcars)[grep("[a-z]{3,}$", colnames(mtcars))]

original <- mtcars %>%
  filter(disp < 160) %>%
  group_by_at(cols) %>%
  summarise(n = n())

superseded <- mtcars %>%
  filter(disp < 160) %>%
  group_by(across(all_of(cols))) %>%
  summarise(n = n(), .groups = 'drop_last')

all.equal(original, superseded)
Here is a blog post that covers the use of the across function in detail: https://www.tidyverse.org/blog/2020/04/dplyr-1-0-0-colwise/
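To make the effect of the .groups argument more concrete, here is a minimal sketch, assuming dplyr 1.0.0 or later, that runs the same across-based pipeline with two different .groups values and inspects the grouping that remains. The object names by_drop_last and by_drop are made up for this illustration.

```r
library(dplyr)

cols <- grep("[a-z]{3,}$", colnames(mtcars), value = TRUE)

# "drop_last" removes only the last grouping column after summarising
by_drop_last <- mtcars %>%
  filter(disp < 160) %>%
  group_by(across(all_of(cols))) %>%
  summarise(n = n(), .groups = "drop_last")

# "drop" removes all grouping and returns an ungrouped tibble
by_drop <- mtcars %>%
  filter(disp < 160) %>%
  group_by(across(all_of(cols))) %>%
  summarise(n = n(), .groups = "drop")

group_vars(by_drop_last)  # every column in cols except the last one
group_vars(by_drop)       # no grouping columns remain
```

If the result feeds into further non-grouped operations, "drop" may be the more convenient choice, since it avoids a separate ungroup() call.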