Consider the following example:
library(tidyverse)

df <- tibble(
  cat = rep(1:2, times = 4, each = 2),
  loc = rep(c("a", "b"), each = 8),
  value = rnorm(16)
)

df %>% 
  group_by(cat, loc) %>% 
  summarise(mean = mean(value), .groups = "drop")

# # A tibble: 4 x 3
#     cat loc     mean
# * <int> <chr>  <dbl>
# 1     1 a     -0.563
# 2     1 b     -0.394
# 3     2 a      0.159
# 4     2 b      0.212

I want to turn the last two lines into a function that takes a group argument for passing multiple columns to group_by().
As an example, here is a dummy function that computes the mean of a value by a set of columns:
group_mean <- function(data, col_value, group) {
  data %>% 
    group_by(across(all_of(group))) %>% 
    summarise(mean = mean({{col_value}}), .groups = "drop")
}

group_mean(df, value, c("cat", "loc"))

# # A tibble: 4 x 3
#     cat loc     mean
# * <int> <chr>  <dbl>
# 1     1 a     -0.563
# 2     1 b     -0.394
# 3     2 a      0.159
# 4     2 b      0.212

The function works, but I would prefer a tidyselect/rlang approach that avoids quoting the column names, like this:
group_mean(df, value, c(cat, loc))

# Error: Problem adding computed columns in `group_by()`.
# x Problem with `mutate()` input `..1`.
# x object 'loc' not found
# ℹ Input `..1` is `across(all_of(c(cat, loc)))`.

Wrapping group in {{}} works for a single column, but not for multiple columns. How can I do this?
Consider using the dots (...). The column names can then be passed either quoted or unquoted, because ensyms() converts both forms to symbols:
group_mean <- function(data, col_value, ...) {
  data %>%
    group_by(!!! ensyms(...)) %>%
    summarise(mean = mean({{col_value}}), .groups = "drop")
}
- Testing
> group_mean(df, value, cat, loc)
# A tibble: 4 x 3
    cat loc     mean
  <int> <chr>  <dbl>
1     1 a      0.327
2     1 b     -0.291
3     2 a     -0.382
4     2 b     -0.320
> group_mean(df, value, 'cat', 'loc')
# A tibble: 4 x 3
    cat loc     mean
  <int> <chr>  <dbl>
1     1 a      0.327
2     1 b     -0.291
3     2 a     -0.382
4     2 b     -0.320
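To see why the quoted and unquoted calls give the same result, the capture step can be looked at on its own. This is only a minimal sketch: capture_cols() is a hypothetical helper introduced here for illustration, and it assumes the rlang package is installed.

```r
library(rlang)

# Hypothetical helper (not part of the answer): captures the arguments
# passed through ... as a list of symbols.
capture_cols <- function(...) {
  ensyms(...)
}

# Bare names and strings are both turned into symbols.
unquoted <- capture_cols(cat, loc)
quoted <- capture_cols("cat", "loc")

# Convert the symbols back to strings so the two captures can be compared.
cols_unquoted <- unname(vapply(unquoted, as_string, character(1)))
cols_quoted <- unname(vapply(quoted, as_string, character(1)))

identical(cols_unquoted, cols_quoted)
```

Since ensyms() only accepts symbols or strings, passing an expression such as cat + 1 raises an error instead of grouping on a computed column.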
If ... is already used for other arguments, then one option is:
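group_mean <- function(data, col_value, group) {
  # Capture the unevaluated group expression, e.g. c(cat, loc), as a list
  grp_lst <- as.list(substitute(group))
  # Drop the leading c when a call such as c(cat, loc) was passed
  if (length(grp_lst) > 1) grp_lst <- grp_lst[-1]
  # Convert the remaining symbols to column-name strings
  grps <- purrr::map_chr(grp_lst, rlang::as_string)
  data %>%
    group_by(across(all_of(grps))) %>%
    summarise(mean = mean({{col_value}}), .groups = "drop")
}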
- Testing
> group_mean(df, value, c(cat, loc))
# A tibble: 4 x 3
    cat loc     mean
  <int> <chr>  <dbl>
1     1 a      0.327
2     1 b     -0.291
3     2 a     -0.382
4     2 b     -0.320
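Another possibility, not taken from the answer above, is to embrace group inside across(), which accepts a tidyselect expression such as c(cat, loc). The following is a minimal sketch, assuming dplyr 1.0.0 or later. The name group_mean_sel() and the fixed seed are introduced here only for illustration.

```r
library(dplyr)

# Hypothetical variant of group_mean(): group is forwarded to across()
# as a tidyselect expression, so bare column names can be used.
group_mean_sel <- function(data, col_value, group) {
  data %>%
    group_by(across({{ group }})) %>%
    summarise(mean = mean({{ col_value }}), .groups = "drop")
}

# Same structure as the example data; the seed is an assumption added
# so that repeated runs give the same values.
set.seed(1)
df <- tibble(
  cat = rep(1:2, times = 4, each = 2),
  loc = rep(c("a", "b"), each = 8),
  value = rnorm(16)
)

by_both <- group_mean_sel(df, value, c(cat, loc))  # several columns
by_cat <- group_mean_sel(df, value, cat)           # a single column
by_both
```

With this form the substitute() step is not needed, because across() resolves the selection itself. Whether it fits depends on the dplyr version available.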