How can I build a dplyr summarise statement programmatically?

Question

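I'm trying to do some programming with dplyr and I'm running into trouble. I want to `group_by` an arbitrary number of variables (hence `across`), and then `summarize` based on vectors of arbitrary length (but all of the same length) that give:

the columns to apply the functions to
the functions to apply
the names of the new columns

So, much like in a `map` or `apply` statement, I want to run code that ends up looking like this: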

data %>%
  group_by(group_column) %>%
  summarize(new_name_1 = function_1(column_1),
            new_name_2 = function_2(column_2))

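Below is an example of what I want, along with my best attempt so far. I know that if I use `across` I can clean up the names with its `.names` argument, but I'm not convinced `across` is the right approach. Ultimately I'll be applying this to fairly large data frames, so I'd rather not compute extra columns.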

Desired result

mtcars %>%
  group_by(across(c("cyl", "carb"))) %>%
  summarise(across(c("disp", "hp"), list(mean = mean, sd = sd))) %>%
  select(cyl, carb, disp_mean, hp_sd)
#> `summarise()` regrouping output by 'cyl' (override with `.groups` argument)
#> # A tibble: 9 x 4
#> # Groups:   cyl [3]
#>     cyl  carb disp_mean hp_sd
#>   <dbl> <dbl>     <dbl> <dbl>
#> 1     4     1      91.4 16.1 
#> 2     4     2     117.  24.9 
#> 3     6     1     242.   3.54
#> 4     6     4     164.   7.51
#> 5     6     6     145   NA   
#> 6     8     2     346.  14.4 
#> 7     8     3     276.   0   
#> 8     8     4     406.  21.7 
#> 9     8     8     301   NA

What I get

mtcars %>%
  group_by(across(c("cyl", "carb"))) %>%
  summarise(across(c("disp", "hp"), list(mean = mean, sd = sd)))
#> `summarise()` regrouping output by 'cyl' (override with `.groups` argument)
#> # A tibble: 9 x 6
#> # Groups:   cyl [3]
#>     cyl  carb disp_mean disp_sd hp_mean hp_sd
#>   <dbl> <dbl>     <dbl>   <dbl>   <dbl> <dbl>
#> 1     4     1      91.4   21.4     77.4 16.1 
#> 2     4     2     117.    27.1     87   24.9 
#> 3     6     1     242.    23.3    108.   3.54
#> 4     6     4     164.     4.39   116.   7.51
#> 5     6     6     145     NA      175   NA   
#> 6     8     2     346.    43.4    162.  14.4 
#> 7     8     3     276.     0      180    0   
#> 8     8     4     406.    57.8    234   21.7 
#> 9     8     8     301     NA      335   NA

Answer 1

akr*_*run 7

To apply different functions to different columns, one option is `collap` from the collapse package.

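library(collapse)
# custom maps each function name to column indices of mtcars:
# disp is column 3 and hp is column 4
collap(mtcars, ~ cyl + carb, custom = list(fmean = 3, fsd = 4))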

-output

  cyl   disp        hp carb
1   4  91.38 16.133815    1
2   4 116.60 24.859606    2
3   6 241.50  3.535534    1
4   6 163.80  7.505553    4
5   6 145.00        NA    6
6   8 345.50 14.433757    2
7   8 275.80  0.000000    3
8   8 405.50 21.725561    4
9   8 301.00        NA    8

Alternatively, the indices can be generated dynamically with `match`:

collap(mtcars, ~ cyl + carb, custom = list(fmean =
   match('disp', names(mtcars)), fsd = match('hp', names(mtcars))))
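
The same idea can be extended to vectors of arbitrary length. The following is only a sketch, assuming the column names and the collapse function names are supplied as two character vectors of equal length; `cols`, `funs` and `custom` are hypothetical names introduced here for illustration.

```r
library(collapse)

# Hypothetical inputs: columns to aggregate and the collapse
# function to use for each one (same length, matched by position)
cols <- c("disp", "hp")
funs <- c("fmean", "fsd")

# Look up each column's position in mtcars and name the list
# elements after the functions, as the custom argument expects
custom <- setNames(as.list(match(cols, names(mtcars))), funs)

res <- collap(mtcars, ~ cyl + carb, custom = custom)
```

Here `custom` ends up as `list(fmean = 3L, fsd = 4L)`, which matches the hand-written list used with `match` above.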

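With the tidyverse, one option is to loop over the column names and functions of interest with `map2`, and then join the results afterwards.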

library(dplyr)
library(purrr)
library(stringr)
map2(c("disp", "hp"), c("mean", "sd"), ~
   mtcars %>%
      group_by(across(c('cyl', 'carb'))) %>% 
      summarise(across(all_of(.x), match.fun(.y), 
         .names = str_c("{.col}_", .y)), .groups = 'drop')) %>% 
    reduce(inner_join)

-output

# A tibble: 9 x 4
    cyl  carb disp_mean hp_sd
  <dbl> <dbl>     <dbl> <dbl>
1     4     1      91.4 16.1 
2     4     2     117.  24.9 
3     6     1     242.   3.54
4     6     4     164.   7.51
5     6     6     145   NA   
6     8     2     346.  14.4 
7     8     3     276.   0   
8     8     4     406.  21.7 
9     8     8     301   NA
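
Another possibility, not part of the answer above, is to build the summary expressions first and splice them into a single `summarise` call, which avoids the repeated grouping and the join. This is a minimal sketch that assumes rlang's `call2()`, `sym()` and the `!!!` splice operator; `cols`, `funs`, `new_names` and `exprs` are hypothetical names chosen for illustration.

```r
library(dplyr)
library(purrr)
library(rlang)

# Hypothetical inputs: three vectors of equal length, matched by position
cols      <- c("disp", "hp")
funs      <- c("mean", "sd")
new_names <- c("disp_mean", "hp_sd")

# Build one unevaluated call per pair, e.g. mean(disp) and sd(hp),
# and name each call after the column it should create
exprs <- map2(funs, cols, ~ call2(.x, sym(.y)))
names(exprs) <- new_names

# Splice the named expressions into a single summarise() call
result <- mtcars %>%
  group_by(across(all_of(c("cyl", "carb")))) %>%
  summarise(!!!exprs, .groups = "drop")
```

Because only the requested expressions are evaluated, no extra columns such as `disp_sd` or `hp_mean` are computed, which was one of the concerns in the question.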
