Summarizing or aggregating strings with c() in dplyr

Mer*_*glu 4 string aggregate r dplyr

I want to aggregate some strings in dplyr, using c() as the aggregation function. I first tried the following:

> InsectSprays$spray = as.character(InsectSprays$spray)
> dt = tbl_df(InsectSprays)
> dt %>% group_by(count) %>% summarize(c(spray))
Error: expecting a single value

However, using c() as the function in aggregate() works:

> da = aggregate(spray ~ count, InsectSprays, c)
> head(da)
  count                  spray
1     0                   C, C
2     1       C, C, C, C, E, E
3     2             C, C, D, E

Searching Stack Overflow suggested that using paste() with the collapse argument instead of c() solves the problem:

dt %>% group_by(count) %>% summarize(s=paste(spray, collapse=","))

or

dt %>% group_by(count) %>% summarize(paste( c(spray), collapse=","))

My question is: why does c() work in aggregate() but not in dplyr's summarize()?

Ric*_*ven 5

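If you take a closer look, you will find that c() does work (to a certain extent) when we use do(). But as far as I understand, dplyr does not currently allow this kind of list to be printed in full; each element shows up only as a summary such as <chr[2]>.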

> InsectSprays$spray = as.character(InsectSprays$spray)
> dt = tbl_df(InsectSprays)
> doC <- dt %>% group_by(count) %>% do(s = c(.$spray))
> head(doC)
Source: local data frame [6 x 2]

  count        s
1     0 <chr[2]>
2     1 <chr[6]>
3     2 <chr[4]>
4     3 <chr[8]>
5     4 <chr[4]>
6     5 <chr[7]>

> head(doC)[[2]]
[[1]]
[1] "C" "C"

[[2]]
[1] "C" "C" "C" "C" "E" "E"

[[3]]
[1] "C" "C" "D" "E"

[[4]]
[1] "C" "C" "D" "D" "E" "E" "E" "E"

[[5]]
[1] "C" "D" "D" "E"

[[6]]
[1] "D" "D" "D" "D" "D" "E" "E"
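One way to see why aggregate() accepts c() is to inspect the column it returns. The sketch below uses only base R and works on a copy named `sprays` (a name introduced here for illustration). When the per-group results have different lengths, aggregate() appears to leave them as a list rather than simplifying them, so `spray` ends up as a list column with one character vector per count. That is the same shape do() produces above, whereas summarize(), as its error message says, expects a single value per group.

```r
# Work on a copy so the built-in dataset is left untouched
sprays <- InsectSprays
sprays$spray <- as.character(sprays$spray)

# Same call as in the question
da <- aggregate(spray ~ count, sprays, c)

# Check whether the spray column is a list (one element per count value)
is.list(da$spray)

# Number of strings stored for the first three counts
lengths(da$spray)[1:3]

# The character vector stored for the first row (count == 0)
da$spray[[1]]
```

The "C, C" shown by head(da) is then just how a data frame prints a list element, not a single collapsed string.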

  • I think you should also be able to do this with summarise, see https://github.com/hadley/dplyr/issues/832 (2 upvotes)
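As a rough sketch of what the comment suggests, and assuming a dplyr version in which summarise() accepts list columns (this was not the case for the version used in the question), wrapping the vector in list() gives summarise() the single value per group it expects. The names `res` and `res_chr` are introduced here for illustration only.

```r
library(dplyr)

res <- InsectSprays %>%
  mutate(spray = as.character(spray)) %>%
  group_by(count) %>%
  # list() wraps each group's character vector into one list element
  summarise(s = list(spray))

# The character vector stored for the first group (count == 0)
res$s[[1]]

# If a plain character column is wanted instead, collapse each vector
res_chr <- res %>%
  mutate(s = vapply(s, paste, character(1), collapse = ","))
```

The list column keeps the individual strings available for later processing, while the collapsed version matches the paste() approach from the question.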