Collapse all columns by an ID column

Ste*_*ner 14 r dplyr

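I'm trying to do something similar to what was answered here, which gets me 80% of the way. I have a data frame with one ID column and multiple information columns. I'd like to collapse all the other columns so that there is only one row per ID, with multiple entries separated by, for example, a semicolon. Here is an example of what I have and what I want.

Have: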

     ID  info1          info2
1 id101    one          first
2 id102   twoA second alias A
3 id102   twoB second alias B
4 id103 threeA  third alias A
5 id103 threeB  third alias B
6 id104   four         fourth
7 id105   five          fifth

Want:

     ID          info1                          info2
1 id101            one                          first
2 id102     twoA; twoB second alias A; second alias B
3 id103 threeA; threeB   third alias A; third alias B
4 id104           four                         fourth
5 id105           five                          fifth

Here is the code used to generate these:

have <- data.frame(ID=paste0("id", c(101, 102, 102, 103, 103, 104, 105)),
                   info1=c("one", "twoA", "twoB", "threeA", "threeB", "four", "five"), 
                   info2=c("first", "second alias A", "second alias B", "third alias A", "third alias B", "fourth", "fifth"),
                   stringsAsFactors=FALSE)
want <- data.frame(ID=paste0("id", c(101:105)),
                   info1=c("one", "twoA; twoB", "threeA; threeB", "four", "five"),
                   info2=c("first", "second alias A; second alias B", "third alias A; third alias B", "fourth", "fifth"),
                   stringsAsFactors=FALSE)

This question asks basically the same thing, but with only one "info" column. I have multiple other columns and would like to do this for all of them.

Bonus points for doing this with dplyr.

tal*_*lat 16

Here is an option using summarise_each (which makes it easy to apply a change to all columns except the grouping variables) and toString:

require(dplyr)

have %>%
  group_by(ID) %>%
  summarise_each(funs(toString))

#Source: local data frame [5 x 3]
#
#     ID          info1                          info2
#1 id101            one                          first
#2 id102     twoA, twoB second alias A, second alias B
#3 id103 threeA, threeB   third alias A, third alias B
#4 id104           four                         fourth
#5 id105           five                          fifth

Or, if you'd like it separated by semicolons, you can use:

have %>%
  group_by(ID) %>%
  summarise_each(funs(paste(., collapse = "; ")))
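For readers on a recent dplyr, where summarise_each and funs are deprecated, the same idea can be sketched with across(). This assumes dplyr 1.0.0 or later; the result name `collapsed` is just an illustrative choice, not part of the original answer.

```r
library(dplyr)

# Same example data as in the question.
have <- data.frame(
  ID = paste0("id", c(101, 102, 102, 103, 103, 104, 105)),
  info1 = c("one", "twoA", "twoB", "threeA", "threeB", "four", "five"),
  info2 = c("first", "second alias A", "second alias B",
            "third alias A", "third alias B", "fourth", "fifth"),
  stringsAsFactors = FALSE
)

# Inside summarise(), everything() selects all non-grouping columns,
# so ID is left alone and info1/info2 are each collapsed per group.
# `collapsed` is a hypothetical name chosen for this sketch.
collapsed <- have %>%
  group_by(ID) %>%
  summarise(across(everything(), ~ paste(.x, collapse = "; ")))
```

Swapping the lambda for toString should give the comma-separated variant shown first.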

  • That's life ;-) @RichardScriven (3 upvotes)

小智 12

Good old aggregate does this just fine:

aggregate(have[,2:3], by=list(have$ID), paste, collapse=";")

The question is: does it scale?

  • The formula method is a bit cleaner, `aggregate(. ~ ID, have, paste, collapse = ";")`, and possibly faster (5 upvotes)
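As a rough sketch of the two base R variants discussed above, the following runs both on the question's data. Naming the `by` element keeps the grouping column called ID rather than the default Group.1. The names `by_list` and `by_formula` are illustrative only. One caveat worth keeping in mind: the formula method defaults to na.action = na.omit, so rows containing NA would be dropped before aggregation.

```r
# Same example data as in the question.
have <- data.frame(
  ID = paste0("id", c(101, 102, 102, 103, 103, 104, 105)),
  info1 = c("one", "twoA", "twoB", "threeA", "threeB", "four", "five"),
  info2 = c("first", "second alias A", "second alias B",
            "third alias A", "third alias B", "fourth", "fifth"),
  stringsAsFactors = FALSE
)

# Data-frame method: a named list in `by` names the grouping column.
# `by_list` is a hypothetical name chosen for this sketch.
by_list <- aggregate(have[, c("info1", "info2")],
                     by = list(ID = have$ID),
                     FUN = paste, collapse = "; ")

# Formula method: "." stands for every column not on the right-hand side.
# `by_formula` is likewise a hypothetical name.
by_formula <- aggregate(. ~ ID, data = have, FUN = paste, collapse = "; ")
```

Both calls pass collapse = "; " through to paste via the `...` argument of aggregate.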

Ric*_*ven 8

Here is a data.table solution.

library(data.table)
setDT(have)[, lapply(.SD, paste, collapse = "; "), by = ID]
#       ID          info1                          info2
# 1: id101            one                          first
# 2: id102     twoA; twoB second alias A; second alias B
# 3: id103 threeA; threeB   third alias A; third alias B
# 4: id104           four                         fourth
# 5: id105           five                          fifth
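Note that setDT converts `have` to a data.table by reference. If the original data frame should stay untouched, a variation along these lines may be preferable. It is a sketch only: `dt`, `info_cols`, and `collapsed` are names introduced here for illustration, and .SDcols is spelled out just to make the set of collapsed columns explicit.

```r
library(data.table)

# Same example data as in the question.
have <- data.frame(
  ID = paste0("id", c(101, 102, 102, 103, 103, 104, 105)),
  info1 = c("one", "twoA", "twoB", "threeA", "threeB", "four", "five"),
  info2 = c("first", "second alias A", "second alias B",
            "third alias A", "third alias B", "fourth", "fifth"),
  stringsAsFactors = FALSE
)

# as.data.table() returns a copy, so `have` remains a plain data.frame.
dt <- as.data.table(have)

# Every column except the ID is collapsed; listing them in .SDcols
# makes that choice explicit (hypothetical helper variable).
info_cols <- setdiff(names(dt), "ID")

collapsed <- dt[, lapply(.SD, paste, collapse = "; "),
                by = ID, .SDcols = info_cols]
```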