Collapse all columns by an ID column

Question


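I'm trying to do something similar to what was answered here, which gets me 80% of the way there. I have a data frame with one ID column and several information columns. I want to aggregate all the other columns so that there is only one row per ID, with multiple entries separated by, for example, a semicolon. Here is an example of what I have and what I want.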

Have:

     ID  info1          info2
1 id101    one          first
2 id102   twoA second alias A
3 id102   twoB second alias B
4 id103 threeA  third alias A
5 id103 threeB  third alias B
6 id104   four         fourth
7 id105   five          fifth


Want:

     ID          info1                          info2
1 id101            one                          first
2 id102     twoA; twoB second alias A; second alias B
3 id103 threeA; threeB   third alias A; third alias B
4 id104           four                         fourth
5 id105           five                          fifth


Here is the code used to generate these:

have <- data.frame(ID=paste0("id", c(101, 102, 102, 103, 103, 104, 105)),
                   info1=c("one", "twoA", "twoB", "threeA", "threeB", "four", "five"), 
                   info2=c("first", "second alias A", "second alias B", "third alias A", "third alias B", "fourth", "fifth"),
                   stringsAsFactors=FALSE)
want <- data.frame(ID=paste0("id", c(101:105)),
                   info1=c("one", "twoA; twoB", "threeA; threeB", "four", "five"), 
                   info2=c("first", "second alias A; second alias B", "third alias A; third alias B", "fourth", "fifth"),
                   stringsAsFactors=FALSE)


This question asks essentially the same thing, but with only one "info" column. I have several other columns and want to do this for all of them.

Bonus points for doing it with dplyr.

Answer 1

tal*_*lat 16

Here is an option using summarise_each (which makes it easy to apply a function to all columns except the grouping variable) and toString:

require(dplyr)

have %>%
  group_by(ID) %>%
  summarise_each(funs(toString))

#Source: local data frame [5 x 3]
#
#     ID          info1                          info2
#1 id101            one                          first
#2 id102     twoA, twoB second alias A, second alias B
#3 id103 threeA, threeB   third alias A, third alias B
#4 id104           four                         fourth
#5 id105           five                          fifth


Alternatively, if you want the entries separated by semicolons, you can use:

have %>%
  group_by(ID) %>%
  summarise_each(funs(paste(., collapse = "; ")))


That's life ;-) @RichardScriven (3 upvotes)
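
summarise_each() and funs() have since been deprecated in dplyr. A minimal sketch of the same approach using across(), assuming dplyr 1.0.0 or later, might look like this (the example data is rebuilt so the block runs on its own):

```r
library(dplyr)

# Same example data as in the question
have <- data.frame(ID = paste0("id", c(101, 102, 102, 103, 103, 104, 105)),
                   info1 = c("one", "twoA", "twoB", "threeA", "threeB", "four", "five"),
                   info2 = c("first", "second alias A", "second alias B",
                             "third alias A", "third alias B", "fourth", "fifth"),
                   stringsAsFactors = FALSE)

# Inside a grouped summarise(), across() skips the grouping column (ID),
# so every other column is collapsed into one "; "-separated string per ID
res <- have %>%
  group_by(ID) %>%
  summarise(across(everything(), ~ paste(.x, collapse = "; ")))

res
```

Replacing the lambda with toString gives the comma-separated version from the first snippet, since toString() pastes its input together with ", ".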

Answer 2

小智 12

Good old aggregate does this nicely:

aggregate(have[,2:3], by=list(have$ID), paste, collapse=";")


The question is: does it scale?

The formula method is cleaner, aggregate(.~ID, have, paste, collapse=";"), and probably faster (5 upvotes)
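
Putting the answer and the comment above side by side: a small sketch using "; " as the separator, as in the desired output. With an unnamed by list the grouping column comes out as "Group.1", so the sketch names it; the formula version keeps the name ID automatically.

```r
# Same example data as in the question
have <- data.frame(ID = paste0("id", c(101, 102, 102, 103, 103, 104, 105)),
                   info1 = c("one", "twoA", "twoB", "threeA", "threeB", "four", "five"),
                   info2 = c("first", "second alias A", "second alias B",
                             "third alias A", "third alias B", "fourth", "fifth"),
                   stringsAsFactors = FALSE)

# Data frame interface: naming the element of `by` names the output column;
# `collapse` is passed through to paste()
res_by <- aggregate(have[, c("info1", "info2")], by = list(ID = have$ID),
                    FUN = paste, collapse = "; ")

# Formula interface: "." on the left means every column not on the right
res_formula <- aggregate(. ~ ID, data = have, FUN = paste, collapse = "; ")

res_by
res_formula
```

One difference to keep in mind: the formula method uses na.action = na.omit by default, so any row with an NA in any column is dropped before collapsing; pass na.action = na.pass if those rows should be kept.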

Answer 3

Ric*_*ven 8

Here is a data.table solution.

library(data.table)
setDT(have)[, lapply(.SD, paste, collapse = "; "), by = ID]
#       ID          info1                          info2
# 1: id101            one                          first
# 2: id102     twoA; twoB second alias A; second alias B
# 3: id103 threeA; threeB   third alias A; third alias B
# 4: id104           four                         fourth
# 5: id105           five                          fifth
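
Note that setDT() converts have to a data.table by reference, so the original data.frame is modified. If that is not wanted, a minimal variant is to work on a copy made with as.data.table():

```r
library(data.table)

# Same example data as in the question
have <- data.frame(ID = paste0("id", c(101, 102, 102, 103, 103, 104, 105)),
                   info1 = c("one", "twoA", "twoB", "threeA", "threeB", "four", "five"),
                   info2 = c("first", "second alias A", "second alias B",
                             "third alias A", "third alias B", "fourth", "fifth"),
                   stringsAsFactors = FALSE)

# as.data.table() copies the data, leaving `have` as a plain data.frame;
# .SD holds every non-grouping column, and each is collapsed per ID
res <- as.data.table(have)[, lapply(.SD, paste, collapse = "; "), by = ID]

res
```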


Asked: 11 years, 1 month ago
Viewed: 7455 times
Last activity: 7 years ago