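I'm trying to do something similar to what is answered here, which gets me 80% of the way. I have a data frame with one ID column and multiple info columns. I'd like to collapse all the other columns so that there is only one row per ID, with multiple entries separated by, for example, a semicolon. Here is an example of what I have and what I want.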
Have:
     ID  info1          info2
1 id101    one          first
2 id102   twoA second alias A
3 id102   twoB second alias B
4 id103 threeA  third alias A
5 id103 threeB  third alias B
6 id104   four         fourth
7 id105   five          fifth
Want:
     ID          info1                          info2
1 id101            one                          first
2 id102     twoA; twoB second alias A; second alias B
3 id103 threeA; threeB   third alias A; third alias B
4 id104           four                         fourth
5 id105           five                          fifth
Here is the code to generate both data frames:
# Input: one row per (ID, entry) pair
have <- data.frame(ID=paste0("id", c(101, 102, 102, 103, 103, 104, 105)),
                   info1=c("one", "twoA", "twoB", "threeA", "threeB", "four", "five"),
                   info2=c("first", "second alias A", "second alias B", "third alias A", "third alias B", "fourth", "fifth"),
                   stringsAsFactors=FALSE)
# Desired output: one row per ID, entries joined with "; "
# (data.frame rather than data_frame, which does not take stringsAsFactors)
want <- data.frame(ID=paste0("id", c(101:105)),
                   info1=c("one", "twoA; twoB", "threeA; threeB", "four", "five"),
                   info2=c("first", "second alias A; second alias B", "third alias A; third alias B", "fourth", "fifth"),
                   stringsAsFactors=FALSE)
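This question asks basically the same thing, but with only one "info" column. I have multiple other columns and would like to do this for all of them.
Bonus points for doing this using dplyr.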
tal*_*lat 16
Here's an option using summarise_each (which makes it easy to apply the change to all columns except the grouping variables) and toString:
require(dplyr)
have %>%
  group_by(ID) %>%
  summarise_each(funs(toString))
#Source: local data frame [5 x 3]
#
#     ID          info1                          info2
#1 id101            one                          first
#2 id102     twoA, twoB second alias A, second alias B
#3 id103 threeA, threeB   third alias A, third alias B
#4 id104           four                         fourth
#5 id105           five                          fifth
Or, if you want the entries separated by semicolons, you can use:
have %>%
  group_by(ID) %>%
  summarise_each(funs(paste(., collapse = "; ")))
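In current dplyr releases, summarise_each() and funs() are deprecated. A minimal sketch of the same idea with across(), assuming dplyr 1.0.0 or later is installed (the result name `collapsed` is only for illustration and is not part of the original answer):

```r
library(dplyr)

# Example data from the question
have <- data.frame(
  ID    = paste0("id", c(101, 102, 102, 103, 103, 104, 105)),
  info1 = c("one", "twoA", "twoB", "threeA", "threeB", "four", "five"),
  info2 = c("first", "second alias A", "second alias B",
            "third alias A", "third alias B", "fourth", "fifth"),
  stringsAsFactors = FALSE
)

# Collapse every non-grouping column into one "; "-separated string per ID.
# Inside a grouped summarise(), everything() skips the grouping column.
collapsed <- have %>%
  group_by(ID) %>%
  summarise(across(everything(), ~ paste(.x, collapse = "; ")))
```

Swapping the lambda for toString should give the comma-separated variant.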
小智 12
Good old aggregate does this just fine:
aggregate(have[,2:3], by=list(have$ID), paste, collapse=";")
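As written, that call labels the grouping column Group.1 and joins entries with ";" and no space. A small variation, sketched here on the assumption that the goal is to match the question's desired output, names the `by` element and uses "; " (the name `collapsed` is illustrative):

```r
# Example data from the question
have <- data.frame(
  ID    = paste0("id", c(101, 102, 102, 103, 103, 104, 105)),
  info1 = c("one", "twoA", "twoB", "threeA", "threeB", "four", "five"),
  info2 = c("first", "second alias A", "second alias B",
            "third alias A", "third alias B", "fourth", "fifth"),
  stringsAsFactors = FALSE
)

# Naming the list element keeps "ID" as the grouping column's name;
# collapse = "; " is passed through to paste() for each group.
collapsed <- aggregate(have[, c("info1", "info2")],
                       by = list(ID = have$ID),
                       FUN = paste, collapse = "; ")
```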
The question is: does it scale?
Here's a data.table solution.
library(data.table)
setDT(have)[, lapply(.SD, paste, collapse = "; "), by = ID]
#       ID          info1                          info2
# 1: id101            one                          first
# 2: id102     twoA; twoB second alias A; second alias B
# 3: id103 threeA; threeB   third alias A; third alias B
# 4: id104           four                         fourth
# 5: id105           five                          fifth
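Note that setDT() converts `have` to a data.table in place. If the original data frame should stay untouched, one option is to work on a copy made with as.data.table(); a minimal sketch, with `dt` and `collapsed` as illustrative names:

```r
library(data.table)

# Example data from the question
have <- data.frame(
  ID    = paste0("id", c(101, 102, 102, 103, 103, 104, 105)),
  info1 = c("one", "twoA", "twoB", "threeA", "threeB", "four", "five"),
  info2 = c("first", "second alias A", "second alias B",
            "third alias A", "third alias B", "fourth", "fifth"),
  stringsAsFactors = FALSE
)

# as.data.table() returns a copy, so `have` remains a plain data.frame
dt <- as.data.table(have)

# .SD holds all non-grouping columns; paste each one together per ID
collapsed <- dt[, lapply(.SD, paste, collapse = "; "), by = ID]
```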