I have a data frame that looks like this:
df<-data.frame(id=c("xx33","xx33","xx22","xx11","xx11","xx00"),amount=c(10,15,100,20,10,15),date=c("01/02/2013","01/02/2013","02/02/2013","03/03/2013","03/03/2013","04/04/2013"))
    id amount       date
1 xx33     10 01/02/2013
2 xx33     15 01/02/2013
3 xx22    100 02/02/2013
4 xx11     20 03/03/2013
5 xx11     10 03/03/2013
6 xx00     15 04/04/2013
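I want to collapse all rows that share an id, summing amount and counting how many times each id occurs. I also want to carry along the information those rows have in common, e.g. the date, which is the same for every row of a given id (and likewise any other such variables). So I want the output to be: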
    id sum       date number
1 xx33  25 01/02/2013      2
2 xx22 100 02/02/2013      1
3 xx11  30 03/03/2013      2
4 xx00  15 04/04/2013      1
I tried
library(plyr)
ddply(.data = df, .variables = "id", .fun = nrow)
and it returns the number of occurrences of each id, but I can't find a way to aggregate all the common ids without a loop.
Using the data.table library:
library(data.table)
dt <- data.table(df)
# one row per (id, date): total amount and number of rows (.N)
dt2 <- dt[, list(sumamount = sum(amount), freq = .N), by = c("id", "date")]
Output:
> dt2
     id       date sumamount freq
1: xx33 01/02/2013        25    2
2: xx22 02/02/2013       100    1
3: xx11 03/03/2013        30    2
4: xx00 04/04/2013        15    1
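Grouping by id and date together returns one row per id only because each id has a single date in this data. A quick base R check for that assumption might look like the sketch below; `constant_within_id` and `df_bad` are hypothetical names used for illustration, not part of any package.

```r
df <- data.frame(
  id     = c("xx33", "xx33", "xx22", "xx11", "xx11", "xx00"),
  amount = c(10, 15, 100, 20, 10, 15),
  date   = c("01/02/2013", "01/02/2013", "02/02/2013",
             "03/03/2013", "03/03/2013", "04/04/2013"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if `col` takes exactly one value inside every id group.
constant_within_id <- function(d, col, id = "id") {
  n_distinct <- tapply(d[[col]], d[[id]], function(x) length(unique(x)))
  all(n_distinct == 1L)
}

constant_within_id(df, "date")       # TRUE

# If one id had two different dates, grouping by id + date would split it in two.
df_bad <- df
df_bad$date[5] <- "05/03/2013"       # xx11 now has two dates
constant_within_id(df_bad, "date")   # FALSE
nrow(aggregate(amount ~ id + date, data = df_bad, FUN = sum))   # 5 rows, not 4
```

If the check fails, the extra column should either be dropped from the grouping or summarised explicitly instead of being used as a grouping key.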
Here is a solution using the plyr package:
library(plyr)
# split by date and id, then sum amount and count the rows in each group
ddply(df, .(date, id), summarize, sum = sum(amount), number = length(id))
        date   id sum number
1 01/02/2013 xx33  25      2
2 02/02/2013 xx22 100      1
3 03/03/2013 xx11  30      2
4 04/04/2013 xx00  15      1
Here is a base R solution:
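> agg <- aggregate(amount ~ id + date, data = df, FUN = sum)
> # table() sorts ids alphabetically, so look the counts up by id name rather than by position
> agg$Freq <- as.vector(table(df$id)[as.character(agg$id)])
> agg
    id       date amount Freq
1 xx33 01/02/2013     25    2
2 xx22 02/02/2013    100    1
3 xx11 03/03/2013     30    2
4 xx00 04/04/2013     15    1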
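Another loop-free base R variant, sketched under the assumption that date (and any other carried column) is constant within each id: keep the first row of each id with `duplicated()`, compute grouped sums with `rowsum()`, and count rows with `table()`. The variable names `ids`, `key` and `out` are illustrative only.

```r
df <- data.frame(
  id     = c("xx33", "xx33", "xx22", "xx11", "xx11", "xx00"),
  amount = c(10, 15, 100, 20, 10, 15),
  date   = c("01/02/2013", "01/02/2013", "02/02/2013",
             "03/03/2013", "03/03/2013", "04/04/2013"),
  stringsAsFactors = FALSE
)

ids <- df$id
# One row per id (first appearance), carrying the columns assumed constant.
out <- df[!duplicated(ids), c("id", "date")]
key <- as.character(out$id)   # index by name, never by factor code

out$sum    <- rowsum(df$amount, ids)[key, 1]   # vectorized grouped sum
out$number <- as.vector(table(ids)[key])       # rows per id
rownames(out) <- NULL

# Guard the assumption: each row's date must equal the date carried for its id.
stopifnot(all(df$date == out$date[match(df$id, out$id)]))

out[, c("id", "sum", "date", "number")]
```

This keeps ids in order of first appearance, which here matches the requested output; `rowsum()` itself returns groups in sorted order, which is why the results are looked up by name.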