Rac*_*ole 5 merge aggregate r dataframe
I have an R data frame:
# here just define it directly, but it comes from a simulation
simPrice <- data.frame(simId=c(1,1,2,2),
                       crop=rep(c('apple','pear'),2),
                       mean=rep(c(10,22),2),
                       sd=rep(c(2,4),2),
                       price=c(9,21,12,18))
simId crop mean sd price
1 1 apple 10 2 9
2 1 pear 22 4 21
3 2 apple 10 2 12
4 2 pear 22 4 18
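These are the prices of fruit (apples and pears) in two different iterations of a simulation. In general, I could have any number of fruits or iterations. Crucially, I may also have other columns (e.g. variety, date of sale, place of sale, etc.).
I have another data frame giving the volume of fruit grown on a number of farms: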
# here just define it directly, but it comes from a simulation
simVol <- data.frame(simId=c(1,1,1,1,2,2,2,2),
                     farm=rep(c('farm A', 'farm A', 'farm B', 'farm B'),2),
                     crop=rep(c('apple','pear'),4),
                     mean=rep(c(10,22),4),
                     sd=rep(c(2,4),4),
                     volume=c(9,21,12,18,10,22,11,19))
simId farm crop mean sd volume
1 1 farm A apple 10 2 9
2 1 farm A pear 22 4 21
3 1 farm B apple 10 2 12
4 1 farm B pear 22 4 18
5 2 farm A apple 10 2 10
6 2 farm A pear 22 4 22
7 2 farm B apple 10 2 11
8 2 farm B pear 22 4 19
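Now I want to combine the two.
I think that to do this I first have to "broadcast" simPrice over farm, so that the two data frames have exactly the same row order.
My solution is: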
broadcast <- function(origDf, broadcast_dimList) {
  # all combinations of the new dimension values
  newDimDf <- do.call(expand.grid, broadcast_dimList)
  nReps <- nrow(newDimDf)
  # replicate each line of the original dataframe in place
  # (integer indices, so the order stays correct with 10 or more rows)
  result <- origDf[rep(seq_len(nrow(origDf)), each = nReps), , drop = FALSE]
  # add the new dimensions, repeated for each original row
  result <- cbind(newDimDf, result)
  # rename rows sequentially
  row.names(result) <- NULL
  return(result)
}
bcastSimPrice <- broadcast(simPrice, list(farm=c('farm A','farm B')))
farm simId crop mean sd price
1 farm A 1 apple 10 2 9
2 farm B 1 apple 10 2 9
3 farm A 1 pear 22 4 21
4 farm B 1 pear 22 4 21
5 farm A 2 apple 10 2 12
6 farm B 2 apple 10 2 12
7 farm A 2 pear 22 4 18
8 farm B 2 pear 22 4 18
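This works, but it leaves me with the problem of matching the rows of bcastSimPrice (where farm varies before crop) with the rows of simVol (where it is the other way round).
Is there another way to approach this?
Thanks!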
merge will do what you want your broadcast function to do.
A simple:
bcastSimPrice <- within(merge(simPrice, simVol), revenue <- volume * price)
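should do the trick. Here I wrapped merge in within and added a column giving the revenue (volume x price).
Then, if you need to group rows (e.g. if there are multiple instances of farm A for a given crop and simId), you can use aggregate: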
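By default merge joins on every column name the two data frames share (here simId, crop, mean and sd), so the row order on either side does not matter. The sketch below does the same join with the keys spelled out through by. Choosing simId and crop as the keys, dropping the duplicated mean and sd columns from the price side, and the name priceVol are all my assumptions for illustration; the data are the ones from the question.

```r
simPrice <- data.frame(simId = c(1, 1, 2, 2),
                       crop  = rep(c("apple", "pear"), 2),
                       mean  = rep(c(10, 22), 2),
                       sd    = rep(c(2, 4), 2),
                       price = c(9, 21, 12, 18))
simVol <- data.frame(simId  = c(1, 1, 1, 1, 2, 2, 2, 2),
                     farm   = rep(c("farm A", "farm A", "farm B", "farm B"), 2),
                     crop   = rep(c("apple", "pear"), 4),
                     mean   = rep(c(10, 22), 4),
                     sd     = rep(c(2, 4), 4),
                     volume = c(9, 21, 12, 18, 10, 22, 11, 19))

# Assumed join keys: each price row is matched to every farm row
# with the same simId and crop, whatever the row order.
keys <- c("simId", "crop")
priceVol <- merge(simPrice[, c(keys, "price")], simVol, by = keys)
priceVol$revenue <- priceVol$volume * priceVol$price

# Reorder for display only; the join itself does not depend on it.
priceVol <- priceVol[order(priceVol$simId, priceVol$farm, priceVol$crop), ]
print(priceVol)
```

If the real data carry extra columns such as variety or sale date, listing the intended keys in by keeps merge from silently joining on any column that happens to share a name.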
aggregate(revenue ~ simId + crop + farm, sum, data=bcastSimPrice)
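To show what the aggregate call does, here is a small sketch on a made-up data frame in which farm A appears twice for the same simId and crop. The values and the names revDf and totals are hypothetical and only serve to illustrate the grouping.

```r
# Hypothetical merged result: farm A has two apple rows in simId 1
revDf <- data.frame(simId   = c(1, 1, 1),
                    crop    = c("apple", "apple", "pear"),
                    farm    = c("farm A", "farm A", "farm A"),
                    revenue = c(81, 19, 441))

# One row per simId/crop/farm combination, revenue summed within each group
totals <- aggregate(revenue ~ simId + crop + farm, data = revDf, FUN = sum)
print(totals)
```

The two apple rows collapse into one, while the single pear row passes through unchanged.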