How to broadcast columns corresponding to dimensions in an R data frame?

Asked by Rac*_*ole (score 5). Tags: merge, aggregate, r, dataframe

I have an R data frame:

# here just define it directly, but it comes from a simulation
simPrice <- data.frame(simId=c(1,1,2,2), 
                       crop=rep(c('apple','pear'),2), 
                       mean=rep(c(10,22),2), 
                       sd=rep(c(2,4),2), 
                       price=c(9,21,12,18))

  simId  crop mean sd price
1     1 apple   10  2     9
2     1  pear   22  4    21
3     2 apple   10  2    12
4     2  pear   22  4    18

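These are the prices of fruit (apples and pears) in two different iterations of a simulation. In general I may have any number of fruits or iterations. Crucially, I may also have other columns (e.g. variety, date of sale, place of sale, etc.).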

I have another data frame giving the volume of fruit grown on a number of farms:

# here just define it directly, but it comes from a simulation
simVol  <- data.frame(simId=c(1,1,1,1,2,2,2,2), 
                      farm=rep(c('farm A', 'farm A', 'farm B', 'farm B'),2),
                      crop=rep(c('apple','pear'),4), 
                      mean=rep(c(10,22),4), 
                      sd=rep(c(2,4),4), 
                      volume=c(9,21,12,18,10,22,11,19))

  simId   farm  crop mean sd volume
1     1 farm A apple   10  2      9
2     1 farm A  pear   22  4     21
3     1 farm B apple   10  2     12
4     1 farm B  pear   22  4     18
5     2 farm A apple   10  2     10
6     2 farm A  pear   22  4     22
7     2 farm B apple   10  2     11
8     2 farm B  pear   22  4     19

Now I want to combine the two.

I think that to do this, I first have to "broadcast" simPrice over farm so that the two data frames have exactly the same order.

My solution is:

broadcast <- function(origDf, broadcast_dimList) {
    # one row per combination of the new dimension values
    newDimDf <- do.call(expand.grid, broadcast_dimList)
    nReps <- nrow(newDimDf)
    # replicate each line of the original dataframe in place
    # (numeric indices keep the original order; sorting character row names would put "10" before "2")
    result <- origDf[rep(seq_len(nrow(origDf)), each = nReps), , drop = FALSE]
    # add the new dimensions, recycled once per original row
    result <- cbind(newDimDf, result)
    # rename rows sequentially
    row.names(result) <- NULL
    return(result)
}

bcastSimPrice <- broadcast(simPrice, list(farm=c('farm A','farm B')))

    farm simId  crop mean sd price
1 farm A     1 apple   10  2     9
2 farm B     1 apple   10  2     9
3 farm A     1  pear   22  4    21
4 farm B     1  pear   22  4    21
5 farm A     2 apple   10  2    12
6 farm B     2 apple   10  2    12
7 farm A     2  pear   22  4    18
8 farm B     2  pear   22  4    18

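This works, but it leaves me with the problem of matching the rows of bcastSimPrice (where farm varies fastest, within each crop) to the rows of simVol (where crop varies within each farm).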
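If you do want to keep the broadcast approach, one way to line the rows up is to sort both data frames by the same key columns with order(). This is a minimal sketch, assuming both frames contain exactly the same simId/farm/crop combinations, one row each; if they might not, positional matching is unsafe and a join is the better tool. sortByKeys is a hypothetical helper, and the broadcast step is a compact rewrite of the question's function for a single farm dimension.

```r
# Data from the question
simPrice <- data.frame(simId = c(1, 1, 2, 2),
                       crop  = rep(c("apple", "pear"), 2),
                       mean  = rep(c(10, 22), 2),
                       sd    = rep(c(2, 4), 2),
                       price = c(9, 21, 12, 18))
simVol <- data.frame(simId  = c(1, 1, 1, 1, 2, 2, 2, 2),
                     farm   = rep(c("farm A", "farm A", "farm B", "farm B"), 2),
                     crop   = rep(c("apple", "pear"), 4),
                     mean   = rep(c(10, 22), 4),
                     sd     = rep(c(2, 4), 4),
                     volume = c(9, 21, 12, 18, 10, 22, 11, 19))

# Compact broadcast over farm: copies of each price row stay adjacent,
# with farm varying fastest (same layout as the question's output)
farms <- c("farm A", "farm B")
idx <- rep(seq_len(nrow(simPrice)), each = length(farms))
bcastSimPrice <- cbind(farm = rep(farms, times = nrow(simPrice)), simPrice[idx, ])
row.names(bcastSimPrice) <- NULL

# Hypothetical helper: reorder a data frame by the given key columns
sortByKeys <- function(df, keys) {
  df <- df[do.call(order, unname(as.list(df[keys]))), , drop = FALSE]
  row.names(df) <- NULL
  df
}

keys <- c("simId", "farm", "crop")
p <- sortByKeys(bcastSimPrice, keys)
v <- sortByKeys(simVol, keys)

# Fail loudly if the two frames do not line up row for row
stopifnot(nrow(p) == nrow(v),
          all(p$simId == v$simId),
          all(as.character(p$farm) == as.character(v$farm)),
          all(as.character(p$crop) == as.character(v$crop)))

p$volume  <- v$volume
p$revenue <- p$price * p$volume
```

The stopifnot() guard is the important part of this design: position-based matching silently produces wrong numbers if a key combination is missing on one side, whereas a key-based join does not depend on row order at all.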

Is there another way to approach this?

Thanks!

Answer by jba*_*ums (score 2):

merge will do what you want your broadcast function to do.
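Taken literally, merge can also do the broadcasting itself: its documentation states that when by is NULL (or of length 0) the result is the Cartesian product of x and y. This is a minimal sketch, assuming the new dimension is supplied as a one-column data frame; farms and bcast are illustrative names. The rows come out in a different order from the question's broadcast() output: all four price rows for farm A first, then all four for farm B.

```r
# Price table from the question
simPrice <- data.frame(simId = c(1, 1, 2, 2),
                       crop  = rep(c("apple", "pear"), 2),
                       mean  = rep(c(10, 22), 2),
                       sd    = rep(c(2, 4), 2),
                       price = c(9, 21, 12, 18))

# The dimension to broadcast over, as a one-column data frame
farms <- data.frame(farm = c("farm A", "farm B"))

# by = NULL: no join keys, so merge() returns every pairing of a
# simPrice row with a farms row (a Cartesian product, i.e. a cross join)
bcast <- merge(simPrice, farms, by = NULL)

print(bcast)  # 8 rows: each price row once for "farm A" and once for "farm B"
```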

A simple call like

bcastSimPrice <- within(merge(simPrice, simVol), revenue <- volume * price)

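should do the trick. Here I've wrapped merge in within and added a revenue column giving volume × price. With no by argument, merge joins on every column the two data frames share (here simId, crop, mean and sd), so rows are matched by value rather than by position and their order no longer matters.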
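If the shared columns should not all act as join keys (for instance if mean and sd might not match exactly between the two sources, or if extra descriptive columns such as variety are added later), you can name the keys explicitly with by. This is a minimal sketch, assuming simId and crop alone identify a price; volOnly and revenueDf are illustrative names. Keeping only the keys plus the columns unique to simVol avoids duplicated mean.x/mean.y and sd.x/sd.y columns in the result.

```r
# Data from the question
simPrice <- data.frame(simId = c(1, 1, 2, 2),
                       crop  = rep(c("apple", "pear"), 2),
                       mean  = rep(c(10, 22), 2),
                       sd    = rep(c(2, 4), 2),
                       price = c(9, 21, 12, 18))
simVol <- data.frame(simId  = c(1, 1, 1, 1, 2, 2, 2, 2),
                     farm   = rep(c("farm A", "farm A", "farm B", "farm B"), 2),
                     crop   = rep(c("apple", "pear"), 4),
                     mean   = rep(c(10, 22), 4),
                     sd     = rep(c(2, 4), 4),
                     volume = c(9, 21, 12, 18, 10, 22, 11, 19))

# Join keys, plus the simVol columns that simPrice does not already have,
# so no column in the result gets a .x/.y suffix
keys <- c("simId", "crop")
volOnly <- simVol[c(keys, setdiff(names(simVol), names(simPrice)))]

# Each price row matches one row per farm with the same simId and crop
revenueDf <- merge(simPrice, volOnly, by = keys)
revenueDf$revenue <- revenueDf$volume * revenueDf$price
```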

Then, if you need to group rows (for example, if farm A appears more than once for a given crop and simId), you can use aggregate:

aggregate(revenue ~ simId + crop + farm, sum, data=bcastSimPrice)
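The same formula interface also collapses dimensions: any grouping column left out of the right-hand side is summed over. This is a minimal sketch on the question's data; perSim and perSimFarm are illustrative names.

```r
# Data from the question
simPrice <- data.frame(simId = c(1, 1, 2, 2),
                       crop  = rep(c("apple", "pear"), 2),
                       mean  = rep(c(10, 22), 2),
                       sd    = rep(c(2, 4), 2),
                       price = c(9, 21, 12, 18))
simVol <- data.frame(simId  = c(1, 1, 1, 1, 2, 2, 2, 2),
                     farm   = rep(c("farm A", "farm A", "farm B", "farm B"), 2),
                     crop   = rep(c("apple", "pear"), 4),
                     mean   = rep(c(10, 22), 4),
                     sd     = rep(c(2, 4), 4),
                     volume = c(9, 21, 12, 18, 10, 22, 11, 19))

bcastSimPrice <- within(merge(simPrice, simVol), revenue <- volume * price)

# Leave farm and crop out of the formula: total revenue per simulation
perSim <- aggregate(revenue ~ simId, data = bcastSimPrice, FUN = sum)

# Keep farm, drop crop: revenue per simulation and farm
perSimFarm <- aggregate(revenue ~ simId + farm, data = bcastSimPrice, FUN = sum)

print(perSim)      # simId 1: 1008, simId 2: 990
print(perSimFarm)
```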