djh*_*rio 5 aggregate r weighted data.table
I am looking for a way to compute weighted sums of several variables with data.table. I hope the example below is clear enough.
library(data.table)
dt <- data.table(matrix(1:200, nrow = 10))
dt[, gr := c(rep(1,5), rep(2,5))]
dt[, w := 2]
# Error: object 'w' not found
dt[, lapply(.SD, function(x) sum(x * w)),
.SDcols = paste0("V", 1:4)]
# Error: object 'w' not found
dt[, lapply(.SD * w, sum),
.SDcols = paste0("V", 1:4)]
# This works without groups
dt[, lapply(.SD, function(x) sum(x * dt$w)),
.SDcols = paste0("V", 1:4)]
# It does not work by groups: dt$w has all 10 rows, not just the group's 5
dt[, lapply(.SD, function(x) sum(x * dt$w)),
.SDcols = paste0("V", 1:4), keyby = gr]
# The expected result
dt[, list(V1 = sum(V1 * w),
V2 = sum(V2 * w),
V3 = sum(V3 * w),
V4 = sum(V4 * w)), keyby = gr]
### From Arun's answer
dt[, lapply(.SD[, paste0("V", 1:4), with = FALSE],
            function(x) sum(x * w)), by = gr]
Copying @Roland's excellent answer:
print(dt[, lapply(.SD, function(x, w) sum(x*w), w=w), by=gr][, w := NULL])
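Following @Roland's comment, it is indeed faster to run the operation on all columns and then just drop the unwanted ones (as long as the operation itself is not time-consuming, which is the case here).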
dt[, {lapply(.SD, function(x) sum(x*w))}, by=gr][, w := NULL][]
For some reason, w does not seem to be found when I don't use {}. I have no idea why.
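As a rough check of the "compute on all columns, then drop" idea, the sketch below rebuilds the question's dt, runs the weighted sum over every column, and keeps only V1 to V4 afterwards. The names cols, res_all and res_manual are introduced here purely for illustration, and the sketch assumes a reasonably recent data.table.

```r
library(data.table)

# Rebuild the example data from the question
dt <- data.table(matrix(1:200, nrow = 10))
dt[, gr := c(rep(1, 5), rep(2, 5))]
dt[, w := 2]

cols <- paste0("V", 1:4)  # hypothetical helper: the columns we actually want

# Weighted sum over every column per group, then keep only gr and cols
res_all <- dt[, {lapply(.SD, function(x) sum(x * w))}, by = gr][
  , c("gr", cols), with = FALSE]

# Hand-written reference, as in the question
res_manual <- dt[, list(V1 = sum(V1 * w), V2 = sum(V2 * w),
                        V3 = sum(V3 * w), V4 = sum(V4 * w)), by = gr]

# Compare the two results
print(all.equal(res_all, res_manual))
```

Selecting the wanted columns by name at the end, rather than removing w with :=, is just one option; it also discards the extra columns V5 to V20 that the question's dt contains.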
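(Subsetting can be expensive if there are too many groups.)
You can do this without using .SDcols, and instead drop the unwanted column from .SD when passing it to lapply, as follows: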
dt[, lapply(.SD[, -1, with=FALSE], function(x) sum(x*w)), by=gr]
#    gr V1  V2  V3  V4
# 1:  1 20 120 220 320
# 2:  2 70 170 270 370
.SDcols makes .SD exclude the w column. So it is not possible to multiply by w, because w does not exist within the scope of the .SD environment.
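To make the scoping point concrete, the sketch below lists the columns of .SD when .SDcols is given, and then shows one possible way around the issue: include w in .SDcols and drop it from the result. This is only an illustration. The names cols, sd_names and res are assumptions, and newer data.table releases may well resolve w even when it is not listed in .SDcols, which this sketch does not rely on.

```r
library(data.table)

# Rebuild the example data from the question
dt <- data.table(matrix(1:200, nrow = 10))
dt[, gr := c(rep(1, 5), rep(2, 5))]
dt[, w := 2]

cols <- paste0("V", 1:4)  # hypothetical helper name

# Columns visible in .SD when .SDcols is given: only V1..V4, no w
sd_names <- dt[, names(.SD), .SDcols = cols]
print(sd_names)

# Listing w in .SDcols puts it back into .SD; its own weighted sum
# is dropped afterwards, and the trailing [] makes the result printable
res <- dt[, lapply(.SD, function(x, w) sum(x * w), w = w),
          by = gr, .SDcols = c(cols, "w")][, w := NULL][]
print(res)
```

Because only five columns are processed per group here, this avoids both the per-group subsetting of .SD and the work on the unused columns V5 to V20.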