Posts by Eth*_*han

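Relative-window running sum via a data.table non-equi join

I have a dataset with customerId, transactionDate, productId and purchaseQty loaded into a data.table. For each row, I want to calculate the sum and the mean of the purchase quantity over the prior 45 days.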

        productId customerID transactionDate purchaseQty
 1:    870826    1186951      2016-03-28      162000
 2:    870826    1244216      2016-03-31        5000
 3:    870826    1244216      2016-04-08        6500
 4:    870826    1308671      2016-03-28      221367
 5:    870826    1308671      2016-03-29       83633
 6:    870826    1308671      2016-11-29       60500

I'm looking for output like this:

    productId customerID transactionDate purchaseQty    sumWindowPurchases
 1:    870826    1186951      2016-03-28      162000                162000
 2:    870826    1244216      2016-03-31        5000                  5000
 3:    870826    1244216      2016-04-08        6500                 11500
 4:    870826    1308671      2016-03-28      221367                221367
 5:    870826    1308671      2016-03-29       83633                305000
 6:    870826    1308671      2016-11-29       60500                 60500

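So sumWindowPurchases contains the sum of purchaseQty for the customer/product within a 45-day window ending at the current transaction date. Once I have that working, adding the mean and the other calculations I need should be trivial.

I went back to my SQL roots and thought of a self join: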

select …
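The self join described above can be sketched in data.table as a non-equi join of the table to itself. This is only a minimal sketch using the sample rows from the question; the helper column `windowStart` and the result names `win` and `meanWindowPurchases` are hypothetical, and the window is assumed to include both the current date and the date 45 days earlier.

```r
library(data.table)

# Sample rows copied from the question
dt <- data.table(
  productId = 870826L,
  customerID = c(1186951L, 1244216L, 1244216L, 1308671L, 1308671L, 1308671L),
  transactionDate = as.Date(c("2016-03-28", "2016-03-31", "2016-04-08",
                              "2016-03-28", "2016-03-29", "2016-11-29")),
  purchaseQty = c(162000, 5000, 6500, 221367, 83633, 60500)
)

# Hypothetical helper column: start of each row's 45-day look-back window
dt[, windowStart := transactionDate - 45L]

# Non-equi self join: for every row of the inner dt (i), collect the rows of the
# outer dt (x) with the same product and customer whose transactionDate lies in
# [windowStart, transactionDate], then aggregate once per i row (by = .EACHI).
win <- dt[dt,
          on = .(productId, customerID,
                 transactionDate >= windowStart,
                 transactionDate <= transactionDate),
          .(sumWindowPurchases = sum(x.purchaseQty),
            meanWindowPurchases = mean(x.purchaseQty)),
          by = .EACHI]

# by = .EACHI returns one row per row of i, in i's order, so the results can be
# attached to the original table by position
dt[, sumWindowPurchases := win$sumWindowPurchases]
dt[, meanWindowPurchases := win$meanWindowPurchases]
```

Because every row matches at least itself, `win` has exactly as many rows as `dt`, which is what makes the positional assignment safe in this sketch.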


r summarization data.table

Eth*_*han

2016-12-08

6
votes

1
answer

424
views

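Performance degradation of a data.table multi-column non-equi join when aggregating

I'm trying to track down a performance problem and have largely isolated it to a multi-column non-equi join. Below is a reasonable (but not exact) example of what I'm trying to do, along with timings.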

library(quantmod)
library(data.table)

p <- last(OHLC(getSymbols("SPY", auto.assign = FALSE)), 700)
d <- as.data.table(p) #convert to a data.table for processing
d[, index := as.POSIXct(index)] #to match my use case. leaving as Date does not significantly alter timings
setnames(d, c("index", "Open", "High", "Low", "Close"))

# create partitions for analysis
partitions = unique(d[d, .(Top = x.Close, Bot = i.Close, Start = pmin(x.index, i.index)),
    on = .(Close >= Close), allow.cartesian = T][!is.na(Start)])

#desired analysis
system.time(r1 <- d[partitions, .(i.Top, i.Bot, i.Start, mean(x.Close), sd(x.Close)),
    on = …
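The partition-building step above can also be tried without downloading SPY data. The following is a rough sketch that swaps in synthetic prices; the random-walk `Close` series, the daily timestamps and the row count `n` are assumptions made for illustration, not the data from the question, and the truncated aggregation join is deliberately left out.

```r
library(data.table)

# Synthetic stand-in for the price data (assumption: random-walk closes)
set.seed(1)
n <- 50
d <- data.table(
  index = as.POSIXct("2018-01-01", tz = "UTC") + 86400 * (0:(n - 1)),
  Close = round(100 + cumsum(rnorm(n)), 2)
)

# Same self join as in the question: pair each row (i) with every row (x)
# whose Close is at least as high, and record the earlier of the two timestamps
partitions <- unique(
  d[d, .(Top = x.Close, Bot = i.Close, Start = pmin(x.index, i.index)),
    on = .(Close >= Close), allow.cartesian = TRUE][!is.na(Start)]
)

# Size of the table that a later aggregation join would have to process
nrow(partitions)
```

The number of partitions grows roughly quadratically with the number of input rows, which is worth keeping in mind when timing any join that aggregates once per partition.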


join r multiple-columns data.table

Eth*_*han

2018-08-01

5
votes

0
answers

188
views