ay_*_*_ya 3 merge aggregate r data.table
Here is my data:
df1 <- fread('
id , date1 , date2
id_0001 , 2017-01-01, 2017-01-05
id_0002 , 2017-01-02, 2017-01-08
id_0003 , 2017-01-04, 2017-01-07
')
df2<- fread('
date , value
2017-01-01, 1
2017-01-02, 2
2017-01-03, 5
2017-01-04, 5
2017-01-05, 5
2017-01-06, 3
2017-01-07, 4
2017-01-08, 7
2017-01-09, 5
2017-01-10, 1
2017-01-11, 5
')
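I want to summarize (take the mean of) the `value` column from `df2` for each `id` in `df1`, row-wise, over the range between `date1` and `date2`.
The expected result looks like this:
| id | date1 | date2 | value |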
|---|---|---|---|
| id_0001 | 2017-01-01 | 2017-01-05 | mean(c(1,2,5,5,5)) |
| id_0002 | 2017-01-02 | 2017-01-08 | mean(c(2,5,5,5,3,4,7)) |
| id_0003 | 2017-01-04 | 2017-01-07 | mean(c(5,5,3,4)) |
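I know I can expand each `id` in `df1` into one row per day between `date1` and `date2`, do a `left_join` by date to `df2`, and then summarize. However, as the data grows, R cannot handle vectors beyond a certain size once further analysis is needed. Is there a way to do this kind of summary across data frames with `data.table`?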
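For reference, the expand-then-join approach described above could be sketched in `data.table` roughly as follows. This is only an illustration of why it scales poorly, not a recommended solution: the tables are rebuilt with explicit `IDate` columns (an assumption, so the sketch does not depend on how `fread` parses dates), and the names `expanded`, `joined`, and `baseline` are hypothetical.

```r
library(data.table)

# Same data as in the question, built with explicit date columns.
df1 <- data.table(
  id    = c("id_0001", "id_0002", "id_0003"),
  date1 = as.IDate(c("2017-01-01", "2017-01-02", "2017-01-04")),
  date2 = as.IDate(c("2017-01-05", "2017-01-08", "2017-01-07"))
)
df2 <- data.table(
  date  = as.IDate(seq(as.Date("2017-01-01"), by = "day", length.out = 11)),
  value = c(1, 2, 5, 5, 5, 3, 4, 7, 5, 1, 5)
)

# Expand every id to one row per day in [date1, date2].
expanded <- df1[, .(date = seq(date1, date2, by = "day")),
                by = .(id, date1, date2)]

# Left join the daily values onto the expanded rows, then average per id.
joined   <- df2[expanded, on = "date"]
baseline <- joined[, .(value = mean(value)), by = .(id, date1, date2)]
```

The intermediate `expanded` table has one row per id per day, so its size grows with the total length of all date ranges; that is the memory cost the question is trying to avoid.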
If I understand correctly, you seem to be after a non-equi join approach like the following:
library(data.table)
df1[, value := df2[.SD, on = .(date >= date1, date <= date2), mean(value), by = .EACHI]$V1]
Output:
df1
id date1 date2 value
1: id_0001 2017-01-01 2017-01-05 3.600000
2: id_0002 2017-01-02 2017-01-08 4.428571
3: id_0003 2017-01-04 2017-01-07 4.250000
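A variant of the same non-equi join that leaves `df1` untouched and names the aggregated column might look like the sketch below. It assumes the date columns are real date types (`IDate` here) rather than character strings, which is why the tables are constructed explicitly; `means` and `result` are hypothetical names.

```r
library(data.table)

# Same data as in the question, built with explicit date columns.
df1 <- data.table(
  id    = c("id_0001", "id_0002", "id_0003"),
  date1 = as.IDate(c("2017-01-01", "2017-01-02", "2017-01-04")),
  date2 = as.IDate(c("2017-01-05", "2017-01-08", "2017-01-07"))
)
df2 <- data.table(
  date  = as.IDate(seq(as.Date("2017-01-01"), by = "day", length.out = 11)),
  value = c(1, 2, 5, 5, 5, 3, 4, 7, 5, 1, 5)
)

# For each row of df1, pick the df2 rows with date1 <= date <= date2
# and average their values; by = .EACHI evaluates j once per df1 row.
means <- df2[df1, on = .(date >= date1, date <= date2),
             .(value = mean(value)), by = .EACHI]

# Attach the means to a copy so the original df1 is not modified by :=.
result <- copy(df1)[, value := means$value]
```

Because `by = .EACHI` produces one result row per row of `df1`, in the same order, the means can be assigned positionally without materializing the expanded table. A row of `df1` whose range matches no date in `df2` should come back as `NA` rather than being dropped.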