Top*_*cal 5 r dataframe data.table
I have the following data.table, dtgrouped2:
    MonthNo Unique Total
 1:       1    AAA    10
 2:       1    BBB     0
 3:       2    CCC     3
 4:       2    DDD     0
 5:       3    AAA     0
 6:       3    BBB    35
 7:       4    CCC    15
 8:       4    AAA     0
 9:       5    BBB    60
10:       5    CCC     0
11:       6    DDD   100
12:       6    AAA     0
And another table, dt2:
   Unique1 StartDate EndDate Amount1 Amount2
1:     AAA         0       1       7       0
3:     AAA         1       2       5       0
2:     AAA         2       4       3       2
I want to insert Amount1 and Amount2 from dt2 into dtgrouped2, matching on "Unique" and applying the logic below to each row of dtgrouped2:
filter(StartDate< MonthNo & EndDate>=MonthNo)
then MAX(EndDate)
then insert Amount1 as Amount1 and Amount2 as Amount2
So, as you can see, the result differs from row to row. This would be the expected output:
Date   MonthNo  Unique  Items  Amounts  Amount1  Amount2
Jan    1        AAA     x      10       7        0
Jan    1        BBB     y      2        NA       NA
Feb    2        CCC     x      3        NA       NA
Feb    2        DDD     y      15       NA       NA
March  3        AAA     y      20       3        2
March  3        BBB     x      35       NA       NA
April  4        CCC     x      15       NA       NA
April  4        AAA     y      50       3        2
May    5        BBB     x      60       NA       NA
May    5        CCC     y      70       NA       NA
June   6        DDD     x      100      NA       NA
June   6        AAA     y      20       NA       NA
I would suggest a non-equi join combined with mult = "last" (in order to capture only the most recent EndDate):
dtgrouped2[, c("Amount1", "Amount2") :=     # Assign the below result to new columns in dtgrouped2
             dt2[dtgrouped2,                # join
                 .(Amount1, Amount2),       # get the columns you need
                 on = .(Unique1 = Unique,   # join conditions
                        StartDate < MonthNo,
                        EndDate >= MonthNo),
                 mult = "last"]]            # always get the latest EndDate
dtgrouped2
#     MonthNo Unique Total Amount1 Amount2
#  1:       1    AAA    10       7       0
#  2:       1    BBB     0      NA      NA
#  3:       2    CCC     3      NA      NA
#  4:       2    DDD     0      NA      NA
#  5:       3    AAA     0       3       2
#  6:       3    BBB    35      NA      NA
#  7:       4    CCC    15      NA      NA
#  8:       4    AAA     0       3       2
#  9:       5    BBB    60      NA      NA
# 10:       5    CCC     0      NA      NA
# 11:       6    DDD   100      NA      NA
# 12:       6    AAA     0      NA      NA
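The reason you need to join as dt2[dtgrouped2] (and not the other way around) is that you want to look up dt2 for every row of dtgrouped2. The result then has one row per row of dtgrouped2, while still allowing several rows of dt2 to match the same row of dtgrouped2.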
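For anyone who wants to try this end to end, here is a minimal reproducible sketch. The table values are copied from the question. The setorder() call and the intermediate name res are my own additions, not part of the original data. As far as I know, mult = "last" keeps the last matching row in dt2's row order rather than searching for the largest EndDate, so the sketch sorts dt2 by EndDate first to make "last" mean "latest". With the sample data the sort changes nothing, because dt2 is already in that order.

```r
library(data.table)

# Rebuild the question's tables (values copied from the printed tables)
dtgrouped2 <- data.table(
  MonthNo = rep(1:6, each = 2),
  Unique  = c("AAA", "BBB", "CCC", "DDD", "AAA", "BBB",
              "CCC", "AAA", "BBB", "CCC", "DDD", "AAA"),
  Total   = c(10, 0, 3, 0, 0, 35, 15, 0, 60, 0, 100, 0)
)
dt2 <- data.table(
  Unique1   = "AAA",
  StartDate = c(0L, 1L, 2L),
  EndDate   = c(1L, 2L, 4L),
  Amount1   = c(7, 5, 3),
  Amount2   = c(0, 0, 2)
)

# Assumption: sort so that the last matching row is the one with the
# largest EndDate (mult = "last" follows row order, not EndDate itself)
setorder(dt2, Unique1, EndDate)

# Non-equi join: one result row per row of dtgrouped2, NA where no match
res <- dt2[dtgrouped2,
           .(Amount1, Amount2),
           on = .(Unique1 = Unique,
                  StartDate < MonthNo,
                  EndDate >= MonthNo),
           mult = "last"]

# res lines up row by row with dtgrouped2, so it can be assigned directly
dtgrouped2[, c("Amount1", "Amount2") := res]
print(dtgrouped2)
```

Because res has exactly nrow(dtgrouped2) rows in the same order, the := assignment is safe. Joining in the other direction, dtgrouped2[dt2, ...], would return rows driven by dt2 instead and would not line up with dtgrouped2.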