Join two data.tables to override values by date range

Zac*_*ary 5 r data.table

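I want to correct one table based on overrides stored in another table. When dt_override has a row for the same unit and its date range covers a date in dt_current, I want to change the value in dt_current.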

dt_current <- data.table(unit = c(rep("a", 10), rep("b", 10)),
    date = seq(as.Date("2015-1-1"), by = "day", length.out = 10),
    num = 1:10, key = "unit")
dt_override <- data.table(unit = c("a", "a", "b", "zed"),
    start_date = as.Date(c("2015-01-03", "1492-12-25", "2015-01-02", "2015-01-11")),
    end_date = as.Date(c("2015-01-05", "1492-12-26", "2015-01-04", "2015-01-14")),
    value = NA, key = "unit")

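It seems like I should be using some form of .EACHI when joining the two data.tables, coded something like the snippet below, though of course it doesn't work.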

dt_current[dt_override, 
    num := if(i.start_date <= date & i.end_date >= date) i.value, 
    by = .EACHI]

Dav*_*urg 6

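One could do this using foverlaps. First add an end date to dt_current and key both tables by unit, start date and end date: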

dt_current[, date2 := date] # define end date
setkey(dt_current, unit, date, date2) # key by unit, start and end dates
setkey(dt_override, unit, start_date, end_date) # same

First option: create an index of the overlapping rows and update by reference

indx <- foverlaps(dt_override, dt_current, which = TRUE) # run foverlaps and get indices
dt_current[indx$yid, num := dt_override[indx$xid, value]] # adjust by reference
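To see what the index holds and to check the update, here is a minimal self-contained sketch of the same approach. It makes a few assumptions that are not in the answer above: the tables are rebuilt with the recycling written out explicitly, value is typed as NA_integer_ so it matches the integer num column, and nomatch = 0L is passed so that override rows overlapping nothing (the 1492 range and unit "zed") are dropped from the index rather than returned with an NA yid.

```r
library(data.table)

# Rebuild the question's tables, with the per-unit dates written out explicitly
days <- seq(as.Date("2015-01-01"), by = "day", length.out = 10)
dt_current <- data.table(unit = rep(c("a", "b"), each = 10),
                         date = rep(days, 2),
                         num  = rep(1:10, 2))
dt_override <- data.table(
  unit       = c("a", "a", "b", "zed"),
  start_date = as.Date(c("2015-01-03", "1492-12-25", "2015-01-02", "2015-01-11")),
  end_date   = as.Date(c("2015-01-05", "1492-12-26", "2015-01-04", "2015-01-14")),
  value      = NA_integer_)  # assumption: typed NA to match the integer num column

dt_current[, date2 := date]                      # each day is the interval [date, date]
setkey(dt_current, unit, date, date2)
setkey(dt_override, unit, start_date, end_date)

# xid = row number in dt_override, yid = row number in dt_current;
# nomatch = 0L (an addition here) keeps only pairs that actually overlap
indx <- foverlaps(dt_override, dt_current, which = TRUE, nomatch = 0L)

dt_current[indx$yid, num := dt_override[indx$xid, value]]

# Rows that were overridden
overridden <- dt_current[is.na(num), .(unit, date)]
print(overridden)
```

Note that xid and yid are row numbers in the keyed (sorted) tables, so the index should be built after the setkey calls, as it is here.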

Alternatively, you can run foverlaps the other way around and avoid creating indx, at the cost of creating an entirely new data set

foverlaps(dt_current, dt_override)[!is.na(start_date), num := value
                                   ][, names(dt_current), with = FALSE]
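As a side note, the join-style update the question was reaching for can also be written as a non-equi update join. This is a different technique from foverlaps, not part of the answer above, and it assumes data.table 1.9.8 or later, where non-equi conditions in on= are available. The sketch below rebuilds the tables itself and, as an assumption, types value as NA_integer_ to match num.

```r
library(data.table)

# Rebuild the question's tables, with the per-unit dates written out explicitly
days <- seq(as.Date("2015-01-01"), by = "day", length.out = 10)
dt_current <- data.table(unit = rep(c("a", "b"), each = 10),
                         date = rep(days, 2),
                         num  = rep(1:10, 2))
dt_override <- data.table(
  unit       = c("a", "a", "b", "zed"),
  start_date = as.Date(c("2015-01-03", "1492-12-25", "2015-01-02", "2015-01-11")),
  end_date   = as.Date(c("2015-01-05", "1492-12-26", "2015-01-04", "2015-01-14")),
  value      = NA_integer_)  # assumption: typed NA to match the integer num column

# Update join: for each override row, match dt_current rows with the same unit
# whose date lies in [start_date, end_date], and overwrite num by reference.
# Override rows with no match (1492, "zed") simply update nothing.
dt_current[dt_override,
           on = .(unit, date >= start_date, date <= end_date),
           num := i.value]

print(dt_current[is.na(num), .(unit, date)])
```

No keys, helper date2 column, or by = .EACHI are needed with this form. If several override ranges cover the same date, which value is kept depends on the row order of dt_override, so it may be worth checking for overlapping overrides first.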