data.table rolling join by group

Ami*_*tai 7 join r data.table

How can I find the last value before test.day for each (loc.x, loc.y) pair?

library(data.table)

dt <- data.table(
  loc.x = as.integer(c(1, 1, 3, 1, 3, 1)),
  loc.y = as.integer(c(1, 2, 1, 2, 1, 2)),
  time = as.IDate(c("2015-03-11", "2015-05-10", "2015-09-27",
                    "2015-11-25", "2014-09-13", "2015-08-19")), 
  value = letters[1:6]
)

setkey(dt, loc.x, loc.y, time)
test.day <- as.IDate("2015-10-01")

Desired output:

   loc.x loc.y value
1:     1     1     a
2:     1     2     f
3:     3     1     c

tal*_*lat 6

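You can first subset the rows where time < test.day (this should be very efficient, since it is not done by group), and then select the last value in each group. To do that, you can use either tail(value, 1L) or, as Floo0 suggested, value[.N], which gives: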

dt[time < test.day, tail(value, 1L), by = .(loc.x, loc.y)]
#   loc.x loc.y V1
#1:     1     1  a
#2:     1     2  f
#3:     3     1  c

Or

dt[time < test.day, value[.N], by = .(loc.x, loc.y)]

Note that this works because the data is already sorted by setkey(dt, loc.x, loc.y, time), so the last row of each group is also the latest one.
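If the table has not been keyed, the dependence on row order can be avoided by picking the row with the latest time explicitly. The following is only a sketch under that assumption: `dt_unsorted` is a made-up name for the question's data without the `setkey()` call, and `which.max(time)` is swapped in for `.N`.

```r
library(data.table)

# Same data as in the question, but deliberately left unkeyed/unsorted.
dt_unsorted <- data.table(
  loc.x = as.integer(c(1, 1, 3, 1, 3, 1)),
  loc.y = as.integer(c(1, 2, 1, 2, 1, 2)),
  time  = as.IDate(c("2015-03-11", "2015-05-10", "2015-09-27",
                     "2015-11-25", "2014-09-13", "2015-08-19")),
  value = letters[1:6]
)
test.day <- as.IDate("2015-10-01")

# Subset once, then take the value at the latest time within each group.
# keyby (instead of by) only sorts the result by the grouping columns.
res <- dt_unsorted[time < test.day,
                   .(value = value[which.max(time)]),
                   keyby = .(loc.x, loc.y)]
res
```

With sorted data, value[.N] remains the simpler choice; this variant just trades a little work per group for not relying on the key.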

  • Instead of `tail(value, 1L)`, you can use `value[.N]`. (3 upvotes)

Jaa*_*aap 6

Another option is to use the last function:

dt[, last(value[time < test.day]), by = .(loc.x, loc.y)]

This gives:

   loc.x loc.y V1
1:     1     1  a
2:     1     2  f
3:     3     1  c

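  • I think `dt[time < test.day, last(value), by = .(loc.x, loc.y)]` would be more efficient, since it does not re-evaluate `time < test.day` for each group, but that seems almost identical to the other answer, at least to me. (6 upvotes)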

Dav*_*urg 5

Here is another option, using a rolling join after creating a lookup table:

indx <- data.table(unique(dt[ ,.(loc.x, loc.y)]), time = test.day)  
dt[indx, roll = TRUE, on = names(indx)]
#    loc.x loc.y       time value
# 1:     1     1 2015-10-01     a
# 2:     1     2 2015-10-01     f
# 3:     3     1 2015-10-01     c

Or a very similar option, suggested by @eddi:

dt[dt[, .(time = test.day), by = .(loc.x, loc.y)], roll = TRUE, on = c('loc.x', 'loc.y', 'time')]

Or a one-liner, which will be less efficient because it calls [.data.table once per group:

dt[, 
    .SD[data.table(test.day), value, roll = TRUE, on = c(time = "test.day")], 
    by = .(loc.x, loc.y)
  ]
#    loc.x loc.y V1
# 1:     1     1  a
# 2:     1     2  f
# 3:     3     1  c
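One thing to keep in mind with the rolling-join variants: roll = TRUE takes an exact match on time if there is one, so an observation dated exactly test.day is returned, whereas the time < test.day subset excludes it. The question's data has no row on test.day, so both approaches agree there. A small sketch with made-up data (`toy`, `indx_prev` and the values are hypothetical) shows the difference and one possible way to make the join strict:

```r
library(data.table)

# Hypothetical toy data: one group with an observation exactly on test.day.
toy <- data.table(
  loc.x = 1L, loc.y = 1L,
  time  = as.IDate(c("2015-09-01", "2015-10-01")),
  value = c("before", "same.day")
)
test.day <- as.IDate("2015-10-01")

# Strict subset: only rows dated before test.day are candidates.
strict <- toy[time < test.day, value[.N], by = .(loc.x, loc.y)]$V1

# Rolling join: the exact match on time wins over rolling forward.
indx <- data.table(loc.x = 1L, loc.y = 1L, time = test.day)
rolled <- toy[indx, roll = TRUE, on = names(indx)]$value

# Assumption: since IDate has whole-day resolution, looking up the day
# before test.day turns the join into a strict "before test.day" lookup.
indx_prev <- data.table(loc.x = 1L, loc.y = 1L, time = test.day - 1L)
rolled_strict <- toy[indx_prev, roll = TRUE, on = names(indx_prev)]$value

c(strict = strict, rolled = rolled, rolled_strict = rolled_strict)
```

Whether the same-day observation should count depends on what "before test.day" is meant to include, so this is worth checking against the actual data.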