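I'd like to understand rolling joins in data.table. The data to reproduce this is given at the end.
Given a data.table of airport transactions, each at a given time: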
> dt
t_id airport thisTime
1: 1 a 5.1
2: 3 a 5.1
3: 2 a 6.2
(Note that t_ids 1 and 3 have the same airport and time.)
and a lookup table of flights departing from airports:
> dt_lookup
f_id airport thisTime
1: 1 a 6
2: 2 a 6
3: 1 b 7
4: 1 c 8
5: 2 d 7
6: 1 d 9
7: 2 e 8
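For reference, a minimal sketch of what a rolling join on these two tables might look like. The tables are rebuilt by hand from the printouts above, and keying dt_lookup on airport and thisTime is an assumption, since its key is cut off in the tables() output. With roll = -Inf the next flight time at the same airport is carried backward to each transaction time:

```r
library(data.table)

# Tables rebuilt from the printouts above
dt <- data.table(t_id = c(1, 3, 2),
                 airport = c("a", "a", "a"),
                 thisTime = c(5.1, 5.1, 6.2))
dt_lookup <- data.table(f_id = c(1, 2, 1, 1, 2, 1, 2),
                        airport = c("a", "a", "b", "c", "d", "d", "e"),
                        thisTime = c(6, 6, 7, 8, 7, 9, 8))

setkey(dt, airport, thisTime)
# Assumption: the key of dt_lookup is truncated in the source
setkey(dt_lookup, airport, thisTime)

# For each row of dt, look up dt_lookup on (airport, thisTime).
# roll = -Inf rolls the next observation backward (NOCB), so a
# transaction at 5.1 can pick up a flight at 6 from the same airport.
rolled <- dt_lookup[dt, roll = -Inf]
print(rolled)
```

The transaction at 6.2 has no later flight at airport a, so its f_id comes back as NA. How the two flights that share airport a and time 6 are handled is worth checking in the printed result.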
> tables()
NAME NROW NCOL MB COLS KEY
[1,] dt 3 3 1 t_id,airport,thisTime airport,thisTime
[2,] dt_lookup 7 3 …
I'd like to use data.table to speed up a given function, but I'm not sure I'm implementing it the right way:
Data
Given two data.tables (dt and dt_lookup)
library(data.table)
set.seed(1234)
t <- seq(1,100); l <- letters; la <- letters[1:13]; lb <- letters[14:26]
n <- 10000
dt <- data.table(id=seq(1:n),
thisTime=sample(t, n, replace=TRUE),
thisLocation=sample(la,n,replace=TRUE),
finalLocation=sample(lb,n,replace=TRUE))
setkey(dt, thisLocation)
set.seed(4321)
dt_lookup <- data.table(lkpId = paste0("l-",seq(1,1000)),
lkpTime=sample(t, 10000, replace=TRUE),
lkpLocation=sample(l, 10000, replace=TRUE))
## NOTE: lkpId is purposely recycled
setkey(dt_lookup, lkpLocation)
I have a function that finds the lkpId that contains both thisLocation and finalLocation and has the "nearest" lkpTime (i.e. the smallest non-negative value of thisTime - lkpTime).
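The "nearest lkpTime" part of this (smallest non-negative thisTime - lkpTime within a location) maps naturally onto a rolling join with roll = Inf. A minimal sketch on made-up toy tables; lkp and qry are hypothetical names, and the finalLocation condition is deliberately left out:

```r
library(data.table)

# Hypothetical toy lookup: three lookup times at location "a"
lkp <- data.table(lkpLocation = c("a", "a", "a"),
                  lkpTime     = c(2, 5, 9),
                  lkpId       = c("l-1", "l-2", "l-3"))

# Hypothetical queries: one after some lookup times, one before all of them
qry <- data.table(thisLocation = c("a", "a"),
                  thisTime     = c(7, 1))

# roll = Inf carries the last lookup row at or before thisTime forward,
# which is the row with the smallest non-negative thisTime - lkpTime.
# The roll applies to the last column named in `on`.
res <- lkp[qry, on = .(lkpLocation = thisLocation, lkpTime = thisTime),
           roll = Inf]
print(res)
```

The query at time 7 picks up "l-2" (lookup time 5), while the query at time 1 precedes every lookup time and returns NA. Note that the lkpTime column in the result holds the query times taken from qry.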
Function
## function to get the 'next' lkpId (i.e. …
While answering this question about rolling joins with the data.table package, I ran into some odd behaviour when using multiple conditions.
Consider the following datasets:
dt <- data.table(t_id = c(1,4,2,3,5), place = c("a","a","d","a","d"), num = c(5.1, 5.1, 6.2, 5.1, 6.2), key=c("place"))
dt_lu <- data.table(f_id = c(rep(1,4),rep(2,3)), place = c("a","b","c","d","a","d","a"), num = c(6,7,8,9,6,7,8), key=c("place"))
When I want to join dt with dt_lu, keeping only those rows of dt_lu that have the same place and where dt_lu$num is higher than dt$num, I use the following:
dt_lu[dt, list(tid = i.t_id,
tnum = i.num,
fnum = num[i.num < num],
fid = f_id),
by = .EACHI]
I get the desired result:
place tid tnum fnum fid
1: a 1 5.1 6 1
2: …
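For comparison, the condition described above (same place, and dt_lu$num greater than dt$num) can also be written as a non-equi join, assuming a data.table version that supports non-equi conditions in on (1.9.8 or later). This is a sketch of an alternative formulation, not the code from the question, and it makes no claim about the truncated output above:

```r
library(data.table)

# Same datasets as in the question
dt <- data.table(t_id = c(1, 4, 2, 3, 5),
                 place = c("a", "a", "d", "a", "d"),
                 num = c(5.1, 5.1, 6.2, 5.1, 6.2),
                 key = c("place"))
dt_lu <- data.table(f_id = c(rep(1, 4), rep(2, 3)),
                    place = c("a", "b", "c", "d", "a", "d", "a"),
                    num = c(6, 7, 8, 9, 6, 7, 8),
                    key = c("place"))

# Non-equi join: equal place, and dt_lu's num strictly above dt's num.
# In j, the x. prefix refers to dt_lu's columns and i. to dt's columns.
res <- dt_lu[dt, on = .(place, num > num),
             .(tid = i.t_id, tnum = i.num, fnum = x.num, fid = x.f_id),
             nomatch = 0L]
print(res)
```

Each row of dt is paired with every row of dt_lu that satisfies both conditions, so a transaction can appear several times in the result.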