Consider the following data.tables. The first defines a set of regions, with a start and end position for each group 'x':
library(data.table)
d1 <- data.table(x = letters[1:5], start = c(1,5,19,30, 7), end = c(3,11,22,39,25))
setkey(d1, x, start)
#    x start end
# 1: a     1   3
# 2: b     5  11
# 3: c    19  22
# 4: d    30  39
# 5: e     7  25
The second data set has the same grouping variable 'x', and positions 'pos' within each group:
d2 <- data.table(x = letters[c(1,1,2,2,3:5)], pos = c(2,3,3,12,20,52,10))
setkey(d2, x, pos)
#    x pos
# 1: a   2
# 2: a   3
# 3: b   3
# 4: b  12
# …

How can the following be done with data.table (it is straightforward with sqldf), producing exactly the same result:
library(data.table)
whatWasMeasured <- data.table(start=as.POSIXct(seq(1, 1000, 100),
                                               origin="1970-01-01 00:00:00"),
                              end=as.POSIXct(seq(10, 1000, 100), origin="1970-01-01 00:00:00"),
                              x=1:10,
                              y=letters[1:10])
measurments <- data.table(time=as.POSIXct(seq(1, 2000, 1),
                                          origin="1970-01-01 00:00:00"),
                          temp=runif(2000, 10, 100))
## Alternative short names for data.tables
dt1 <- whatWasMeasured
dt2 <- measurments
## Straightforward with sqldf
library(sqldf)
sqldf("select * from measurments m, whatWasMeasured wwm
where m.time between wwm.start and wwm.end")
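One possible approach, offered as a sketch rather than a definitive answer, is a non-equi join, which expresses the `between` condition directly in `on=`. The example rebuilds the two tables from the question so that it runs on its own. The `tz = "UTC"` argument, the `set.seed()` call, and the result name `res` are assumptions added for reproducibility and are not part of the original question. Non-equi joins need a reasonably recent data.table release.

```r
library(data.table)

set.seed(42)  # assumption: fixed seed so the random temperatures are reproducible

# Same data as in the question (tz = "UTC" is an added assumption)
whatWasMeasured <- data.table(
  start = as.POSIXct(seq(1, 1000, 100), origin = "1970-01-01 00:00:00", tz = "UTC"),
  end   = as.POSIXct(seq(10, 1000, 100), origin = "1970-01-01 00:00:00", tz = "UTC"),
  x     = 1:10,
  y     = letters[1:10])
measurments <- data.table(
  time = as.POSIXct(seq(1, 2000, 1), origin = "1970-01-01 00:00:00", tz = "UTC"),
  temp = runif(2000, 10, 100))

# Non-equi join: for every interval in whatWasMeasured, pick the measurements
# whose time lies in [start, end] (both bounds inclusive, like SQL BETWEEN).
# The x. and i. prefixes name the original columns of each table explicitly,
# because the join columns are otherwise overwritten by the interval bounds.
res <- measurments[whatWasMeasured,
                   on = .(time >= start, time <= end),
                   .(time = x.time, temp,
                     start = i.start, end = i.end,
                     x = i.x, y = i.y)]

print(head(res))
```

The column order in `j` is chosen to mirror the `select *` of the sqldf query (measurement columns first, then interval columns). Whether row order and column classes match the sqldf output exactly would need to be checked against the actual sqldf result, for example with `all.equal()`.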