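I found strange behavior when running a non-equi join (from R's data.table library), and I cannot figure out why it happens.
Why, when running a non-equi join, do I need to write x.colname inside the j argument of the join, rather than just colname, if I want to keep the original values of the left table?
Here is a small reproducible example of what I am talking about: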
library(tidyverse)
library(data.table)
# Setting seed for reproducibility
set.seed(666)
# data.table that contains roadway segments.
# The "frm_dfo" and "to_dfo" columns represent the start and end mileposts
# of each roadway segment. For example, the segment with road_ID=101 refers
# to the portion of IH20 that starts at milepost 10 and ends at milepost 20.
roads = data.table(road_id=101:109,
hwy=c('IH20','IH20','IH20','SH150','SH150','SH150','TX66','TX66','TX66'),
frm_dfo=c(10,20,30,10,20,30,10,20,30),
to_dfo=c(20,30,40,20,30,40,20,30,40),
seg_name=c('Seg 1','Seg 2', 'Seg 3','Seg 10','Seg 20', 'Seg …

I am working with data.table and want to do a non-equi left join/merge.
I have one table containing car prices and another table that determines which price class each car belongs to:
data_priceclass <- data.table()
data_priceclass$price_from <- c(0, 0, 200000, 250000, 300000, 350000, 425000, 500000, 600000, 700000, 800000, 900000, 1000000, 1100000, 1200000, 1300000, 1400000, 1500000, 1600000, 1700000, 1800000)
data_priceclass$price_to <- c(199999, 199999, 249999, 299999, 349999, 424999, 499999, 599999, 699999, 799999, 899999, 999999, 1099999, 1199999, 1299999, 1399999, 1499999, 1599999, 1699999, 1799999, 1899999)
data_priceclass$price_class <- c(1:20, 99)
I use a non-equi join to merge the two tables. But data.table's x[y] join syntax removes duplicates.
cars <- data.table(car_price = c(190000, 500000))
cars[data_priceclass, on = c("car_price >= price_from",
"car_price < price_to"),
price_class := i.price_class,] …

I am trying to do a non-equi join in data.table and extract the min/max of the joined values within that join.
set.seed(42)
dtA <- data.table(id=rep(c("A","B"),each=3), start=rep(1:3, times=2), end=rep(2:4, times=2))
dtB <- data.table(id=rep(c("A","B"),times=20), time=sort(runif(40, 1, 4)))
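I want to keep the min/max values of time that fall between start and end (and match on id). Nominally this is just a non-equi join, but I cannot find a combination of by=.EACHI or mult="..." that gives me what I want. Instead, the min/max values are usually inconsistent with the range I need. Unfortunately, roll= does not support non-equi ranges.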
dtA[dtB, c("Min", "Max") := .(min(time), max(time)),
on = .(id, start <= time, end > time), mult = "first"]
# id start end Min Max
# <char> <int> <int> <num> <num>
# 1: A 1 2 1.011845 3.966675
# 2: A 2 3 1.011845 3.966675
# 3: A …
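
For illustration only, here is a minimal sketch of one way the per-range min/max might be obtained. It is not taken from the original question, and it assumes that reversing the join direction (dtB[dtA] rather than dtA[dtB]) and aggregating with by = .EACHI is acceptable. The names res, Min and Max are illustrative. Inside j, x.time is used so that the original time values from dtB are aggregated, since in a non-equi join the join column in the result takes its values from i.

```r
library(data.table)

# Same sample data as in the question
set.seed(42)
dtA <- data.table(id = rep(c("A", "B"), each = 3),
                  start = rep(1:3, times = 2),
                  end = rep(2:4, times = 2))
dtB <- data.table(id = rep(c("A", "B"), times = 20),
                  time = sort(runif(40, 1, 4)))

# For each row of dtA (by = .EACHI), find the rows of dtB with the same id
# and start <= time < end, then aggregate dtB's original time values.
# x.time refers to dtB's own column; a bare `time` would be the join column.
res <- dtB[dtA,
           on = .(id, time >= start, time < end),
           .(Min = min(x.time), Max = max(x.time)),
           by = .EACHI]

# Attach the aggregates to dtA; res has one row per dtA row, in dtA's order.
dtA[, c("Min", "Max") := .(res$Min, res$Max)]
```

If a range has no matching rows in dtB, Min and Max are NA for that range under this approach.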