非等值连接不保留原始列值

Fel*_* D. 7 join r data.table tidyverse non-equi-join

我在运行非等连接(来自 R 的 data.table 库)时发现了奇怪的行为,并且我无法弄清楚为什么会发生这种情况。

为什么在运行非等值连接时,如果我想保留左表的原始值,我需要写入x.colname而不是只写入连接的属性colname内?j

这是我正在谈论的一个可重复的小例子:

library(tidyverse)
library(data.table)

# Setting seed for reproducibility
set.seed(666)

# data.table that contains roadway segments.
# The "frm_dfo" and "to_dfo" columns represent the start and end mileposts 
# of each roadway segment. For example, the segment with road_ID=101 refers 
# to the portion of IH20 that starts at milepost 10 and ends at milepost 20.
roads = data.table(road_id=101:109,
                   hwy=c('IH20','IH20','IH20','SH150','SH150','SH150','TX66','TX66','TX66'),
                   frm_dfo=c(10,20,30,10,20,30,10,20,30),
                   to_dfo=c(20,30,40,20,30,40,20,30,40),
                   seg_name=c('Seg 1','Seg 2', 'Seg 3','Seg 10','Seg 20', 'Seg 30','Seg 100','Seg 200', 'Seg 300'))

# data.table that contains crashes. 
# The "dfo" column represents the milepost of the roadway on which the 
# crash occurs. For example, the crash with crash_id=1 happens on milepost 33.23105 of IH20.
crashes = data.table(crash_id=1:30,
                     hwy=rep(c('IH20','SH150','BOB11'),each=10),
                     dfo=runif(min=10,max=40, n=30))

# Non-equi join that finds which segment each crash happens on.
joined_data_v1 = crashes %>%
                  .[roads, 
                    j  = list(crash_id, hwy, x.dfo, seg_name, frm_dfo, to_dfo),
                    on = list(hwy=hwy, dfo >= frm_dfo, dfo <= to_dfo)] %>%
                  arrange(crash_id, by_group = TRUE)

# Again, joining crashes and roadway segments. 
# Here, though, note that I've swapped x.dfo for just dfo inside the `j` argument 
joined_data_v2 = crashes %>%
                  .[roads, 
                    j  = list(crash_id, hwy, dfo, seg_name, frm_dfo, to_dfo),
                    on = list(hwy=hwy, dfo >= frm_dfo, dfo <= to_dfo)] %>%
                  arrange(crash_id, by_group = TRUE)
Run Code Online (Sandbox Code Playgroud)

这是(在参数中joined_data_v1使用)的快照: x.dfoj加入_数据_v1

这是(在参数中joined_data_v2使用)的快照: dfoj加入数据v2

请注意,在 中joined_data_v1,调用的列如何包含来自 data.table 的列的x.dfo精确值。但是,在 中,调用的列包含来自 data.table 的列的值(而不是来自data.table 的列的实际数据)。dfocrashesjoined_data_v2dfofrm_dforoadscrashesdfo

这里发生了什么?为什么这行为如此奇怪?为什么生成的 data.table 的列中包含的值并不总是准确反映 data.table的原始列dfo/x.dfo中包含的内容?dfocrashes

我尝试查看一些非等值连接的文档,但找不到任何可以帮助我的东西。

是一个相关的相关问题,但他们没有提到为什么会发生这种行为。