Fel*_* D. 7 join r data.table tidyverse non-equi-join
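I ran into some strange behavior when running a non-equi join (from R's data.table library), and I can't figure out why it happens.
Why, when running a non-equi join, do I need to write x.colname instead of just colname inside the j argument of the join if I want to keep the original values from the left table?
Here is a small reproducible example of what I'm talking about: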
library(tidyverse)
library(data.table)
# Setting seed for reproducibility
set.seed(666)
# data.table that contains roadway segments.
# The "frm_dfo" and "to_dfo" columns represent the start and end mileposts
# of each roadway segment. For example, the segment with road_ID=101 refers
# to the portion of IH20 that starts at milepost 10 and ends at milepost 20.
roads = data.table(road_id=101:109,
                   hwy=c('IH20','IH20','IH20','SH150','SH150','SH150','TX66','TX66','TX66'),
                   frm_dfo=c(10,20,30,10,20,30,10,20,30),
                   to_dfo=c(20,30,40,20,30,40,20,30,40),
                   seg_name=c('Seg 1','Seg 2', 'Seg 3','Seg 10','Seg 20', 'Seg 30','Seg 100','Seg 200', 'Seg 300'))
# data.table that contains crashes.
# The "dfo" column represents the milepost of the roadway on which the
# crash occurs. For example, the crash with crash_id=1 happens on milepost 33.23105 of IH20.
crashes = data.table(crash_id=1:30,
                     hwy=rep(c('IH20','SH150','BOB11'),each=10),
                     dfo=runif(min=10,max=40, n=30))
# Non-equi join that finds which segment each crash happens on.
joined_data_v1 = crashes %>%
  .[roads,
    j = list(crash_id, hwy, x.dfo, seg_name, frm_dfo, to_dfo),
    on = list(hwy=hwy, dfo >= frm_dfo, dfo <= to_dfo)] %>%
  arrange(crash_id, .by_group = TRUE)
# Again, joining crashes and roadway segments.
# Here, though, note that I've swapped x.dfo for just dfo inside the `j` argument
joined_data_v2 = crashes %>%
  .[roads,
    j = list(crash_id, hwy, dfo, seg_name, frm_dfo, to_dfo),
    on = list(hwy=hwy, dfo >= frm_dfo, dfo <= to_dfo)] %>%
  arrange(crash_id, .by_group = TRUE)
Here is a snapshot of joined_data_v1 (which uses x.dfo in the j argument):
Here is a snapshot of joined_data_v2 (which uses dfo in the j argument):
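Note how, in joined_data_v1, the column called x.dfo contains the exact values from the dfo column of the crashes data.table. In joined_data_v2, however, the column called dfo contains the values from the frm_dfo column of the roads data.table (rather than the actual data from the dfo column of the crashes data.table).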
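The same effect can be sketched with a much smaller example. The tables x and i and their columns (id, pos, lo, hi, seg) are hypothetical and made up for illustration only; the join has the same shape as the one above, minus the hwy equality condition.

```r
library(data.table)

# Hypothetical toy tables (not part of the example above).
# x plays the role of crashes, i plays the role of roads.
x <- data.table(id = 1:3, pos = c(12, 25, 38))
i <- data.table(lo = c(10, 20, 30), hi = c(20, 30, 40), seg = c("A", "B", "C"))

# Same kind of non-equi join: each pos falls inside exactly one [lo, hi] range.
res <- x[i, on = .(pos >= lo, pos <= hi), .(id, pos, x.pos, seg)]
setorder(res, id)

# x.pos holds the original values from x; pos holds a boundary value
# taken from i (frm_dfo in the snapshots above), not the original value.
print(res)
```

In this sketch the x.pos column should match x$pos exactly, while the pos column does not, mirroring the difference between joined_data_v1 and joined_data_v2.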
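What is going on here? Why does this behave so strangely? Why do the values in the dfo/x.dfo column of the resulting data.table not always accurately reflect what is in the original dfo column of the crashes data.table?
I tried looking through some documentation on non-equi joins, but couldn't find anything that helped.
Here is a related question, but it doesn't explain why this behavior occurs.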