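I have two data.tables: samples and resources.
resources is linked to samples via a primary and a secondary id. I want to first join the info from resources onto the samples table via the primary id, and only where that yields NA fall back to the secondary id from the same table (in one data.table command chain).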
# resources:
primary secondary info
1: 17 42 "I"
2: 18 NA "J"
3: 19 43 "K"
# samples:
name primary secondary
1: "a" 17 55
2: "b" 0 42
3: "c" 18 42
The desired result is:
# joined tables:
name info # primary secondary
1: "a" "I"
2: "b" "I"
3: "c" "J"
The first join, via primary, is easy. It yields:
# Update:
samples <- data.table(name = letters[1:3],
primary = c(17, 0, 18),
secondary = c(55, 42, 42))
resources <- data.table(primary = 17:19,
secondary = c(42, NA, 43),
info = LETTERS[9:11])
# first join:
setkey(samples, primary)
setkey(resources, primary)
resources[samples]  # for each row of samples, look up the matching row of resources by key
name info # primary secondary
1: "a" "I"
2: "b" NA
3: "c" "J"
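But what then? I would need to re-key samples with setkey(samples, secondary), right? And then restrict the join to only those rows that yielded NA. But all of this does not seem possible in one command chain (and imagine there were more than two criteria...). How can I achieve this more concisely?
... Updated with code that uses data.tables.
Even though you can do it in one line, I think that obscures the meaning of what you are doing and makes it hard to read, understand, debug, or remember what you did a month from now, so it is simply a bad idea.
Smaller, easier-to-digest chunks are the way to go imo: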
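# step 1: update join on the primary id
setkey(samples, primary)
setkey(resources, primary)
samples[resources, info := i.info]
# step 2: re-key on the secondary id and fill only the rows that are still NA
setkey(samples, secondary)
setkey(resources, secondary)
samples[resources, info := ifelse(is.na(info), i.info, info)]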
samples
# name primary secondary info
#1: b 0 42 I
#2: c 18 42 J
#3: a 17 55 I
# keep going with tertiary and so on if you like
As @nachti pointed out in the comments, you may need to add allow.cartesian=TRUE in versions before 1.9.5, depending on your data.
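If there are more than two criteria, the same steps can be wrapped in a loop. What follows is only a sketch under a few assumptions, not part of the data.table API: fill_info is a made-up helper name, the looked-up value is assumed to live in a column called info, and on= is used instead of re-keying (that needs a data.table version that supports on=, 1.9.6 or later as far as I know), so the row order of samples is left alone. The ids are written as doubles in both tables here so the join columns have the same type.

```r
library(data.table)

samples <- data.table(name = letters[1:3],
                      primary = c(17, 0, 18),
                      secondary = c(55, 42, 42))
resources <- data.table(primary = c(17, 18, 19),
                        secondary = c(42, NA, 43),
                        info = LETTERS[9:11])

# Hypothetical helper: try each id column in turn and only fill
# the rows of `info` that are still NA after the previous columns.
fill_info <- function(samples, resources, id_cols) {
  out <- copy(samples)              # leave the caller's table untouched
  out[, info := NA_character_]
  for (id_col in id_cols) {
    # skip resource rows with a missing id so they are never used as a match
    res <- resources[!is.na(resources[[id_col]])]
    out[res, on = id_col, info := ifelse(is.na(info), i.info, info)]
  }
  out[]
}

result <- fill_info(samples, resources, c("primary", "secondary"))
print(result)
# info comes out as "I", "I", "J" for a, b, c
```

Adding a tertiary id would then just mean passing a third column name in id_cols, earliest column wins.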