sha*_*ker 2 merge join r left-join data.table
I have a "dictionary" table like this:
library(data.table)

dict <- data.table(
  Nickname = c("Abby", "Ben", "Chris", "Dan", "Ed"),
  Name = c("Abigail", "Benjamin", "Christopher", "Daniel", "Edward")
)
dict
# Nickname Name
# 1: Abby Abigail
# 2: Ben Benjamin
# 3: Chris Christopher
# 4: Dan Daniel
# 5: Ed Edward
And a "data" table like this:
dat <- data.table(
  Friend1 = c("Abby", "Ben", "Ben", "Chris"),
  Friend2 = c("Ben", "Ed", NA, "Ed"),
  Friend3 = c("Ed", NA, NA, "Dan"),
  Friend4 = c("Dan", NA, NA, NA)
)
dat
# Friend1 Friend2 Friend3 Friend4
# 1: Abby Ben Ed Dan
# 2: Ben Ed NA NA
# 3: Ben NA NA NA
# 4: Chris Ed Dan NA
I would like to produce a data.table that looks like this:
result <- data.table(
  Friend1.Nickname = c("Abby", "Ben", "Ben", "Chris"),
  Friend1.Name = c("Abigail", "Benjamin", "Benjamin", "Christopher"),
  Friend2.Nickname = c("Ben", "Ed", NA, "Ed"),
  Friend2.Name = c("Benjamin", "Edward", NA, "Edward"),
  Friend3.Nickname = c("Ed", NA, NA, "Dan"),
  Friend3.Name = c("Edward", NA, NA, "Daniel"),
  Friend4.Nickname = c("Dan", NA, NA, NA),
  Friend4.Name = c("Daniel", NA, NA, NA)
)
result
# sorry, word wrapping makes this too annoying to copy
Here is the solution I came up with:
friend_vars <- paste0("Friend", 1:4)
friend_nicks <- paste0(friend_vars, ".Nickname")
friend_names <- paste0(friend_vars, ".Name")
setnames(dat, friend_vars, friend_nicks)
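for (i in 1:4) {
  # Look up each nickname in dict and assign the matching full name by reference.
  # Wrapping the LHS in parentheses replaces the deprecated `with = FALSE`.
  dat[, (friend_names[i]) := dict$Name[match(dat[[friend_nicks[i]]], dict$Nickname)]]
}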
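Is there a more "data.table-y" way to do this? I'm sure this is fine and efficient, but it's ugly to read, and apart from data.table's assignment by reference, I feel like I'm not really using what the package has to offer.
I'm also not a very strong SQL user, and I'm not very comfortable with join terminology. I have a feeling that "Data.table - left outer join on multiple tables" could be useful here, but I don't know how to apply it to my situation.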
Using data.table 1.9.5:
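for (nm in names(dat)) {
  # Named vector for on=: the name is the column in dat, the value is the column in dict
  on = setNames("Nickname", nm)
  # Update join: add <nm>.Name to dat by reference, taking Name from dict (i.Name)
  dat[dict, paste0(nm, ".Name") := i.Name, on=on]
}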
We can use on= to join instead of setting keys. You can then use setcolorder() to reorder the columns.
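As a rough sketch of that reordering step (not part of the original answer), the loop can be followed by setnames() and setcolorder() to get the interleaved Nickname/Name layout asked for in the question. The variables friend_vars, nick_cols and name_cols are illustrative names, and setNames() is base R, used here to build the named on= vector.

```r
library(data.table)

dict <- data.table(
  Nickname = c("Abby", "Ben", "Chris", "Dan", "Ed"),
  Name = c("Abigail", "Benjamin", "Christopher", "Daniel", "Edward")
)
dat <- data.table(
  Friend1 = c("Abby", "Ben", "Ben", "Chris"),
  Friend2 = c("Ben", "Ed", NA, "Ed"),
  Friend3 = c("Ed", NA, NA, "Dan"),
  Friend4 = c("Dan", NA, NA, NA)
)

# Illustrative helper vectors (assumed names, not from the original answer)
friend_vars <- paste0("Friend", 1:4)
nick_cols <- paste0(friend_vars, ".Nickname")
name_cols <- paste0(friend_vars, ".Name")

# Update join: for each FriendN column, look up the full name in dict
for (k in seq_along(friend_vars)) {
  dat[dict, (name_cols[k]) := i.Name, on = setNames("Nickname", friend_vars[k])]
}

# Rename the original columns, then interleave the Nickname/Name pairs
setnames(dat, friend_vars, nick_cols)
setcolorder(dat, as.vector(rbind(nick_cols, name_cols)))
```

Both setnames() and setcolorder() work by reference, so dat is never copied along the way.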
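I would avoid reshaping the data unless absolutely necessary. This is an update while joining, which is very convenient. Now that there is an on= argument, I couldn't resist posting an answer :-).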
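To make the "update while joining" point concrete, here is a small sketch on a cut-down, made-up version of the tables (two dictionary entries, one Friend1 column) that contrasts an ordinary join with an update join. The object name joined is illustrative only.

```r
library(data.table)

dict <- data.table(Nickname = c("Abby", "Ben"), Name = c("Abigail", "Benjamin"))
dat <- data.table(Friend1 = c("Abby", "Ben", "Ben", NA))

# Ordinary join: returns a new data.table; dat itself is left unchanged
joined <- dict[dat, on = c(Nickname = "Friend1")]

# Update join: adds Friend1.Name to dat by reference,
# keeping dat's rows and their order; unmatched rows stay NA
dat[dict, Friend1.Name := i.Name, on = c(Friend1 = "Nickname")]
```

The update form avoids building an intermediate table, which is why it fits the loop over several Friend columns.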