What is the "data.table" way to do this join/merge?

sha*_*ker 2 merge join r left-join data.table

I have a "dictionary" table like this:

library(data.table)

dict <- data.table(
  Nickname = c("Abby", "Ben", "Chris", "Dan", "Ed"),
  Name = c("Abigail", "Benjamin", "Christopher", "Daniel", "Edward")
)
dict
#    Nickname        Name
# 1:     Abby     Abigail
# 2:      Ben    Benjamin
# 3:    Chris Christopher
# 4:      Dan      Daniel
# 5:       Ed      Edward

and a "data" table like this:

dat <- data.table(
  Friend1 = c("Abby", "Ben", "Ben", "Chris"),
  Friend2 = c("Ben", "Ed", NA, "Ed"),
  Friend3 = c("Ed", NA, NA, "Dan"),
  Friend4 = c("Dan", NA, NA, NA)
)
dat
#    Friend1 Friend2 Friend3 Friend4
# 1:    Abby     Ben      Ed     Dan
# 2:     Ben      Ed      NA      NA
# 3:     Ben      NA      NA      NA
# 4:   Chris      Ed     Dan      NA

I want to produce a data.table that looks like this:

result <- data.table(
  Friend1.Nickname = c("Abby", "Ben", "Ben", "Chris"),
  Friend1.Name = c("Abigail", "Benjamin", "Benjamin", "Christopher"),
  Friend2.Nickname = c("Ben", "Ed", NA, "Ed"),
  Friend2.Name = c("Benjamin", "Edward", NA, "Edward"),
  Friend3.Nickname = c("Ed", NA, NA, "Dan"),
  Friend3.Name = c("Edward", NA, NA, "Daniel"),
  Friend4.Nickname = c("Dan", NA, NA, NA),
  Friend4.Name = c("Daniel", NA, NA, NA)
)
result
# sorry, word wrapping makes this too annoying to copy

This is the solution I came up with:

friend_vars <- paste0("Friend", 1:4)
friend_nicks <- paste0(friend_vars, ".Nickname")
friend_names <- paste0(friend_vars, ".Name")
setnames(dat, friend_vars, friend_nicks)
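for (i in 1:4) {
  # Look up each nickname in dict; wrapping the LHS in () makes := use the
  # value of friend_names[i] as the new column name.
  dat[, (friend_names[i]) := dict$Name[match(dat[[friend_nicks[i]]], dict$Nickname)]]
}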

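Is there a more "data.table" way to do this? I'm sure this is fine and efficient, but it's ugly to read, and apart from data.table's assignment by reference, I don't feel I'm making full use of what the package offers.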

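I'm also not a very strong SQL user, and I'm not very comfortable with join terminology. I have a feeling that "Data.table - left outer join on multiple tables" could be useful here, but I don't know how to apply it to my situation.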

Aru*_*run 6

Using data.table 1.9.5:

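for (nm in names(dat)) {
    # Named vector: the name is the column in dat, the value is the column in dict.
    on = setNames("Nickname", nm)
    # Update join: i.Name refers to the Name column of dict (the table in i).
    dat[dict, paste0(nm, ".Name") := i.Name, on=on]
}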

We can join using on= instead of setting keys. You can then use setcolorder() to reorder the columns.
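As a minimal sketch of that reordering step, the snippet below runs the update join on the question's tables and then interleaves the columns to match the layout the question asked for. The helper names `friend_vars` and `new_order`, and the renaming of the original columns to `FriendN.Nickname`, are assumptions made for illustration and are not part of the answer's code.

```r
library(data.table)

dict <- data.table(
  Nickname = c("Abby", "Ben", "Chris", "Dan", "Ed"),
  Name = c("Abigail", "Benjamin", "Christopher", "Daniel", "Edward")
)
dat <- data.table(
  Friend1 = c("Abby", "Ben", "Ben", "Chris"),
  Friend2 = c("Ben", "Ed", NA, "Ed"),
  Friend3 = c("Ed", NA, NA, "Dan"),
  Friend4 = c("Dan", NA, NA, NA)
)

# Capture the original column names before any columns are added.
friend_vars <- names(dat)

# Update join: add one <FriendN>.Name column per original column.
for (nm in friend_vars) {
  new_col <- paste0(nm, ".Name")
  dat[dict, (new_col) := i.Name, on = setNames("Nickname", nm)]
}

# Rename the original columns (an assumption, to match the question's layout).
setnames(dat, friend_vars, paste0(friend_vars, ".Nickname"))

# Interleave the names: Friend1.Nickname, Friend1.Name, Friend2.Nickname, ...
new_order <- as.vector(rbind(
  paste0(friend_vars, ".Nickname"),
  paste0(friend_vars, ".Name")
))

# Reorder the columns by reference; no copy of dat is made.
setcolorder(dat, new_order)
```

Both `setnames()` and `setcolorder()` modify `dat` in place, so nothing needs to be reassigned.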

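I'd avoid reshaping the data unless it's absolutely necessary. This is an update while joining, which is very convenient. And now that there is the on= argument, I couldn't resist posting an answer :-).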
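To illustrate the difference between an ordinary join and an update while joining, here is a small sketch on a single column. The one-column `dat` and the variable name `joined` are assumptions chosen to keep the example short.

```r
library(data.table)

dict <- data.table(
  Nickname = c("Abby", "Ben", "Chris", "Dan", "Ed"),
  Name = c("Abigail", "Benjamin", "Christopher", "Daniel", "Edward")
)
dat <- data.table(Friend1 = c("Abby", "Ben", "Ben", "Chris"))

# Ordinary join: look up each row of dat in dict and return a NEW table
# with one row per row of dat (a left join onto dat).
joined <- dict[dat, on = c(Nickname = "Friend1")]

# Update join: match the same rows, but add the looked-up value to dat
# by reference instead of building a new table. The i. prefix picks the
# column from dict, the table passed in the i position.
dat[dict, Friend1.Name := i.Name, on = c(Friend1 = "Nickname")]
```

In the `on=` vector the name is the column of the table being subset and the value is the column of the table in `i`, which is why the two calls spell the pair in opposite directions.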

  • @ssdecontrol see [#1038](https://github.com/Rdatatable/data.table/issues/1038). You are free to submit a PR. I don't plan to work on it any time soon. (2 upvotes)