R data.table: check whether rows exist in another data.table

use*_*285 0 r data.table

I have two data.tables like this:

tests

id | test | score
=================
 1 |    1 |    90
 1 |    2 |   100
 2 |    1 |    70
 2 |    2 |    80
 3 |    1 |   100
 3 |    2 |    95

cheaters

id | test | score
=================
 1 |    2 |   100
 3 |    1 |   100
 3 |    2 |    95

Suppose I now want to add a boolean column to tests that tells whether that particular test was cheated on, so the output would look like this:

tests

id | test | score | cheat
=========================
 1 |    1 |    90 | FALSE
 1 |    2 |   100 |  TRUE
 2 |    1 |    70 | FALSE
 2 |    2 |    80 | FALSE
 3 |    1 |   100 |  TRUE
 3 |    2 |    95 |  TRUE

Is there an easy way to do this? The tables are keyed on id and test.

Psi*_*dom 7

Create a cheat column initialized to FALSE, then join with cheaters and update cheat to TRUE wherever there is a match:

library(data.table)
setkey(setDT(tests), id, test)
setkey(setDT(cheaters), id, test)

tests[, cheat := FALSE][cheaters, cheat := TRUE]

tests
#   id test score cheat
#1:  1    1    90 FALSE
#2:  1    2   100  TRUE
#3:  2    1    70 FALSE
#4:  2    2    80 FALSE
#5:  3    1   100  TRUE
#6:  3    2    95  TRUE
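If you want to run the keyed version end to end, here is a minimal sketch that builds both tables by hand. The values are copied from the question; storing them as integer columns is my assumption, since the question does not show the column types.

```r
library(data.table)

# Sample data copied from the question; integer column types are an assumption.
tests <- data.table(
  id    = c(1L, 1L, 2L, 2L, 3L, 3L),
  test  = c(1L, 2L, 1L, 2L, 1L, 2L),
  score = c(90L, 100L, 70L, 80L, 100L, 95L)
)
cheaters <- data.table(
  id    = c(1L, 3L, 3L),
  test  = c(2L, 1L, 2L),
  score = c(100L, 100L, 95L)
)

# Key both tables on the columns that identify a row.
setkey(tests, id, test)
setkey(cheaters, id, test)

# Default every row to FALSE, then flip only the rows that match a row in cheaters.
# := modifies tests by reference, so no reassignment is needed.
tests[, cheat := FALSE][cheaters, cheat := TRUE]

print(tests)
```

Keep in mind that setkey sorts the table by the key columns. The sample data is already in that order, so nothing visibly moves here, but an unsorted table would be reordered.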

Or, without setting keys, use the on argument to specify the columns to join on:

setDT(tests)
setDT(cheaters)
tests[, cheat := FALSE][cheaters, cheat := TRUE, on = .(id, test)]

tests
#   id test score cheat
#1:  1    1    90 FALSE
#2:  1    2   100  TRUE
#3:  2    1    70 FALSE
#4:  2    2    80 FALSE
#5:  3    1   100  TRUE
#6:  3    2    95  TRUE
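Note that joining on id and test ignores the score column in cheaters. If a row should count as present only when every column agrees, one possible variation is to add score to on. The sketch below illustrates the difference; the changed score for id 3, test 2 is hypothetical data I made up for the comparison, and copy() is used only so that tests itself is left untouched.

```r
library(data.table)

# Sample data copied from the question; integer column types are an assumption.
tests <- data.table(
  id    = c(1L, 1L, 2L, 2L, 3L, 3L),
  test  = c(1L, 2L, 1L, 2L, 1L, 2L),
  score = c(90L, 100L, 70L, 80L, 100L, 95L)
)

# Hypothetical variation: the score for (id = 3, test = 2) is 90 here, not 95.
cheaters <- data.table(
  id    = c(1L, 3L, 3L),
  test  = c(2L, 1L, 2L),
  score = c(100L, 100L, 90L)
)

# Match on id and test only: the score column does not take part in the join.
by_key <- copy(tests)[, cheat := FALSE][cheaters, cheat := TRUE, on = .(id, test)]

# Match on every column: the whole row must exist in cheaters.
by_row <- copy(tests)[, cheat := FALSE][cheaters, cheat := TRUE, on = .(id, test, score)]

print(by_key)
print(by_row)
```

With this made-up data, the row for id 3, test 2 is flagged in by_key but not in by_row, because its score differs between the two tables.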