sds · 2 · Tags: lookup, join, r, data.table
I am trying to use a data.table as a lookup table:
> (dt <- data.table(myid=rep(11:12,3),zz=1:6,key=c("myid","zz")))
myid zz
1: 11 1
2: 11 3
3: 11 5
4: 12 2
5: 12 4
6: 12 6
> (id2name <- data.table(id=11:14,name=letters[1:4],key="id"))
id name
1: 11 a
2: 12 b
3: 13 c
4: 14 d
What I want is:
> (res <- data.table(myid=rep(11:12,3),zz=1:6,name=rep(letters[1:2],3),key=c("myid","zz")))
myid zz name
1: 11 1 a
2: 11 3 a
3: 11 5 a
4: 12 2 b
5: 12 4 b
6: 12 6 b
However, the join I tried fails:
> dt[id2name]
Starting binary search ...done in 0 secs
Error in vecseq(f__, len__, if (allow.cartesian) NULL else as.integer(max(nrow(x), :
Join results in 8 rows; more than 6 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that's ok, try including `j` and dropping `by` (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.
Calls: [ -> [.data.table -> vecseq
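What am I doing wrong?
PS. I can get the result any other way. What is the most idiomatic way to do what I want? (dt must remain a data.table, but id2name can be anything that maps an int to something else, as long as the int is not assumed to be a vector index.)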
Answer:
> dt[id2name, allow.cartesian=T, nomatch=0]
myid zz name
1: 11 1 a
2: 11 3 a
3: 11 5 a
4: 12 2 b
5: 12 4 b
6: 12 6 b
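data.table is trying to save you from yourself, in case you join on duplicate values unintentionally. Here the 8 rows come from 3 matches for id 11, 3 matches for id 12, and one NA-filled row each for ids 13 and 14, which have no match in dt under the default nomatch=NA; 8 exceeds max(nrow(x), nrow(i)) = 6. nomatch=0 drops those two unmatched rows. Note that the error message does (eventually) tell you what to do if you are sure you know what you are doing.
Or: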
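The row count in the error can be checked with a little arithmetic. This sketch rebuilds the two tables from the question and counts, for each id in id2name, how many rows of dt it matches. The names matches and join_rows are illustrative only, and the rule that an unmatched id still yields one NA row assumes the default nomatch=NA used in the question.

```r
library(data.table)

dt <- data.table(myid = rep(11:12, 3), zz = 1:6, key = c("myid", "zz"))
id2name <- data.table(id = 11:14, name = letters[1:4], key = "id")

# For each lookup id, count the rows of dt with that myid:
# 11 -> 3, 12 -> 3, 13 -> 0, 14 -> 0
matches <- sapply(id2name$id, function(k) sum(dt$myid == k))

# With nomatch = NA, an id with no match still contributes one NA-filled row,
# so the join result has sum(max(count, 1)) rows.
join_rows <- sum(pmax(matches, 1L))
print(join_rows)                        # 8
print(max(nrow(dt), nrow(id2name)))     # 6, the limit quoted in the error
```

With nomatch=0 the two unmatched ids contribute nothing, leaving 3 + 3 = 6 rows, which is the result shown in the answer.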
> id2name[dt]
id name zz
1: 11 a 1
2: 11 a 3
3: 11 a 5
4: 12 b 2
5: 12 b 4
6: 12 b 6
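Because dt has to stay a data.table, another option is to add the name column to dt by reference with an update join instead of building a new table. This is a minimal sketch that assumes a data.table release supporting the on= argument. The column name2 and the vectors lookup_ids and lookup_names are hypothetical. They illustrate the PS case where the mapping is just two plain vectors, looked up by value with base R match() so the integers are never treated as positions.

```r
library(data.table)

dt <- data.table(myid = rep(11:12, 3), zz = 1:6, key = c("myid", "zz"))
id2name <- data.table(id = 11:14, name = letters[1:4], key = "id")

# Update join: match dt$myid against id2name$id and copy the matching
# name into a new column of dt, modifying dt in place.
dt[id2name, name := i.name, on = c(myid = "id")]

# Hypothetical alternative for a mapping held in two plain vectors:
# match() looks ids up by value, not by position.
lookup_ids   <- 11:14
lookup_names <- letters[1:4]
dt[, name2 := lookup_names[match(myid, lookup_ids)]]

print(dt)
```

Neither line returns a new table: dt keeps its six rows, its existing columns and its key. An id missing from the mapping would leave NA in the new column instead of raising an error.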