Cor*_*one 5 join r cartesian-product data.table
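What is the best way to do a Cartesian join that uses the roll-forward feature, but with the roll applied to each alternative series in the joined table rather than to the series as a whole?
This is best explained with an example: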
library(data.table)
A = data.table(x = c(1,2,3,4,5), y = letters[1:5])
B = data.table(x = c(1,2,3,1,4), f = c("Alice","Alice","Alice", "Bob","Bob"), z = 101:105)
setkey(B,x)
C = B[A, roll = TRUE, allow.cartesian=TRUE, rollends = FALSE]
A
B
C[f == "Alice"]
C[f == "Bob"]
C
So we have two starting tables:
> A
x y
1: 1 a
2: 2 b
3: 3 c
4: 4 d
5: 5 e
> B
x f z
1: 1 Alice 101
2: 1 Bob 104
3: 2 Alice 102
4: 3 Alice 103
5: 4 Bob 105
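I want to join these so that for every x value in A I get both an Alice row and a Bob row, rolling forward when either one is missing (but without rolling past the end of a series). That is not quite what I get at the moment: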
> C[f == "Alice"]
x f z y
1: 1 Alice 101 a
2: 2 Alice 102 b
3: 3 Alice 103 c
> C[f == "Bob"]
x f z y
1: 1 Bob 104 a
2: 4 Bob 105 d
> C
x f z y
1: 1 Alice 101 a
2: 1 Bob 104 a
3: 2 Alice 102 b
4: 3 Alice 103 c
5: 4 Bob 105 d
6: 5 NA NA e
Because Alice has rows for x = 2 and x = 3, Bob's data is not rolled forward for those values. I need the extra rows for Bob, so what I would like to get is:
> C[f == "Alice"]
x f z y
1: 1 Alice 101 a
2: 2 Alice 102 b
3: 3 Alice 103 c
> C[f == "Bob"]
x f z y
1: 1 Bob 104 a
2: 2 Bob 104 b # THESE ROWS ARE MISSING
3: 3 Bob 104 c # THESE ROWS ARE MISSING
4: 4 Bob 105 d
> C
x f z y
1: 1 Alice 101 a
2: 1 Bob 104 a
3: 2 Alice 102 b
4: 2 Bob 104 b # THESE ROWS ARE MISSING
5: 3 Alice 103 c
6: 3 Bob 104 c # THESE ROWS ARE MISSING
7: 4 Bob 105 d
8: 5 NA NA e
Here you go:
setkey(B, f, x)
# Roll z forward within each f over the full (f, x) grid, then join to A on x
setkey(B[CJ(unique(f), unique(x)), allow.cartesian = TRUE,
         roll = TRUE, rollends = c(FALSE, FALSE)], x)[A, allow.cartesian = TRUE]
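To see which (f, x) combinations the plain rolling join has nothing to match against, one option is an anti-join of the full grid against B. This is only a diagnostic sketch: the names `grid` and `gaps` are made up for illustration, and it assumes a data.table version that supports the `on=` argument.

```r
library(data.table)

B <- data.table(x = c(1, 2, 3, 1, 4),
                f = c("Alice", "Alice", "Alice", "Bob", "Bob"),
                z = 101:105)

# Every series name paired with every x value that occurs in B
grid <- CJ(f = unique(B$f), x = unique(B$x))

# Anti-join: keep the grid rows that have no exact (f, x) match in B
gaps <- grid[!B, on = c("f", "x")]
gaps
```

The rows in `gaps` for Bob at x = 2 and x = 3 are the ones that need a rolled-forward value. Alice at x = 4 also shows up, but it lies past the end of her series, so it should stay empty.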
# x f z y
#1: 1 Alice 101 a
#2: 1 Bob 104 a
#3: 2 Alice 102 b
#4: 2 Bob 104 b
#5: 3 Alice 103 c
#6: 3 Bob 104 c
#7: 4 Alice NA d
#8: 4 Bob 105 d
#9: 5 NA NA e
You can filter out the NAs to suit your needs.
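As a rough sketch of that filtering step, the same idea can be written out in separate stages with `on=` instead of keys. This is an alternative spelling, not the code from the answer above; the names `grid`, `rolled` and `res` are hypothetical, and it assumes a data.table version that supports `on=` together with `roll=`.

```r
library(data.table)

A <- data.table(x = c(1, 2, 3, 4, 5), y = letters[1:5])
B <- data.table(x = c(1, 2, 3, 1, 4),
                f = c("Alice", "Alice", "Alice", "Bob", "Bob"),
                z = 101:105)

# Every series name paired with every x value that occurs in B
grid <- CJ(f = unique(B$f), x = unique(B$x))

# Rolling join on (f, x): the roll applies to the last join column (x),
# separately within each f, and does not extend past either end of a series
rolled <- B[grid, on = c("f", "x"), roll = TRUE, rollends = c(FALSE, FALSE)]

# Attach y from A; x values in A with no match keep NA in f and z
res <- rolled[A, on = "x", allow.cartesian = TRUE]

# Keep only the rows that actually carry a z value
res[!is.na(z)]
```

Note that filtering on `z` also drops the x = 5 row from A, since neither series reaches that far. If that row should be kept, a narrower condition such as `!(is.na(z) & !is.na(f))` may be closer to what is wanted.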