While answering this question about rolling joins with the data.table package, I ran into some strange behavior when using multiple conditions.
Consider the following datasets:
library(data.table)
dt <- data.table(t_id = c(1,4,2,3,5), place = c("a","a","d","a","d"), num = c(5.1, 5.1, 6.2, 5.1, 6.2), key=c("place"))
dt_lu <- data.table(f_id = c(rep(1,4),rep(2,3)), place = c("a","b","c","d","a","d","a"), num = c(6,7,8,9,6,7,8), key=c("place"))
When I want to join dt_lu to dt only for the cases where dt_lu has the same place and dt_lu$num is higher than dt$num, as follows:
dt_lu[dt, list(tid = i.t_id,
               tnum = i.num,
               fnum = num[i.num < num],
               fid = f_id),
      by = .EACHI]
I get the desired result:
    place tid tnum fnum fid
 1:     a   1  5.1    6   1
 2:     a   1  5.1    6   2
 3:     a   1  5.1    8   2
 4:     a   4  5.1    6   1
 5:     a   4  5.1    6   2
 6:     a   4  5.1    8   2
 7:     a   3  5.1    6   1
 8:     a   3  5.1    6   2
 9:     a   3  5.1    8   2
10:     d   2  6.2    9   1
11:     d   2  6.2    7   2
12:     d   5  6.2    9   1
13:     d   5  6.2    7   2
When I want to add an additional condition, I can easily get the desired result by chaining it, like this:
dt_lu[dt, list(tid = i.t_id,
               tnum = i.num,
               fnum = num[i.num < num],
               fid = f_id),
      by = .EACHI][fnum - tnum < 2]
This gives me:
   place tid tnum fnum fid
1:     a   1  5.1    6   1
2:     a   1  5.1    6   2
3:     a   4  5.1    6   1
4:     a   4  5.1    6   2
5:     a   3  5.1    6   1
6:     a   3  5.1    6   2
7:     d   2  6.2    7   2
8:     d   5  6.2    7   2
However, when I add the extra condition (i.e., the difference must be less than 2) inside the join, like this:
dt_lu[dt, list(tid = i.t_id,
               tnum = i.num,
               fnum = num[i.num < num & num - i.num < 2],
               fid = f_id),
      by = .EACHI]
I don't get the expected result:
    place tid tnum fnum fid
 1:     a   1  5.1    6   1
 2:     a   1  5.1    6   2
 3:     a   1  5.1    6   2
 4:     a   4  5.1    6   1
 5:     a   4  5.1    6   2
 6:     a   4  5.1    6   2
 7:     a   3  5.1    6   1
 8:     a   3  5.1    6   2
 9:     a   3  5.1    6   2
10:     d   2  6.2    7   1
11:     d   2  6.2    7   2
12:     d   5  6.2    7   1
13:     d   5  6.2    7   2
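Moreover, I get the following warning message:
Warning message:
In `[.data.table`(dt_lu, dt, list(tid = i.t_id, tnum = i.num, fnum = num[i.num <  :
  Column 3 of result for group 1 is length 2 but the longest column in this result is 3. Recycled leaving remainder of 1 items. This warning is once only for the first group with this problem.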
The expected result would be:
    place tid tnum fnum fid
 1:     a   1  5.1    6   1
 2:     a   1  5.1    6   2
 4:     a   4  5.1    6   1
 5:     a   4  5.1    6   2
 7:     a   3  5.1    6   1
 8:     a   3  5.1    6   2
11:     d   2  6.2    7   2
13:     d   5  6.2    7   2
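I deliberately kept the row numbers from the first example to show which rows should remain in the final result (the same rows as in the working, chained solution).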
As this answer shows, it should be possible to use multiple conditions in a join operation.
I tried the following alternatives, but neither of them works:
dt_lu[dt, list(tid = i.t_id,
               tnum = i.num,
               fnum = num[(i.num < num) & (num - i.num < 2)],
               fid = f_id),
      by = .EACHI]
dt_lu[dt, {
  val = num[(i.num < num) & (num - i.num < 2)]
  list(tid = i.t_id,
       tnum = i.num,
       fnum = val,
       fid = f_id)},
  by = .EACHI]
Can somebody explain why I don't get the expected result when using multiple conditions in the join operation?
The warning message gives it away. Also, using print() is very helpful here:
dt_lu[dt, print(i.num < num & num - i.num < 2), by=.EACHI]
# [1] TRUE TRUE FALSE
# [1] TRUE TRUE FALSE
# [1] TRUE TRUE FALSE
# [1] FALSE TRUE
# [1] FALSE TRUE
# Empty data.table (0 rows) of 3 cols: place,place,num
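A compact variant of the same diagnostic (a sketch that reuses the question's dt and dt_lu; the result column names n and n_keep are made up for this example) reports, for every row of dt, the size of the matched group (.N) next to the number of rows that pass both conditions. Wherever the two counts differ, any column in j that is not subset with the same condition ends up with a different length from the columns that are.

```r
library(data.table)

# Same tables as in the question
dt <- data.table(t_id = c(1, 4, 2, 3, 5), place = c("a", "a", "d", "a", "d"),
                 num = c(5.1, 5.1, 6.2, 5.1, 6.2), key = "place")
dt_lu <- data.table(f_id = c(rep(1, 4), rep(2, 3)),
                    place = c("a", "b", "c", "d", "a", "d", "a"),
                    num = c(6, 7, 8, 9, 6, 7, 8), key = "place")

# Per row of dt: matched group size vs. rows passing both conditions
sizes <- dt_lu[dt,
               .(n = .N, n_keep = sum(i.num < num & num - i.num < 2)),
               by = .EACHI]
print(sizes)
```

Here every "a" group has 3 matched rows but only 2 that pass, and every "d" group has 2 matched rows but only 1 that passes, which is exactly the length mismatch behind the wrong output.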
Consider the first case, where the condition evaluates to TRUE, TRUE, FALSE. That group has 3 observations, and your j-expression contains:
.(tid = i.t_id,
  tnum = i.num,
  fnum = num[i.num < num & num - i.num < 2],
  fid = f_id)
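i.t_id and i.num have length 1 (since they come from dt). But num[..condn..] returns a vector of length 2, whereas f_id has length 3. The length-1 and length-2 items are recycled to the length of the longest item (3), which produces the incorrect result. Since 3 is not a multiple of 2, data.table also emits the warning. In the "d" groups, num[..condn..] has length 1, which divides 2 evenly, so it is recycled silently: that is why fnum = 7 is paired with both fid = 1 and fid = 2 there without any further warning.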
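The recycling can be reproduced in base R with rep_len(). This is only an analogy for what happens when a group's j result is assembled, not data.table's internal code; the values are taken from the "a" and "d" groups above.

```r
# Group "a": num[cond] has length 2, f_id has length 3
fnum_a <- c(6, 6)
fid_a  <- c(1, 2, 2)
fnum_a_recycled <- rep_len(fnum_a, length(fid_a))  # 6 6 6, paired with fid 1 2 2
remainder_a <- length(fid_a) %% length(fnum_a)     # 1: not a clean multiple, hence the warning

# Group "d": num[cond] has length 1, f_id has length 2
fnum_d <- 7
fid_d  <- c(1, 2)
fnum_d_recycled <- rep_len(fnum_d, length(fid_d))  # 7 7, paired with fid 1 2
remainder_d <- length(fid_d) %% length(fnum_d)     # 0: recycled silently

print(data.frame(fnum = fnum_a_recycled, fid = fid_a))
print(data.frame(fnum = fnum_d_recycled, fid = fid_d))
```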
What you meant to do is:
.(tid = i.t_id,
  tnum = i.num,
  fnum = num[i.num < num & num - i.num < 2],
  fid = f_id[i.num < num & num - i.num < 2])
Or, equivalently:
{
  idx = i.num < num & num - i.num < 2
  .(tid = i.t_id, tnum = i.num, fnum = num[idx], fid = f_id[idx])
}
Putting it all together:
dt_lu[dt,
      {
        idx = i.num < num & num - i.num < 2
        .(tid = i.t_id, tnum = i.num, fnum = num[idx], fid = f_id[idx])
      },
      by = .EACHI]
#    place tid tnum fnum fid
# 1:     a   1  5.1    6   1
# 2:     a   1  5.1    6   2
# 3:     a   4  5.1    6   1
# 4:     a   4  5.1    6   2
# 5:     a   3  5.1    6   1
# 6:     a   3  5.1    6   2
# 7:     d   2  6.2    7   2
# 8:     d   5  6.2    7   2
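If your data.table version supports non-equi joins (introduced in 1.9.8, as far as I know; check this against your installation), both conditions can also be moved into on= instead of subsetting inside j. This is a different technique from the by = .EACHI approach above, shown only as a sketch; the helper column num_hi is a hypothetical name introduced so that num - i.num < 2 can be written as the column comparison num < num_hi.

```r
library(data.table)  # assumption: a version with non-equi joins (>= 1.9.8)

# Same tables as in the question
dt <- data.table(t_id = c(1, 4, 2, 3, 5), place = c("a", "a", "d", "a", "d"),
                 num = c(5.1, 5.1, 6.2, 5.1, 6.2), key = "place")
dt_lu <- data.table(f_id = c(rep(1, 4), rep(2, 3)),
                    place = c("a", "b", "c", "d", "a", "d", "a"),
                    num = c(6, 7, 8, 9, 6, 7, 8), key = "place")

# Hypothetical helper column: exclusive upper bound for dt_lu$num per dt row
i_tab <- dt[, .(place, t_id, num, num_hi = num + 2)]

# Same place, dt_lu$num > dt$num and dt_lu$num < dt$num + 2.
# x.num is dt_lu's num; i.num is dt's num.
res <- dt_lu[i_tab,
             .(place, tid = i.t_id, tnum = i.num, fnum = x.num, fid = f_id),
             on = .(place, num > num, num < num_hi),
             nomatch = 0L]
print(res)
```

The result should contain the same eight tid/fid pairs as the by = .EACHI solution; the row order within a place group is not guaranteed to match it.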