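civ*_*ivy 0 · tags: r, dataframe, data.table
I would like to use foverlaps in a setting where a value lies within several overlapping ranges, and pick the ID of the range that has the highest number in a separate column. Although I'm quite familiar with the basic setup of the package, I can't find a way to do this.
Here is a small example: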
>df1
AthleteID Distance
Athlete1 5
Athlete2 10
Athlete3 25
>df2
CheckpointID Start End Score
Checkpoint1 1 8 2
Checkpoint2 7 12 4
Checkpoint3 9 15 6
Checkpoint4 16 26 8
Checkpoint5 20 30 10
Based on the above, the final data.frame should look like this:
>df1
AthleteID Distance Score CheckpointID
Athlete1 5 2 Checkpoint1
Athlete2 10 6 Checkpoint3
Athlete3 25 10 Checkpoint5
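A minimal sketch of how foverlaps() could produce this, assuming the two tables are rebuilt as data.tables with integer columns. foverlaps() needs an interval on both sides, so each Distance is turned into a zero-width interval [Distance, Distance]; the helper columns Start and End added to df1 are an assumption of this sketch, not part of the original data. All overlapping checkpoints are kept and the highest Score per athlete is then picked explicitly.

```r
library(data.table)

df1 <- data.table(AthleteID = c("Athlete1", "Athlete2", "Athlete3"),
                  Distance  = c(5L, 10L, 25L))
df2 <- data.table(CheckpointID = paste0("Checkpoint", 1:5),
                  Start = c(1L, 7L, 9L, 16L, 20L),
                  End   = c(8L, 12L, 15L, 26L, 30L),
                  Score = c(2L, 4L, 6L, 8L, 10L))

# Hypothetical helper columns: treat each Distance as the interval [Distance, Distance]
df1[, `:=`(Start = Distance, End = Distance)]

# foverlaps() requires y to be keyed on its interval columns (start, end)
setkey(df2, Start, End)

# type = "within": keep every checkpoint whose [Start, End] contains the point
ov <- foverlaps(df1, df2, by.x = c("Start", "End"), type = "within")

# Among all matching checkpoints, keep the row with the highest Score per athlete
res <- ov[, .SD[which.max(Score)], keyby = AthleteID,
          .SDcols = c("Distance", "Score", "CheckpointID")]
print(res)
```

For the sample data this should give the same three rows as the expected output above. Choosing the maximum with which.max() instead of mult = "last" means the result does not depend on how the rows of df2 happen to be ordered.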
=========================
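EDIT
One last question: I'm also interested in how to use different checkpoint scores depending on the athlete ID (same intervals). Here is a modified score table: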
>df2
CheckpointID AthleteID Start End Score
Checkpoint1 Athlete1 1 8 2
Checkpoint2 Athlete1 7 12 4
Checkpoint3 Athlete1 9 15 6
Checkpoint4 Athlete1 16 26 8
Checkpoint5 Athlete1 20 30 10
Checkpoint1 Athlete2 1 8 3
Checkpoint2 Athlete2 7 12 5
Checkpoint3 Athlete2 9 15 7
Checkpoint4 Athlete2 16 26 9
Checkpoint5 Athlete2 20 30 11
Checkpoint1 Athlete3 1 8 1
Checkpoint2 Athlete3 7 12 3
Checkpoint3 Athlete3 9 15 5
Checkpoint4 Athlete3 16 26 7
Checkpoint5 Athlete3 20 30 11
So the final result should look like this:
>df1
AthleteID Distance Score CheckpointID
Athlete1 5 2 Checkpoint1
Athlete2 10 7 Checkpoint3
Athlete3 25 11 Checkpoint5
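One way to handle per-athlete scores with foverlaps() is to add AthleteID as an extra key column: foverlaps() matches all key columns except the last two exactly and treats only the last two as the interval. This is a sketch under the same assumptions as before (integer columns, hypothetical Start/End helper columns on df1), with the modified score table rebuilt via rep().

```r
library(data.table)

df1 <- data.table(AthleteID = c("Athlete1", "Athlete2", "Athlete3"),
                  Distance  = c(5L, 10L, 25L))
# Modified score table: same intervals for every athlete, different scores
df2 <- data.table(
  CheckpointID = rep(paste0("Checkpoint", 1:5), times = 3),
  AthleteID    = rep(c("Athlete1", "Athlete2", "Athlete3"), each = 5),
  Start        = rep(c(1L, 7L, 9L, 16L, 20L), times = 3),
  End          = rep(c(8L, 12L, 15L, 26L, 30L), times = 3),
  Score        = c(2L, 4L, 6L, 8L, 10L,
                   3L, 5L, 7L, 9L, 11L,
                   1L, 3L, 5L, 7L, 11L)
)

# Hypothetical helper columns: zero-width interval at each Distance
df1[, `:=`(Start = Distance, End = Distance)]

# AthleteID is matched exactly; only the last two key columns form the interval
setkey(df2, AthleteID, Start, End)
ov <- foverlaps(df1, df2, by.x = c("AthleteID", "Start", "End"), type = "within")

# Highest-scoring overlapping checkpoint for each athlete
res <- ov[, .SD[which.max(Score)], keyby = AthleteID,
          .SDcols = c("Distance", "Score", "CheckpointID")]
print(res)
```

For the sample data this should reproduce the expected table above (scores 2, 7 and 11).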
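You could also do this with the newly implemented non-equi joins, which should be more straightforward:
library(data.table)
# mult = "last" keeps only the last matching checkpoint row of y; this is the highest Score here because y is ordered by Score
y[x, on = .(Start <= Distance, End >= Distance), mult = "last",
  .(AthleteID, Distance, Score, CheckpointID)]
where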
x=fread("AthleteID Distance
Athlete1 5
Athlete2 10
Athlete3 25
")
y=fread("CheckpointID Start End Score
Checkpoint1 1 8 2
Checkpoint2 7 12 4
Checkpoint3 9 15 6
Checkpoint4 16 26 8
Checkpoint5 20 30 10
")
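The same non-equi join could be extended to the per-athlete scores from the edit by adding AthleteID as an equality condition in on. The sketch below is an assumption-laden adaptation, not part of the original answer: it keeps all matches and then selects the maximum Score explicitly, so it does not rely on the row order that mult = "last" depends on. The table name y2 is hypothetical, and i.Distance is used to take the Distance values from x.

```r
library(data.table)

x <- data.table(AthleteID = c("Athlete1", "Athlete2", "Athlete3"),
                Distance  = c(5L, 10L, 25L))
# Hypothetical name for the per-athlete score table from the edit
y2 <- data.table(
  CheckpointID = rep(paste0("Checkpoint", 1:5), times = 3),
  AthleteID    = rep(c("Athlete1", "Athlete2", "Athlete3"), each = 5),
  Start        = rep(c(1L, 7L, 9L, 16L, 20L), times = 3),
  End          = rep(c(8L, 12L, 15L, 26L, 30L), times = 3),
  Score        = c(2L, 4L, 6L, 8L, 10L,
                   3L, 5L, 7L, 9L, 11L,
                   1L, 3L, 5L, 7L, 11L)
)

# Equality on AthleteID plus the two range conditions; every match is returned
m <- y2[x, on = .(AthleteID, Start <= Distance, End >= Distance),
        .(AthleteID, Distance = i.Distance, Score, CheckpointID)]

# Pick the highest-scoring checkpoint per athlete explicitly
res <- m[, .SD[which.max(Score)], keyby = AthleteID]
print(res)
```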