Data frame: find whether a value falls within a range and return a different column

Asked by Sem*_*ant (score 4). Tags: merge, r, multiple-columns, dataframe

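I have two data frames and want to use the values in one of them (DF1$pos) to search two columns in DF2 (DF2$start, DF2$end). If a value falls within that range, I want to return DF2$annot.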

DF1

ID   pos  name
chr   12
chr  542
chr  674

DF2

ID   start   end   annot
chr      1   200      a1
chr    201   432      a2
chr    540  1002      a3
chr   2000  2004      a4

So in this example I want DF1 to become:

ID   pos  name
chr   12    a1
chr  542    a3
chr  674    a3

I have tried merge and intersect, but I can't figure out how to use an if statement with a logical expression.

The data frames can be created as follows:

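DF1 <- data.frame(ID = c("chr", "chr", "chr"),
                  pos = c(12, 542, 674),        # 674 matches the tables above
                  name = c(NA, NA, NA),
                  stringsAsFactors = FALSE)     # keep strings as character in R < 4.0

DF2 <- data.frame(ID = c("chr", "chr", "chr", "chr"),
                  start = c(1, 201, 540, 2000), # 2000 matches the DF2 table above
                  end = c(200, 432, 1002, 2004),
                  annot = c("a1", "a2", "a3", "a4"),
                  stringsAsFactors = FALSE)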
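The logical test the question is reaching for does not need an if statement: for each position, the condition "same ID and start <= pos and end >= pos" can be evaluated against all DF2 rows at once. The following is a minimal base-R sketch, not taken from the original thread; the helper name lookup_annot is hypothetical, and it assumes that when several intervals contain a position, the first matching DF2 row should win.

```r
# Base-R sketch: for each position, find the first DF2 row whose
# [start, end] interval (on the same ID) contains it; NA if none matches.
DF1 <- data.frame(ID = c("chr", "chr", "chr"),
                  pos = c(12, 542, 674),
                  name = NA_character_,
                  stringsAsFactors = FALSE)
DF2 <- data.frame(ID = c("chr", "chr", "chr", "chr"),
                  start = c(1, 201, 540, 2000),
                  end = c(200, 432, 1002, 2004),
                  annot = c("a1", "a2", "a3", "a4"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: vectorized logical test over all DF2 rows.
lookup_annot <- function(id, p, ref = DF2) {
  hit <- which(ref$ID == id & ref$start <= p & ref$end >= p)
  if (length(hit) == 0) NA_character_ else ref$annot[hit[1]]
}

# Apply the helper pairwise to (ID, pos) and store the result in DF1$name.
DF1$name <- mapply(lookup_annot, DF1$ID, DF1$pos, USE.NAMES = FALSE)
print(DF1)
```

This scans DF2 once per row of DF1, which is fine for small tables; for large ones, an interval join such as the data.table approach in the answer scales better.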

Answer by A5C*_*2T1 (score 5)

Perhaps you could use foverlaps from the "data.table" package.

library(data.table)
DT1 <- data.table(DF1)
DT2 <- data.table(DF2)
setkey(DT2, ID, start, end)
DT1[, c("start", "end") := pos]  ## I don't know if there's a way around this step...
foverlaps(DT1, DT2)
#     ID start  end annot pos i.start i.end
# 1: chr     1  200    a1  12      12    12
# 2: chr   540 1002    a3 542     542   542
# 3: chr   540 1002    a3 674     674   674
foverlaps(DT1, DT2)[, c("ID", "pos", "annot"), with = FALSE]
#     ID pos annot
# 1: chr  12    a1
# 2: chr 542    a3
# 3: chr 674    a3
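The comment in the code above wonders whether the step that copies pos into start and end can be avoided. In newer versions of data.table (non-equi joins were added around v1.9.8), a join with inequality conditions in on = can express the range test directly. This is a sketch under that version assumption, not part of the original answer; mult = "first" is an assumption about how to resolve a position that falls in several intervals.

```r
library(data.table)

# Same data as the question (pos uses 674, matching the expected output).
DT1 <- data.table(ID = "chr", pos = c(12, 542, 674))
DT2 <- data.table(ID = "chr",
                  start = c(1, 201, 540, 2000),
                  end   = c(200, 432, 1002, 2004),
                  annot = c("a1", "a2", "a3", "a4"))

# Non-equi join: for each row of DT1, look for DT2 rows with the same ID
# where start <= pos and end >= pos. mult = "first" returns one value per
# DT1 row; rows without a match get NA (the default nomatch).
hits <- DT2[DT1, on = .(ID, start <= pos, end >= pos), annot, mult = "first"]

# Add the looked-up annotation to DT1 by reference.
DT1[, name := hits]
print(DT1)
```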

As @Arun mentioned in the comments, you can also use which = TRUE in foverlaps to extract the relevant values:

foverlaps(DT1, DT2, which = TRUE)
#    xid yid
# 1:   1   1
# 2:   2   3
# 3:   3   3
DT2$annot[foverlaps(DT1, DT2, which = TRUE)$yid]
# [1] "a1" "a3" "a3"
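To fill the name column of DF1, as the question asks, the which = TRUE result can be written back directly. One caveat: with the default mult = "all", a position that overlaps several intervals produces several rows, so the indices no longer line up with DF1. The sketch below assumes one annotation per position is wanted and uses mult = "first", in which case foverlaps returns a plain integer vector with one entry per row of DT1 (NA where nothing overlaps, given nomatch = NA).

```r
library(data.table)

DF1 <- data.frame(ID = "chr", pos = c(12, 542, 674), name = NA_character_,
                  stringsAsFactors = FALSE)
DF2 <- data.frame(ID = "chr",
                  start = c(1, 201, 540, 2000),
                  end   = c(200, 432, 1002, 2004),
                  annot = c("a1", "a2", "a3", "a4"),
                  stringsAsFactors = FALSE)

DT1 <- data.table(DF1)
DT2 <- data.table(DF2)
setkey(DT2, ID, start, end)
DT1[, c("start", "end") := pos]   # zero-width interval [pos, pos]

# mult = "first" with which = TRUE: one DT2 row index per DT1 row.
idx <- foverlaps(DT1, DT2, mult = "first", nomatch = NA, which = TRUE)

# idx refers to rows of the keyed (sorted) DT2, so index DT2 rather than DF2.
DF1$name <- DT2$annot[idx]
print(DF1)
```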