lll*_*lll 1 merge r data-manipulation
Suppose I have two data frames, as shown below:
n = c(2, 3, 5, 5, 6, 7)
s = c("aa", "bb", "cc", "dd", "ee", "ff")
b = c(2, 4, 5, 4, 3, 2)
df = data.frame(n, s, b)
# n s b
#1 2 aa 2
#2 3 bb 4
#3 5 cc 5
#4 5 dd 4
#5 6 ee 3
#6 7 ff 2
n2 = c(5, 6, 7, 6)
s2 = c("aa", "bb", "cc", "ll")
b2 = c("hh", "nn", "ff", "dd")
df2 = data.frame(n2, s2, b2)
# n2 s2 b2
#1 5 aa hh
#2 6 bb nn
#3 7 cc ff
#4 6 ll dd
I would like to merge them to get the following result:
#n s b n2 s2 b2
#2 aa 2 5 aa hh
#3 bb 4 6 bb nn
#5 cc 5 7 cc ff
#5 dd 4 6 ll dd
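Basically, I want to merge the two data frames whenever a value in column s of the first data frame is found in either column s2 or column b2 of the second data frame (df2).
I know merge works when I specify two columns from each data frame, but I'm not sure how to add an OR condition to the merge function, or how to achieve this with other commands from packages such as dplyr.
Also, to clarify: there will be cases where both s2 and b2 match column s in the same row. In that case, the rows should be merged only once.
If you're familiar with SQL, you can use it: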
library(sqldf)
res <- sqldf("SELECT l.*, r.*
FROM df as l
INNER JOIN df2 as r
on l.s = r.s2 OR l.s = r.b2")
res
n s b n2 s2 b2
1 2 aa 2 5 aa hh
2 3 bb 4 6 bb nn
3 5 cc 5 7 cc ff
4 5 dd 4 6 ll dd
5 7 ff 2 7 cc ff
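If you would rather not depend on sqldf, the same OR condition can be sketched in base R by building a logical match matrix and indexing both data frames with it. This is only a minimal sketch, not a definitive implementation: the names hit, idx and res_base are made up for illustration, and the data is rebuilt with stringsAsFactors = FALSE on the assumption that comparing plain character columns is acceptable.

```r
# Rebuild the example data, keeping strings as character rather than factor.
df  <- data.frame(n = c(2, 3, 5, 5, 6, 7),
                  s = c("aa", "bb", "cc", "dd", "ee", "ff"),
                  b = c(2, 4, 5, 4, 3, 2),
                  stringsAsFactors = FALSE)
df2 <- data.frame(n2 = c(5, 6, 7, 6),
                  s2 = c("aa", "bb", "cc", "ll"),
                  b2 = c("hh", "nn", "ff", "dd"),
                  stringsAsFactors = FALSE)

# hit[i, j] is TRUE when s in row i of df equals s2 OR b2 in row j of df2.
hit <- outer(df$s, df2$s2, "==") | outer(df$s, df2$b2, "==")

# One (row of df, row of df2) pair per TRUE cell, so a pair that matches
# on both columns still appears only once.
idx <- which(hit, arr.ind = TRUE)
idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]

# Bind the matching rows side by side and reset the row names.
res_base <- cbind(df[idx[, "row"], ], df2[idx[, "col"], ])
rownames(res_base) <- NULL
res_base
```

On the example data this should give the same five rows as the sqldf query. The fifth row appears because "ff" in df matches b2 in the third row of df2, so a true OR join returns one row more than the four listed in the question. The match matrix has nrow(df) * nrow(df2) cells, so this approach is best suited to small or medium-sized inputs.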
Data:
df<-structure(list(n = c(2, 3, 5, 5, 6, 7), s = structure(1:6, .Label = c("aa",
"bb", "cc", "dd", "ee", "ff"), class = "factor"), b = c(2, 4,
5, 4, 3, 2)), .Names = c("n", "s", "b"), row.names = c(NA, -6L
), class = "data.frame")
df2<-structure(list(n2 = c(5, 6, 7, 6), s2 = structure(1:4, .Label = c("aa",
"bb", "cc", "ll"), class = "factor"), b2 = structure(c(3L, 4L,
2L, 1L), .Label = c("dd", "ff", "hh", "nn"), class = "factor")), .Names = c("n2",
"s2", "b2"), row.names = c(NA, -4L), class = "data.frame")