cap*_*oma 0 merge r dplyr tidyr
I have two data frames, a and b, that I want to combine:
a <- data.frame(g=c("1","2","2","3","3","3","4","4","4","4"),h=c("1","1","2","1","2","3","1","2","3","4"))
b <- data.frame(g=c("1","2","3","3","3","4","4","4","4","4"),i=c("1","2","3","2","1","2","3","4","5","6"))
g is the grouping variable, and h and i are the columns I want to merge/join:
> a
g h
1 1 1
2 2 1
3 2 2
4 3 1
5 3 2
6 3 3
7 4 1
8 4 2
9 4 3
10 4 4
> b
g i
1 1 1
2 2 2
3 3 3
4 3 2
5 3 1
6 4 2
7 4 3
8 4 4
9 4 5
10 4 6
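a and b should be merged at the level of the grouping variable g. Equal values of h and i should end up on the same row (regardless of the order in which they appear in h / i), and each matching value should be paired only once (not in all possible combinations).
The final df should look like: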
g h i
1 1 1 1
2 2 1 <NA>
3 2 2 2
4 3 1 1
5 3 2 2
6 3 3 3
7 4 1 <NA>
8 4 2 2
9 4 3 3
10 4 4 4
11 4 <NA> 5
12 4 <NA> 6
I need that df to run a correlation analysis.
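This sounds like a merge on h == i that still keeps i. So create a new variable x to join on, and keep the unmatched rows from both sides of the join (all=TRUE). A big hat tip to @Moody_Mudskipper: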
merge(transform(a,x=h), transform(b,x=i), all=TRUE)
# g x h i
#1 1 1 1 1
#2 2 1 1 <NA>
#3 2 2 2 2
#4 3 1 1 1
#5 3 2 2 2
#6 3 3 3 3
#7 4 1 1 <NA>
#8 4 2 2 2
#9 4 3 3 3
#10 4 4 4 4
#11 4 5 <NA> 5
#12 4 6 <NA> 6
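Two follow-ups that may be useful, as a rough sketch rather than part of the original answer. First, the helper column x can be dropped afterwards to get exactly the g / h / i layout asked for. Second, the merge above pairs each value once only because no value repeats within a group in the example data; if your real data can contain repeats, one possible workaround is an extra occurrence counter in the join key. The helper add_occurrence() and the column n are made-up names for illustration, and a2 / b2 are invented toy data.

```r
# Example data from the question.
a <- data.frame(g = c("1","2","2","3","3","3","4","4","4","4"),
                h = c("1","1","2","1","2","3","1","2","3","4"))
b <- data.frame(g = c("1","2","3","3","3","4","4","4","4","4"),
                i = c("1","2","3","2","1","2","3","4","5","6"))

# Full outer join on g plus the helper key x, then drop x
# and restore the requested column order.
res <- merge(transform(a, x = h), transform(b, x = i), all = TRUE)
res <- res[, c("g", "h", "i")]

# Hypothetical helper (not from the original answer): copy the value column
# into x and number repeated (g, x) pairs 1, 2, 3, ... within each group,
# so that the k-th repeat in a can only match the k-th repeat in b.
add_occurrence <- function(df, key) {
  df$x <- df[[key]]
  df$n <- ave(seq_along(df$x), df$g, df$x, FUN = seq_along)
  df
}

# Invented toy data in which the value "1" repeats inside group "1".
a2 <- data.frame(g = c("1", "1"), h = c("1", "1"))
b2 <- data.frame(g = c("1", "1", "1"), i = c("1", "1", "2"))

# merge() joins on all shared columns, here g, x and n.
dup_res <- merge(add_occurrence(a2, "h"), add_occurrence(b2, "i"), all = TRUE)
dup_res <- dup_res[, c("g", "h", "i")]
```

Without the counter, the two repeated "1" rows in a2 would be matched against both repeated "1" rows in b2, which gives every combination instead of one pairing per occurrence.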