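I want to merge two data frames by id, but both of them share 2 other identical columns, so when I merge I get new .x and .y columns. How can I merge these two data frames with left_join() and get rid of the duplicated extra columns produced by my current code (element.x, day.x, element.y and day.y), keeping only one copy of each?
Code: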
library(dplyr)

# Sample data
df1 <- data.frame(id = seq(1,5), value1 = rnorm(5), element = "TEST1", day = 15)
df2 <- data.frame(id = seq(1,5), value2 = rnorm(5), element = "TEST1", day = 15)
# Merge
df <- left_join(df1, df2, by = "id")
# Output
id value1 element.x day.x value2 element.y day.y
1 1 -0.69700149 TEST1 15 1.4324220 TEST1 15
2 2 -0.25514949 TEST1 15 0.7281354 TEST1 15
3 3 0.09206902 TEST1 15 0.8148839 TEST1 15
4 4 2.51799237 TEST1 15 1.3919671 TEST1 15
5 5 -0.77049050 TEST1 15 -0.2707201 TEST1 15
Just drop everything you don't need from df2 before joining, which in this case means keeping only the id and value2 columns:
left_join(df1, select(df2, c(id,value2)), by = "id")
# id value1 element day value2
#1 1 1.2276303 TEST1 15 -0.1389861
#2 2 -0.8017795 TEST1 15 -0.5973131
#3 3 -1.0803926 TEST1 15 -2.1839668
#4 4 -0.1575344 TEST1 15 0.2408173
#5 5 -1.0717600 TEST1 15 -0.2593554
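When df2 has many columns, listing the ones to keep by hand gets tedious. The following is a minimal sketch, not part of the original answer, that works out the overlapping non-key columns and drops them from df2 before joining. The variable name `shared` is made up for illustration, and `all_of()` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Same shape of data as in the question
df1 <- data.frame(id = seq(1, 5), value1 = rnorm(5), element = "TEST1", day = 15)
df2 <- data.frame(id = seq(1, 5), value2 = rnorm(5), element = "TEST1", day = 15)

# Columns that exist in both tables, excluding the join key
# ("shared" is a hypothetical helper name)
shared <- setdiff(intersect(names(df1), names(df2)), "id")

# Drop the overlapping columns from df2, then join on id only
df <- left_join(df1, select(df2, -all_of(shared)), by = "id")

names(df)
```

Like the answer above, this keeps df1's copy of element and day and silently discards df2's, so it is only appropriate when the two copies are known to agree or df2's values are not needed.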
Note that not all of these answers are equivalent, so ask yourself what you actually need. For example:
df1 <- data.frame(id=1:3,day=2:4,element=3:5,value1=100:102)
df2 <- data.frame(id=1:3,day=3:5,element=4:6,value2=200:202)
df1
# id day element value1
#1 1 2 3 100
#2 2 3 4 101
#3 3 4 5 102
df2
# id day element value2
#1 1 3 4 200
#2 2 4 5 201
#3 3 5 6 202
left_join(df1, df2)
#Joining by: c("id", "day", "element")
# id day element value1 value2
#1 1 2 3 100 NA
#2 2 3 4 101 NA
#3 3 4 5 102 NA
left_join(df1, select(df2, c(id,value2)), by = "id")
# id day element value1 value2
#1 1 2 3 100 200
#2 2 3 4 101 201
#3 3 4 5 102 202
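One way to decide which behaviour is wanted is to check whether the overlapping columns really agree before discarding either copy. The sketch below is an illustration rather than part of the original answer: it joins on id alone and compares the suffixed columns. The names `joined` and `same` are made up for this example.

```r
library(dplyr)

df1 <- data.frame(id = 1:3, day = 2:4, element = 3:5, value1 = 100:102)
df2 <- data.frame(id = 1:3, day = 3:5, element = 4:6, value2 = 200:202)

# Join on id only, so the overlapping columns come back as .x / .y pairs
joined <- left_join(df1, df2, by = "id")

# TRUE only if every matched row has the same day and element in both tables;
# na.rm = TRUE ignores rows of df1 that found no match in df2
same <- with(joined, all(day.x == day.y & element.x == element.y, na.rm = TRUE))

same  # FALSE for this data: day and element differ between df1 and df2
```

If the check comes back FALSE, as it does here, joining on id alone and joining on all common columns give different results, and dropping one copy of day and element loses information.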
You just need:
df <- left_join(df1, df2)
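With by = NULL, the default, the join will do a natural join, using all variables with common names across the two tables. A message lists the variables so that you can check they are right.
Output: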
Joining by: c("id", "element", "day")
id value1 element day value2
1 1 -0.6264538 TEST1 15 -0.8204684
2 2 0.1836433 TEST1 15 0.4874291
3 3 -0.8356286 TEST1 15 0.7383247
4 4 1.5952808 TEST1 15 0.5757814
5 5 0.3295078 TEST1 15 -0.3053884
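It is worth pointing out thelatemail's comment: "joining on id is not the same as joining on id/element/day". In this particular example, however, element and day are identical for all records in both tables, so we get the same result.
Original result
Data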
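To make the choice of keys explicit rather than relying on the default, the common columns can be passed to by directly. This is a small sketch using the same seeded data as this answer; it is an illustration, not something the original answer includes.

```r
library(dplyr)

set.seed(1)
df1 <- data.frame(id = seq(1, 5), value1 = rnorm(5), element = "TEST1", day = 15)
df2 <- data.frame(id = seq(1, 5), value2 = rnorm(5), element = "TEST1", day = 15)

# Name all three join keys explicitly; no "Joining by" message is printed
# and no .x / .y columns are created
df <- left_join(df1, df2, by = c("id", "element", "day"))

names(df)
```

Spelling out the keys documents the intent in the code and protects against a column with a shared name being added to either table later and silently becoming part of the join.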
set.seed(1)
df1 <- data.frame(id = seq(1,5), value1 = rnorm(5), element = "TEST1", day = 15)
df2 <- data.frame(id = seq(1,5), value2 = rnorm(5), element = "TEST1", day = 15)
df <- left_join(df1, df2, by = "id")
Output:
id value1 element.x day.x value2 element.y day.y
1 1 -0.6264538 TEST1 15 -0.8204684 TEST1 15
2 2 0.1836433 TEST1 15 0.4874291 TEST1 15
3 3 -0.8356286 TEST1 15 0.7383247 TEST1 15
4 4 1.5952808 TEST1 15 0.5757814 TEST1 15
5 5 0.3295078 TEST1 15 -0.3053884 TEST1 15
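If the join on id alone has already been done, the suffixed duplicates can also be cleaned up afterwards. The following is a hedged sketch, not part of the original answer: it keeps the .x copies, drops the .y copies, and strips the suffix. It assumes dplyr 1.0.0 or later for `rename_with()`, and `cleaned` is a made-up name.

```r
library(dplyr)

set.seed(1)
df1 <- data.frame(id = seq(1, 5), value1 = rnorm(5), element = "TEST1", day = 15)
df2 <- data.frame(id = seq(1, 5), value2 = rnorm(5), element = "TEST1", day = 15)
df <- left_join(df1, df2, by = "id")

cleaned <- df %>%
  # Discard the copies that came from df2
  select(-ends_with(".y")) %>%
  # Strip the ".x" suffix from the copies that came from df1
  rename_with(function(nm) sub("\\.x$", "", nm), .cols = ends_with(".x"))

names(cleaned)
```

This leaves value2 after day rather than before it, and, like the other approaches that keep only df1's copy, it assumes the .x and .y values agree.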