I have two data frames in which most of the values are the same.
DF1:
v1 v2 v3 v4 v5 v6 v7 ......
500 40 5.2 z1 .....
500 40 7.2 z2 .....
500 40 9.0 z3 .....
500 40 3.5 z4 .....
500 40 4.2 z5 .....
DF2:
v1 v2 v3 v4 v5 v6 v7 .....
500 40 5.1 m1 .....
500 40 7.9 m2 .....
500 20 8.6 m3 .....
500 40 3.7 m4 .....
500 40 4.0 m5 .....
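I want to merge them (or use any similar function) so that my new df1 keeps only the rows whose v1 and v2 match exactly, while v3 does not need to match strictly. Is there a way to match v3 within +/- 0.2?
I would like the final df1 to look like: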
v1 v2 v3 v4 v5 v6 v7 .....
500 40 5.2 z1 .....
500 40 3.5 z4 .....
500 40 4.2 z5 .....
This is as far as I have got, but I don't know how to account for the variability in column v3.
hed <- c("v1", "v2", "v3") #original data didn't have header
df1_final <- merge(df1, df2[hed],by=hed)
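If another language handles this better I would accept that too, but this is only one part of a larger R script I am working on.
First, using the tidyverse, we join the two data frames, then filter with near() and a tolerance: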
library(tidyverse)
# tibble() replaces data_frame(), which is deprecated in current tidyverse releases
df1 <- tibble(v1 = c(500, 500, 500, 500, 500),
              v2 = c(40, 40, 40, 40, 40),
              v3 = c(5.2, 7.2, 9.0, 3.5, 4.2),
              v4 = c("z1", "z2", "z3", "z4", "z5"))
df2 <- tibble(v1 = c(500, 500, 500, 500, 500),
              v2 = c(40, 40, 20, 40, 40),
              v3 = c(5.1, 7.9, 8.6, 3.7, 4.0),
              v4 = c("m1", "m2", "m3", "m4", "m5"))
df1 %>%
  full_join(df2, by = c("v1", "v2")) %>%    # join on v1 and v2 (exact match)
  # tol is slightly above 0.2 so that differences such as 4.2 - 4.0,
  # which are not exactly 0.2 in floating point, still pass
  filter(near(v3.x, v3.y, tol = 0.21)) %>%  # keep pairs whose v3 values are close
  rename(v3 = v3.x, v4 = v4.x) %>%          # restore df1's column names
  select(v1:v4)                             # keep only df1's columns
This yields:
# A tibble: 3 x 4
v1 v2 v3 v4
<dbl> <dbl> <dbl> <chr>
1 500. 40. 5.20 z1
2 500. 40. 3.50 z4
3 500. 40. 4.20 z5
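The tolerance of 0.21 in the filter, rather than the 0.2 asked for, is presumably there because of floating-point rounding. The sketch below uses only base R to show the effect on the example values; the names `d`, `strict` and `slack`, and the choice of `sqrt(.Machine$double.eps)` as the extra margin, are illustrative assumptions and not part of the original answer.

```r
# Absolute v3 differences for the three pairs that are expected to match
# (z1/m1, z4/m4, z5/m5 in the example data).
d <- abs(c(5.2, 3.5, 4.2) - c(5.1, 3.7, 4.0))

# Show the stored values with full precision: the last two are
# slightly larger than the stored value of 0.2.
print(d, digits = 17)

# Comparing against exactly 0.2 rejects the pairs that differ by "0.2".
strict <- d <= 0.2

# Adding a small margin (an assumed choice) accepts them again.
slack <- d <= 0.2 + sqrt(.Machine$double.eps)

print(strict)
print(slack)
```

A margin this small behaves like "within 0.2" for data recorded to one decimal place, whereas 0.21 would also accept a true difference of, say, 0.205 if the data had more decimals.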
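Since the question starts from `merge()`, the same join-then-filter idea can also be sketched in base R without extra packages. This is only one possible variant, not the answer's method: the helper column `row_id`, the renamed column `v3_df2`, the object `rhs` and the tolerance margin are all assumptions introduced for illustration.

```r
df1 <- data.frame(v1 = rep(500, 5),
                  v2 = rep(40, 5),
                  v3 = c(5.2, 7.2, 9.0, 3.5, 4.2),
                  v4 = c("z1", "z2", "z3", "z4", "z5"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(v1 = rep(500, 5),
                  v2 = c(40, 40, 20, 40, 40),
                  v3 = c(5.1, 7.9, 8.6, 3.7, 4.0),
                  v4 = c("m1", "m2", "m3", "m4", "m5"),
                  stringsAsFactors = FALSE)

# Tolerance of 0.2 plus a small margin for floating-point rounding (assumed).
tol <- 0.2 + sqrt(.Machine$double.eps)

# Remember each df1 row's position so matches can be mapped back to df1.
df1$row_id <- seq_len(nrow(df1))

# Keep only the matching columns of df2 and rename its v3 to avoid a clash.
rhs <- setNames(df2[c("v1", "v2", "v3")], c("v1", "v2", "v3_df2"))

# Exact match on v1 and v2; every df1 row is paired with every candidate df2 row.
m <- merge(df1, rhs, by = c("v1", "v2"))

# Keep the pairs whose v3 values are within the tolerance.
keep <- abs(m$v3 - m$v3_df2) <= tol

# Return each matching df1 row once, in its original order.
df1_final <- df1[sort(unique(m$row_id[keep])), c("v1", "v2", "v3", "v4")]
print(df1_final)
```

Going through `row_id` means a df1 row that is close to several df2 rows appears only once in the result, which differs from the join-and-filter pipeline, where such a row would be repeated.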