R中的非交叉值

var*_*dha 5 intersection r

我有两个数据集,每个数据集至少有420,500个观测值,例如

dataset1 <- data.frame(col1=c("microsoft","apple","vmware","delta","microsoft"),
                     col2=paste0(c("a","b","c",4,"asd"),".exe"),
                     col3=rnorm(5))

dataset2 <- data.frame(col1=c("apple","cisco","proactive","dtex","microsoft"),
                     col2=paste0(c("a","b","c",4,"asd"),".exe"),
                     col3=rnorm(5))
> dataset1
       col1    col2 col3
1 microsoft   a.exe    2
2     apple   b.exe    1
3    vmware   c.exe    3
4     delta   4.exe    4
5 microsoft asd.exe    5
> dataset2
       col1    col2 col3
1     apple   a.exe    3
2     cisco   b.exe    4
3    vmware   d.exe    1
4     delta   5.exe    5
5 microsoft asd.exe    2
Run Code Online (Sandbox Code Playgroud)

我想打印在所有的意见dataset1相交的dataset2(比较两个col1col2每个),在这种情况下将打印一切,除了最后一个观察-观察1&2的比赛上col2,但不col1与观察3&4的比赛上col1但不是col2,即:

        col1  col2 col3 
1:     apple b.exe    1 
2:     delta 4.exe    4 
3: microsoft a.exe    2 
4:    vmware c.exe    3 
Run Code Online (Sandbox Code Playgroud)

akr*_*run 5

你可以使用anti_joindplyr

 library(dplyr)
 anti_join(df1, df2, by = c('col1', 'col2'))
 #      col1  col2       col3
 #1     delta 4.exe -0.5836272
 #2    vmware c.exe  0.4196231
 #3     apple b.exe  0.5365853
 #4 microsoft a.exe -0.5458808
Run Code Online (Sandbox Code Playgroud)

数据

 set.seed(24)
 df1 <- data.frame(col1 = c('microsoft', 'apple', 'vmware', 'delta', 
 'microsoft'), col2= c('a.exe', 'b.exe', 'c.exe', '4.exe', 'asd.exe'), 
    col3=rnorm(5), stringsAsFactors=FALSE)
 set.seed(22)
 df2 <- data.frame(col1 = c( 'apple', 'cisco', 'proactive', 'dtex', 
 'microsoft'), col2= c('a.exe', 'b.exe', 'c.exe', '4.exe', 'asd.exe'), 
  col3=rnorm(5), stringsAsFactors=FALSE)
Run Code Online (Sandbox Code Playgroud)