I'm still working through the DataCamp courses for R, so please forgive me if this question seems naive.
Consider the following (very contrived) sample:
library(dplyr)
library(tibble)
type <- c("Dog", "Cat", "Cat", "Cat")
name <- c("Ella", "Arrow", "Gabby", "Eddie")
pets = tibble(name, type)
name <- c("Ella", "Arrow", "Dog")
type <- c("Dog", "Cat", "Calvin")
favorites = tibble(name, type)
anti_join(favorites, pets, by = "name")
setdiff(favorites, pets, by = "name")
Both of these return exactly the same data:
> anti_join(favorites, pets, by = "name")
# A tibble: 1 × 2
name type
<chr> <chr>
1 Dog Calvin
> setdiff(favorites, pets, by = "name")
# A tibble: 1 × 2
name type
<chr> <chr>
1 Dog Calvin
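The documentation for each seems to indicate only a subtle difference: setdiff returns rows, but anti_join does not. From my testing, that doesn't seem to be the case.
Can someone explain the real difference between the two, and perhaps provide a better example that illustrates it more clearly? (This is an area where DataCamp hasn't been particularly helpful.)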
Both return a subset of the first argument, but setdiff requires the two inputs to have the same columns:
library(dplyr)
setdiff(mtcars, mtcars[1:30, ])
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 15.0 8 301 335 3.54 3.57 14.6 0 1 5 8
#> 2 21.4 4 121 109 4.11 2.78 18.6 1 1 4 2
setdiff(mtcars, mtcars[1:30, 1:6])
#> Error in setdiff_data_frame(x, y): not compatible: Cols in x but not y: `carb`, `gear`, `am`, `vs`, `qsec`.
whereas anti_join is a join, so it does not. It only needs the key columns to be present in both:
anti_join(mtcars, mtcars[1:30, 1:3])
#> Joining, by = c("mpg", "cyl", "disp")
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1 15.0 8 301 335 3.54 3.57 14.6 0 1 5 8
#> 2 21.4 4 121 109 4.11 2.78 18.6 1 1 4 2
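The sample in the question cannot tell the two apart, because the only unmatched row differs from every row of pets in both columns. As a rough sketch, the variation below should make the difference visible. The favorites2 table is hypothetical and not from the question: Gabby's type disagrees with pets, and one row is duplicated. Note also that setdiff has no by argument; it always compares whole rows.

```r
library(dplyr)
library(tibble)

pets <- tibble(
  name = c("Ella", "Arrow", "Gabby", "Eddie"),
  type = c("Dog", "Cat", "Cat", "Cat")
)

# Hypothetical data for illustration only: Gabby is listed as a Dog here
# (a Cat in `pets`), and the Calvin row appears twice.
favorites2 <- tibble(
  name = c("Ella", "Gabby", "Calvin", "Calvin"),
  type = c("Dog", "Dog", "Dog", "Dog")
)

# anti_join matches on the key column only: Gabby's name exists in `pets`,
# so her row is dropped even though her type differs.
# Both Calvin rows are kept.
by_key <- anti_join(favorites2, pets, by = "name")

# setdiff compares entire rows: (Gabby, Dog) does not occur in `pets`,
# so it is kept. The duplicated Calvin row is collapsed into one.
by_row <- setdiff(favorites2, pets)

print(by_key)
print(by_row)
```

Here by_key should contain the two Calvin rows, while by_row should contain one Gabby row and one Calvin row. In short, anti_join answers "which rows of x have no key match in y", and setdiff answers "which distinct rows of x do not appear in y at all".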