Problem creating an R function that uses dplyr::filter

x85*_*s16 6 r filter dplyr tidyeval rlang

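I have looked at other answers but could not find a solution for the code below. Basically, I am creating a function that inner_joins two data frames and filters based on a column passed into the function.

The problem is that the filter part of the function does not work. However, it works if I take the filter out of the function and append it afterwards, like mydiff("a") %>% filter(a.x != a.y)

Any suggestions would be helpful.

Note that I am passing the function input in quotes.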

library(dplyr)

# fake data
df1<- tibble(id = seq(4,19,2), 
             a = c("a","b","c","d","e","f","g","h"), 
             b = c(rep("foo",3), rep("bar",5)))
df2<- tibble(id = seq(10, 20, 1), 
             a = c("d","a", "e","f","k","m","g","i","h", "a", "b"),
             b = c(rep("bar", 7), rep("foo",4)))

# What I am trying to do
dplyr::inner_join(df1, df2, by = "id") %>% select(id, b.x, b.y) %>% filter(b.x!=b.y)

#> # A tibble: 1 x 3
#>      id b.x   b.y  
#>   <dbl> <chr> <chr>
#> 1    18 bar   foo

# creating a function so that I can filter by difference in column if I have more columns
mydiff <- function(filteron, df_1 = df1, df_2 = df2){
  require(dplyr, warn.conflicts = F)
  col_1 = paste0(quo_name(filteron), "x")
  col_2 = paste0(quo_name(filteron), "y")
  my_df<- inner_join(df_1, df_2, by = "id", suffix = c("x", "y"))
  my_df %>% select(id, col_1, col_2) %>% filter(col_1 != col_2)
}

# the filter part is not working as expected. 
# There is no difference whether i pipe filter or leave it out
mydiff("a")

#> # A tibble: 5 x 3
#>      id ax    ay   
#>   <dbl> <chr> <chr>
#> 1    10 d     d    
#> 2    12 e     e    
#> 3    14 f     k    
#> 4    16 g     g    
#> 5    18 h     h

Tun*_*ung 6

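The reason it did not work in your original function is that col_1 is a string, whereas dplyr::filter() expects an "unquoted" variable as input on the LHS. So you first need to convert col_1 to a variable with sym(), then unquote it inside filter using !! (bang bang).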
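To make the difference concrete, here is a minimal sketch on a two-row toy tibble (toy, kept_all and kept_diff are illustrative names, not part of the question). With plain strings, the condition compares "ax" with "ay", which is a single TRUE that gets recycled over every row; with sym() and !!, the names are looked up as columns.

```r
library(dplyr)
library(rlang)

# Toy data standing in for the joined data frame (hypothetical values)
toy <- tibble(id = c(10, 14), ax = c("d", "f"), ay = c("d", "k"))

col_1 <- "ax"
col_2 <- "ay"

# Strings: "ax" != "ay" is one TRUE, so no row is dropped
kept_all <- filter(toy, col_1 != col_2)

# Symbols: ax and ay are evaluated as columns of toy
kept_diff <- filter(toy, !!sym(col_1) != !!sym(col_2))

nrow(kept_all)   # 2
nrow(kept_diff)  # 1
```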

rlang has a really nice function, qq_show, that shows what actually happens with quoting/unquoting (see the output below).

See also this similar question.

library(rlang)
library(dplyr)

# creating a function that can take either string or symbol as input
mydiff <- function(filteron, df_1 = df1, df_2 = df2) {

  col_1 <- paste0(quo_name(enquo(filteron)), "x")
  col_2 <- paste0(quo_name(enquo(filteron)), "y")

  my_df <- inner_join(df_1, df_2, by = "id", suffix = c("x", "y"))

  cat('\nwithout sym and unquote\n')
  qq_show(col_1 != col_2)

  cat('\nwith sym and unquote\n')
  qq_show(!!sym(col_1) != !!sym(col_2))
  cat('\n')

  my_df %>% 
    select(id, col_1, col_2) %>% 
    filter(!!sym(col_1) != !!sym(col_2))
}

### testing: filteron as a string
mydiff("a")
#> 
#> without sym and unquote
#> col_1 != col_2
#> 
#> with sym and unquote
#> ax != ay
#> 
#> # A tibble: 1 x 3
#>      id ax    ay   
#>   <dbl> <chr> <chr>
#> 1    14 f     k

### testing: filteron as a symbol
mydiff(a)
#> 
#> without sym and unquote
#> col_1 != col_2
#> 
#> with sym and unquote
#> ax != ay
#>  
#> # A tibble: 1 x 3
#>      id ax    ay   
#>   <dbl> <chr> <chr>
#> 1    14 f     k

Created on 2018-09-28 by the reprex package (v0.2.1.9000)
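As an alternative to sym() plus !!, the .data pronoun can index a column by a string, and all_of() can select columns from a character vector. The sketch below is not the answerer's method; it assumes a reasonably recent dplyr that exports .data and all_of(), accepts only a string for filteron, and uses made-up data and a hypothetical function name, mydiff_data.

```r
library(dplyr)

# Small stand-in data (hypothetical, not the data from the question)
left  <- tibble(id = 1:3, a = c("d", "f", "g"))
right <- tibble(id = 1:3, a = c("d", "k", "g"))

# mydiff_data is a hypothetical name; filteron must be a string here
mydiff_data <- function(filteron, df_1, df_2) {
  col_1 <- paste0(filteron, "x")
  col_2 <- paste0(filteron, "y")

  inner_join(df_1, df_2, by = "id", suffix = c("x", "y")) %>%
    select(id, all_of(c(col_1, col_2))) %>%    # select by character vector
    filter(.data[[col_1]] != .data[[col_2]])   # look up columns by name
}

res <- mydiff_data("a", left, right)
res
```

Only the row where the two a columns differ (id 2 in this toy data) should remain.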


Jus*_*tin 5

From https://dplyr.tidyverse.org/articles/programming.html

Most dplyr functions use non-standard evaluation (NSE). This is a catch-all term that means they don't follow the usual R rules of evaluation.
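A rough base R illustration of what NSE means: the function captures the expression the caller typed instead of its value, and can later evaluate that expression with a data frame as the lookup context. The names show_expr and toy are hypothetical and only serve this sketch.

```r
# Capture the argument as code rather than evaluating it
show_expr <- function(x) deparse(substitute(x))

captured <- show_expr(ax != ay)
captured  # "ax != ay", even though no objects ax or ay exist here

# Evaluate a quoted expression with the data frame columns in scope
toy <- data.frame(ax = c("d", "f"), ay = c("d", "k"),
                  stringsAsFactors = FALSE)
differs <- eval(quote(ax != ay), toy)
differs  # FALSE TRUE
```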

This can sometimes cause problems when you try to wrap them in a function. Here is a base R version of the function you created.

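mydiff <- function(filteron, df_1 = df1, df_2 = df2) {

  # build the suffixed column names, e.g. "ax" and "ay"
  col_1 <- paste0(filteron, "x")
  col_2 <- paste0(filteron, "y")

  # merge the function arguments rather than the global df1 and df2
  my_df <- merge(df_1, df_2, by = "id", suffixes = c("x", "y"))

  # keep the rows where the two columns differ
  my_df[my_df[, col_1] != my_df[, col_2], c("id", col_1, col_2)]
}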

> mydiff("a")
  id ax ay
3 14  f  k
> mydiff("b")
  id  bx  by
5 18 bar foo

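This will solve your problem and will likely keep working as expected, both now and in the future. With fewer dependencies on external packages, you reduce these kinds of issues, as well as other quirks that may appear later as package authors evolve their work.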

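  • Interesting point. But perhaps dropping dplyr improves the code's portability, since writing functions without it becomes simpler, more predictable, and more consistent. Because functions are the building blocks of packages, and packages remain the gold standard for shipping R code to others, base code is more portable than dplyr and reaches a wider range of data sources. (3 upvotes)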