Programming with dplyr and lazyeval

Mir*_*lin 9 r lazy-evaluation dplyr

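I'm having trouble refactoring dplyr code in a way that preserves non-standard evaluation. Let's say I want to create a function that always selects a column and renames it.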

library(lazyeval)
library(dplyr)

df <- data.frame(a = c(1,2,3), f = c(4,5,6), lm = c(7, 8 , 9))

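select_happy <- function(df, col){
    col <- lazy(col)            # capture the argument as a lazy object
    fo <- interp(~x, x = col)   # splice it into a one-sided formula
    select_(df, happy = fo)     # select that column and name it "happy"
}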

f <- function(){
    print('foo')
}
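In select_happy(), interp(~x, x = col) splices the captured expression into a one-sided formula that select_() can use. A rough base-R analogue, assuming the captured input is already a quoted symbol, builds the same kind of formula with substitute(). This only illustrates the idea and is not how lazyeval implements interp():

```r
# A captured column name, represented as a quoted symbol.
col <- quote(lm)

# Substitute it into the template ~x, then evaluate the resulting call
# to obtain a real one-sided formula object whose right-hand side is lm.
fo <- eval(substitute(~x, list(x = col)))

class(fo)    # "formula"
fo[[2]]      # the symbol lm
```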

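I wrote select_happy() based on the answer to this post: Refactor R code when library functions use non-standard evaluation. select_happy() works for column names that are either undefined or defined in the global environment. However, it runs into problems when a column name is also the name of a function in another namespace.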

select_happy(df, a)
#   happy
# 1     1
# 2     2
# 3     3

select_happy(df, f)
#   happy
# 1     4
# 2     5
# 3     6

select_happy(df, lm)
# Error in eval(expr, envir, enclos) (from #4) : object 'datafile' not found

environment(f)
# <environment: R_GlobalEnv>

environment(lm)
# <environment: namespace:stats>

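Calling lazy() on f and on lm shows a difference between the resulting lazy objects: for lm, the function definition appears in the lazy object, whereas for f it is just the function's name.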

lazy(f)
# <lazy>
#   expr: f
#   env:  <environment: R_GlobalEnv>

lazy(lm)
# <lazy>
#   expr: function (formula, data, subset, weights, na.action, method = "qr",  ...
#   env:  <environment: R_GlobalEnv>
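The printouts above suggest the problem lies in what gets captured. A base-R sketch (no dplyr or lazyeval involved) of why that matters: the bare symbol lm, evaluated against the data frame, finds the column first, whereas an expression that already is the function object leaves nothing to look up in the data:

```r
df <- data.frame(a = c(1, 2, 3), f = c(4, 5, 6), lm = c(7, 8, 9))

# The symbol lm, evaluated with df as the data, finds the column first.
col_value <- eval(quote(lm), df, globalenv())

# The same symbol evaluated without df finds stats::lm on the search path.
fun_value <- eval(quote(lm), globalenv())

# If the captured expression has already been replaced by the function
# object, evaluating it just returns that function; df is never consulted.
captured_function <- lm
obj_value <- eval(captured_function, df, globalenv())
```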

substitute() seems to work with lm:

select_happy <- function(df, col){
    col <- substitute(col) # <- substitute() instead of lazy()
    fo <- interp(~x, x = col)
    select_(df, happy = fo)
}

select_happy(df, lm)
#   happy
# 1     7 
# 2     8
# 3     9
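This is consistent with what substitute() returns: the argument's expression exactly as the caller wrote it, without looking the name up anywhere. A minimal base-R check (`capture` is a hypothetical helper):

```r
# substitute() returns the unevaluated expression supplied for `col`.
capture <- function(col) substitute(col)

e <- capture(lm)
is.symbol(e)             # TRUE: still the bare name, not the stats::lm function
identical(e, quote(lm))  # TRUE
```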

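However, after reading the lazyeval vignette, it seems that lazy() is meant to be a superior replacement for substitute(). Also, the regular select() function works fine: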

select(df, happy=lm)
#   happy
# 1     7
# 2     8
# 3     9

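My question is: how can I write select_happy() so that it works in all the ways that select() does? I'm having a hard time wrapping my head around scoping and non-standard evaluation. More generally, what is a reliable strategy for programming with dplyr that avoids these and other problems?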
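For comparison, the single-column case can be sketched in base R by capturing the bare name with substitute() and turning it into a string. This assumes only a single, bare column name is passed (no select() helpers, no column positions); `select_happy_base` is a hypothetical name, not a dplyr function:

```r
# Hypothetical base-R version of select_happy() for one bare column name.
select_happy_base <- function(df, col) {
  nm <- deparse(substitute(col))   # the symbol lm becomes the string "lm"
  stopifnot(nm %in% names(df))     # fail early on unknown column names
  out <- df[, nm, drop = FALSE]    # keep a one-column data frame
  names(out) <- "happy"
  out
}

df <- data.frame(a = c(1, 2, 3), f = c(4, 5, 6), lm = c(7, 8, 9))
res <- select_happy_base(df, lm)
```

Because the name is converted to a string before any lookup happens, it does not matter that lm (or f) is also a function somewhere on the search path.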

EDIT

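I tested docendo discimus's solution and it works well, but I'd like to know whether there is a way to write the function with a named argument instead of dots. I think it's also important to be able to use interp(), because you may want to feed the input into a more complex formula, as in the post I linked earlier. I think the core of the problem comes down to the fact that lazy_dots() captures the expression differently from lazy(). I'd like to understand why they behave differently, and how to use lazy() to get the same behavior as lazy_dots().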

g <- function(...){
    lazy_dots(...)
}

h <-  function(x){
    lazy(x)
}

g(lm)[[1]]
# <lazy>
#   expr: lm
#   env:  <environment: R_GlobalEnv>
h(lm)
# <lazy>
#   expr: function (formula, data, subset, weights, na.action, method = "qr",  ...
#   env:  <environment: R_GlobalEnv> 
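In base R, capturing through dots and capturing through a named argument agree: both yield the symbol lm. A sketch with hypothetical helpers g_base() and h_base(); this suggests the difference above comes from how lazy() walks back through promises (its .follow_symbols behavior) rather than from dots versus named arguments as such:

```r
# Capture the expressions passed through ... without evaluating them.
g_base <- function(...) as.list(substitute(list(...)))[-1]

# Capture a single named argument without evaluating it.
h_base <- function(x) substitute(x)

g_base(lm)[[1]]   # the symbol lm
h_base(lm)        # the symbol lm
```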

Even changing .follow_symbols to FALSE in lazy(), so that it matches lazy_dots(), doesn't work:

lazy
# function (expr, env = parent.frame(), .follow_symbols = TRUE) 
# {
#     .Call(make_lazy, quote(expr), environment(), .follow_symbols)
# }
# <environment: namespace:lazyeval>

lazy_dots
# function (..., .follow_symbols = FALSE) 
# {
#     if (nargs() == 0) 
#         return(structure(list(), class = "lazy_dots"))
#     .Call(make_lazy_dots, environment(), .follow_symbols)
# }
# <environment: namespace:lazyeval>


h2 <-  function(x){
    lazy(x, .follow_symbols=FALSE)
}

h2(lm)
# <lazy>
#  expr: x
#  env:  <environment: 0xe4a42a8>
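The `expr: x` result resembles what plain substitute() does when the capture happens one call too deep: the inner function only sees the expression it was handed at its own level. A base-R sketch of that effect, and of the common workaround of capturing once at the outermost level and passing the quoted expression down; all function names here are hypothetical:

```r
df <- data.frame(a = c(1, 2, 3), f = c(4, 5, 6), lm = c(7, 8, 9))

# Capturing one call too deep: inner_capture() only sees the symbol x
# that outer_call() handed to it, not the caller's lm.
inner_capture <- function(y) substitute(y)
outer_call <- function(x) inner_capture(x)
too_deep <- outer_call(lm)          # the symbol x

# Workaround: capture once at the outermost level, then pass the quoted
# expression down and evaluate it with the data frame as the data.
eval_in <- function(expr, data) eval(expr, data, parent.frame())
outer_fixed <- function(df, x) eval_in(substitute(x), df)
fixed <- outer_fixed(df, lm)        # the lm column: 7 8 9
```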

I'm really stuck on what to do here.

tal*_*lat 2

One option might be to write select_happy in almost the same way as the standard select function:

select_happy<- function(df, ...){
  select_(df, .dots = setNames(lazy_dots(...), "happy"))
}

f <- function(){
  print('foo')
}

> select_happy(df, a)
  happy
1     1
2     2
3     3
> 
> select_happy(df, f)
  happy
1     4
2     5
3     6
> 
> select_happy(df, lm)
  happy
1     7
2     8
3     9

Note that the standard select function is defined as:

> select
function (.data, ...) 
{
    select_(.data, .dots = lazyeval::lazy_dots(...))
}
<environment: namespace:dplyr>

Also note that, with this definition, select_happy accepts multiple columns but names every column after the first "NA":

> select_happy(df, lm, a)
  happy NA
1     7  1
2     8  2
3     9  3
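A plausible explanation for the "NA" name, sketched in base R on a plain list: setNames() assigns names through names<-, which pads a names vector that is too short with NA. The object returned by lazy_dots() is also a list, so the assumption here is that it is named the same way:

```r
# Two captured expressions, but only one name supplied.
captured <- list(quote(lm), quote(a))

# setNames() calls names<-, which pads a too-short names vector with NA.
named <- setNames(captured, "happy")
names(named)   # "happy" NA
```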

Of course, you could modify the function to handle that case, for example:

select_happy<- function(df, ...){
  dots <- lazy_dots(...)
  n <- length(dots)
  if(n == 1) newnames <- "happy" else newnames <- paste0("happy", seq_len(n))
  select_(df, .dots = setNames(dots, newnames))
}

> select_happy(df, f)
  happy
1     4
2     5
3     6

> select_happy(df, lm, a)
  happy1 happy2
1      7      1
2      8      2
3      9      3
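The naming rule inside the modified select_happy() can be checked on its own. `happy_names` is a hypothetical helper extracted from that code for illustration, not part of dplyr:

```r
# Hypothetical helper mirroring the naming rule in the modified select_happy().
happy_names <- function(n) {
  if (n == 1) "happy" else paste0("happy", seq_len(n))
}

happy_names(1)   # "happy"
happy_names(2)   # "happy1" "happy2"
```

Keeping the plain name happy for a single column means the original one-column calls behave exactly as before; numbered names only appear once a second column is requested.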