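I mainly use ggplot2 for visualization. Usually I design a plot interactively (i.e., raw ggplot2 code that uses NSE), but in the end I often wrap that code in a function that receives the data and the variables to plot. This is always a bit of a nightmare.
So, a typical situation looks like this. I have some data and I create a plot for it (in this case a very, very simple example, using the mpg dataset that ships with ggplot2).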
library(ggplot2)
data(mpg)
ggplot(data = mpg,
       mapping = aes(x = class, y = hwy)) +
  geom_boxplot() +
  geom_jitter(alpha = 0.1, color = "blue")

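Once I have finished designing the plot, I usually want to reuse it for different variables or data. So I create a function that takes the data and the variables to plot as arguments. But because of NSE, it is not as easy as writing the function header, copying/pasting the code and replacing the variables with the function arguments. That does not work, as shown below.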
mpg <- mpg
plotfn <- function(data, xvar, yvar){
  ggplot(data = data,
         mapping = aes(x = xvar, y = yvar)) +
    geom_boxplot() +
    geom_jitter(alpha = 0.1, color = "blue")
}
plotfn(mpg, class, hwy) # Can't find object
## Don't know how to automatically pick scale for object of type function. Defaulting to continuous.
## Warning: restarting interrupted promise evaluation
## Error in eval(expr, envir, enclos): object 'hwy' not found
plotfn(mpg, "class", "hwy") # Not what I want either: aes() uses the literal strings as constant values
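For reference, since ggplot2 moved aes() to tidy evaluation (3.0.0), and with rlang 0.4.0 or later, the usual way to make this first version work with bare names is to "embrace" each argument with {{ }}. That hands aes() the expression the caller wrote (e.g. class) instead of the value of the local argument. A minimal sketch, assuming a recent ggplot2; plotfn_embrace is a hypothetical name:

```r
library(ggplot2)

# Hypothetical wrapper: {{ xvar }} forwards the caller's expression (e.g. `class`)
# to aes(), where it is evaluated against the columns of `data`,
# instead of using the value of the local argument `xvar`.
plotfn_embrace <- function(data, xvar, yvar) {
  ggplot(data = data,
         mapping = aes(x = {{ xvar }}, y = {{ yvar }})) +
    geom_boxplot() +
    geom_jitter(alpha = 0.1, color = "blue")
}

p <- plotfn_embrace(mpg, class, hwy)
```

This covers the bare-name case only; it does not accept strings.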

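So I have to go back and fix the code, e.g. by using aes_string instead of the NSE-based aes (in this example that is fairly easy, but for more complex plots with many transformations and layers it becomes a nightmare).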
plotfn <- function(data, xvar, yvar){
  ggplot(data = data,
         mapping = aes_string(x = xvar, y = yvar)) +
    geom_boxplot() +
    geom_jitter(alpha = 0.1, color = "blue")
}
plotfn(mpg, "class", "hwy") # Now this works
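aes_string() has since been deprecated in favour of tidy evaluation. For string arguments, a comparable sketch uses the .data pronoun, which only ever looks names up among the columns of the data. This assumes ggplot2 3.0.0 or later; plotfn_str is a hypothetical name:

```r
library(ggplot2)

# Hypothetical string-based wrapper: xvar and yvar are column names given as strings.
# .data[[xvar]] looks the name up only among the columns of `data`, so a global
# variable with the same name can never be picked up by mistake.
plotfn_str <- function(data, xvar, yvar) {
  stopifnot(is.character(xvar), is.character(yvar))
  ggplot(data = data,
         mapping = aes(x = .data[[xvar]], y = .data[[yvar]])) +
    geom_boxplot() +
    geom_jitter(alpha = 0.1, color = "blue")
}

# The arguments are ordinary values, so programmatic use needs no special handling
plots <- lapply(c("class", "fl", "drv"), plotfn_str, data = mpg, yvar = "hwy")
```

The trade-off is that bare names are no longer accepted; that is the price of predictable lookup.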

The thing is, I find NSE with lazyeval very convenient, so I like to do this:
mpg <- mpg
plotfn <- function(data, xvar, yvar){
  data_gd <- data.frame(
    xvar = lazyeval::lazy_eval(substitute(xvar), data = data),
    yvar = lazyeval::lazy_eval(substitute(yvar), data = data))
  ggplot(data = data_gd,
         mapping = aes(x = xvar, y = yvar)) +
    geom_boxplot() +
    geom_jitter(alpha = 0.1, color = "blue")
}
plotfn(mpg, class, hwy) # Now this works
plotfn(mpg, "class", "hwy") # This still works
plotfn(NULL, rep(letters[1:4], 250), 1:100) # And even this craziness works

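This gives my plotting function a lot of flexibility. For example, you can pass quoted or unquoted variable names, or even the data themselves instead of variable names (a kind of abuse of lazy evaluation).
But there is a big problem: the function cannot be used programmatically.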
dynamically_changing_xvar <- "class"
plotfn(mpg, dynamically_changing_xvar, hwy)
## Error in eval(expr, envir, enclos): object 'dynamically_changing_xvar' not found
# This does not work, because it never finds the object
# dynamically_changing_xvar in the data, and it does not get evaluated to
# obtain the variable name (class)
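So I cannot use a loop (e.g. lapply) to generate the same plot for different combinations of variables or data.
So I decided to abuse lazy, standard and non-standard evaluation even further and combine them all, so that I get both the flexibility above and the ability to use the function programmatically. Basically, I use tryCatch to first lazy_eval each variable expression and, if that fails, to evaluate the parsed expression instead.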
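One explicit alternative, assuming the rlang package is installed alongside a recent ggplot2: rlang::ensym() accepts either a bare name or a string literal and turns it into a symbol, and a caller who holds the column name in a variable injects it with !!. This is a different technique (tidy evaluation), not the lazyeval approach used here; plotfn_sym is a hypothetical name:

```r
library(ggplot2)

# Hypothetical wrapper: ensym() accepts a bare name (class) or a string ("class")
# and returns a symbol; !! then injects that symbol into aes().
plotfn_sym <- function(data, xvar, yvar) {
  xvar <- rlang::ensym(xvar)
  yvar <- rlang::ensym(yvar)
  ggplot(data = data,
         mapping = aes(x = !!xvar, y = !!yvar)) +
    geom_boxplot() +
    geom_jitter(alpha = 0.1, color = "blue")
}

p1 <- plotfn_sym(mpg, class, hwy)        # bare names
p2 <- plotfn_sym(mpg, "class", "hwy")    # strings

# Programmatic use: !! says "use the value of this variable as the column name"
dynamically_changing_xvar <- "class"
p3 <- plotfn_sym(mpg, !!dynamically_changing_xvar, hwy)

# One plot per variable, driven by a character vector
plots <- lapply(c("class", "fl", "drv"), function(v) plotfn_sym(mpg, !!v, hwy))
```

The design choice here is that the caller, not the function, decides whether an argument is a column name or a variable holding one.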
plotfn <- function(data, xvar, yvar){
  data_gd <- NULL
  data_gd$xvar <- tryCatch(
    expr = lazyeval::lazy_eval(substitute(xvar), data = data),
    error = function(e) eval(envir = data, expr = parse(text = xvar))
  )
  data_gd$yvar <- tryCatch(
    expr = lazyeval::lazy_eval(substitute(yvar), data = data),
    error = function(e) eval(envir = data, expr = parse(text = yvar))
  )
  ggplot(data = as.data.frame(data_gd),
         mapping = aes(x = xvar, y = yvar)) +
    geom_boxplot() +
    geom_jitter(alpha = 0.1, color = "blue")
}
plotfn(mpg, class, hwy) # Now this works, again
plotfn(mpg, "class", "hwy") # This still works, again
plotfn(NULL, rep(letters[1:4], 250), 1:100) # And this craziness still works
# And now, I can also pass a local variable to the function, that contains
# the name of the variable that I want to plot
dynamically_changing_xvar <- "class"
plotfn(mpg, dynamically_changing_xvar, hwy)

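So, in addition to the flexibility mentioned above, I can now generate many versions of the same plot, with different variables (or data), in a single line or so.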
lapply(c("class", "fl", "drv"), FUN = plotfn, yvar = hwy, data = mpg)
## [[1]]
##
## [[2]]
##
## [[3]]

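Even though it is very practical, I suspect it is not good practice. But how bad a practice is it? That is my key question. What other alternatives could I use to get the best of both worlds?
Of course, I can see how this pattern could cause problems. For example: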
# If a variable in the global environment holds the name of the column I want
# to plot, but its own name is also a column in the data passed to the function,
# then that column is used, not the content of the variable
drv <- "class"
plotfn(mpg, drv, hwy) # Here xvar on the plot is drv and not class

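There are also some (many?) other problems. But in my opinion, the benefits in terms of syntactic flexibility outweigh them. Any thoughts?
For clarity, here is the function you propose: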
library(ggplot2)
data(mpg)
plotfn <- function(data, xvar, yvar){
  data_gd <- NULL
  data_gd$xvar <- tryCatch(
    expr = lazyeval::lazy_eval(substitute(xvar), data = data),
    error = function(e) eval(envir = data, expr = parse(text = xvar))
  )
  data_gd$yvar <- tryCatch(
    expr = lazyeval::lazy_eval(substitute(yvar), data = data),
    error = function(e) eval(envir = data, expr = parse(text = yvar))
  )
  ggplot(data = as.data.frame(data_gd),
         mapping = aes(x = xvar, y = yvar)) +
    geom_boxplot() +
    geom_jitter(alpha = 0.1, color = "blue")
}
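A function like this is often very handy, because you can freely mix strings and bare variable names. But, as you say, it is not always safe. Consider the following contrived example: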
class <- "drv"
Class <- "drv"
plotfn(mpg, class, hwy)
plotfn(mpg, Class, hwy)
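What will your function produce? Will the two plots be the same (they are not)? It is not obvious to me what the results will be. Programming with such a function can give unexpected results, depending on which variables exist in the data argument and which exist in the environment. Since many people use variable names such as x, xvar or count (even though they perhaps shouldn't), things can get messy.
Also, if I want to force one interpretation of class or the other, I can't.
I'd say this is somewhat like using attach: convenient, but at some point it may come back to bite you.
So I would use an NSE/SE pair of functions instead: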
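With tidy evaluation, the ambiguity can at least be resolved explicitly in the calling code: the .data pronoun always refers to a column of the data, and the .env pronoun always refers to a variable in the environment. A small sketch, assuming recent versions of ggplot2 and rlang, reusing the contrived class <- "drv" from above:

```r
library(ggplot2)

# Contrived global variable whose name collides with a column of mpg
class <- "drv"

# .data$class: always the column `class` of mpg
p_column <- ggplot(mpg, aes(x = .data$class, y = hwy)) + geom_boxplot()

# .env$class: always the variable `class` from the environment (the string "drv"),
# which is then used as a column name via .data[[ ]]
p_env <- ggplot(mpg, aes(x = .data[[.env$class]], y = hwy)) + geom_boxplot()
```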
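plotfn <- function(data, xvar, yvar) {
  # NSE version: capture the unevaluated expressions and pass them on
  plotfn_(data, substitute(xvar), substitute(yvar))
}
plotfn_ <- function(data, xvar, yvar){
  # SE version: xvar and yvar are values (names, calls or strings).
  # aes_() treats strings as constants, so convert them to symbols first.
  if (is.character(xvar)) xvar <- as.name(xvar)
  if (is.character(yvar)) yvar <- as.name(yvar)
  ggplot(data = data,
         mapping = aes_(x = xvar, y = yvar)) +
    geom_boxplot() +
    geom_jitter(alpha = 0.1, color = "blue")
}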
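I think writing these is actually easier than writing your function. You could also capture all the arguments lazily with lazy_dots.
Now, with the safe SE version, the results are much easier to predict: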
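As a rough modern counterpart to capturing everything with lazy_dots, a wrapper can simply forward ... to aes(), which captures each argument together with the caller's environment. A minimal sketch, assuming ggplot2 3.0.0 or later; plotfn_dots is a hypothetical name:

```r
library(ggplot2)

# Hypothetical wrapper: any number of aesthetics are forwarded through `...`.
# aes() captures each one as an expression plus the caller's environment,
# so bare names are resolved against `data` exactly as in an interactive call.
plotfn_dots <- function(data, ...) {
  ggplot(data = data, mapping = aes(...)) +
    geom_boxplot() +
    geom_jitter(alpha = 0.1, color = "blue")
}

p  <- plotfn_dots(mpg, x = class, y = hwy)
p2 <- plotfn_dots(mpg, x = drv, y = cty, colour = drv)
```

This keeps the interactive syntax unchanged but, like any NSE interface, still looks up names that are not columns in the caller's environment.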
class <- "drv"
Class <- "drv"
plotfn_(mpg, class, 'hwy')
plotfn_(mpg, Class, 'hwy')
The NSE version is still affected:
plotfn(mpg, class, hwy)
plotfn(mpg, Class, hwy)
(I find it a bit annoying that ggplot2::aes_ does not also accept strings as column names.)