dplyr: standard evaluation in mutate with quoted variable names

tch*_*rty 6 r dplyr

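How would I use mutate (my assumption is that I need standard evaluation in my case, and therefore mutate_, but I am not fully confident about that) with a function that accepts a list of variable names, such as: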

createSum = function(data, variableNames) {
  data %>% 
    mutate_(sumvar = interp(~ sum(var, na.rm = TRUE), 
                            var = as.name(paste(as.character(variableNames), collapse =","))))

}

Here is an MWE that strips the function down to its core logic and demonstrates what I am trying to achieve:

library(dplyr)
library(lazyeval)

# function to make random table with given column names
makeTable = function(colNames, sampleSize) {
  liSample = lapply(colNames, function(week) {
    sample = rnorm(sampleSize)
  })
  names(liSample) = as.character(colNames)
  return(tbl_df(data.frame(liSample, check.names = FALSE)))
}

# create some sample data with the column name patterns required
weekDates = seq.Date(from = as.Date("2014-01-01"),
                     to = as.Date("2014-08-01"), by = "week")
dfTest = makeTable(weekDates, 10)

# test mutate on this table
dfTest %>% 
  mutate_(sumvar = interp(~ sum(var, na.rm = TRUE), 
                          var = as.name(paste(as.character(weekDates), collapse =","))))

The expected output here is what the following would return:

rowSums(dfTest[, as.character(weekDates)])

MrF*_*ick 5

I think this is what you are after:

createSum = function(data, variableNames) {
    data %>% 
        mutate_(sumvar = paste(as.character(variableNames), collapse ="+"))
}
createSum(dfTest, weekDates)

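We simply supply a character value rather than using interp, because you cannot pass a list of names as a single argument to a function. Also, sum() would do some undesired collapsing: the operations are not performed row-wise, since each column is passed in as a whole vector, so sum() reduces everything to a single number, whereas + adds the columns element by element.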
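To make the difference between sum() and + concrete, here is a minimal base R sketch. The data frame `df` and its columns `a` and `b` are made up for illustration and are not part of the question's data.

```r
# Hypothetical two-column data frame, for illustration only
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# sum() receives both columns as whole vectors and collapses them to one number
collapsed <- sum(df$a, df$b, na.rm = TRUE)

# `+` is vectorised, so it adds the columns element by element (one value per row)
row_wise <- df$a + df$b

print(collapsed)
print(row_wise)
```

Inside mutate, a single number like `collapsed` would be recycled into every row, which is why the answer builds an expression joined with "+" instead.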

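Another problem with this example is that you set check.names=FALSE in data.frame, which means the column names you create are not valid symbols. If you want to keep them, you can explicitly wrap the variable names in backticks: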

createSum(dfTest , paste0("`", weekDates,"`"))
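The effect of the backticks can be seen without dplyr. The sketch below parses and evaluates the two strings in base R; it is only an analogy for what happens when a string is turned into an expression, not dplyr's actual implementation, and the small data frame `df` is invented for the example.

```r
# Hypothetical data frame with date-like (non-syntactic) column names
df <- data.frame(`2014-01-01` = c(1, 2),
                 `2014-01-08` = c(3, 4),
                 check.names = FALSE)
cols <- names(df)

# Without backticks the text parses as plain arithmetic on numbers:
# 2014 - 01 - 01 + 2014 - 01 - 08
bare_text <- paste(cols, collapse = "+")
bare_result <- eval(parse(text = bare_text), df)

# With backticks each name is a symbol that is looked up as a column
quoted_text <- paste0("`", cols, "`", collapse = " + ")
quoted_result <- eval(parse(text = quoted_text), df)

print(bare_result)    # a single number unrelated to the data
print(quoted_result)  # element-wise sum of the two columns
```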

But in general, it is better not to use invalid names.
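If the names cannot be changed, one way to avoid building an expression at all is to index the columns by their character names. The following is a base R sketch of that idea, not the approach from the answer above; `createSumBase` and `dfSmall` are hypothetical names. Note that it uses na.rm = TRUE as in the question, so missing values are skipped, whereas an expression joined with + would propagate NA.

```r
# Hypothetical helper: row-wise sum over columns selected by name
createSumBase <- function(data, variableNames) {
  cols <- as.character(variableNames)
  # String indexing works for any column name, valid symbol or not
  data$sumvar <- rowSums(data[, cols, drop = FALSE], na.rm = TRUE)
  data
}

# Small made-up table with date-like column names
weekDates <- seq.Date(from = as.Date("2014-01-01"),
                      to = as.Date("2014-01-15"), by = "week")
dfSmall <- data.frame(c(1, 2), c(3, NA), c(5, 6))
names(dfSmall) <- as.character(weekDates)

result <- createSumBase(dfSmall, weekDates)
print(result)
```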