Using a list of symbols in data.table's "by"

der*_*und 5 evaluation r data.table

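I want to write a function outer_fun() that does some work and then calls another function, inner_fun(). All arguments of outer_fun() are passed on to inner_fun().

inner_fun() runs some calculations on a data.table (which is an argument of both functions). The other argument to be passed through is by.

Here is a sketch of what I have: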

library(data.table)

data("CO2")
setDT(CO2)

outer_fun <- function(DT, by) {
    # some other stuff
    by <- substitute(by)
    inner_fun(DT, by)
}

inner_fun <- function(DT, by) {
    DT[, .(mean = mean(uptake)),
    by = list(Plant, by)]
}

outer_fun(CO2, by = Type)

This throws an error:

Error in `[.data.table`(DT, , .(mean = mean(uptake)), by = list(Plant,  : 
  column or expression 2 of 'by' or 'keyby' is type language. Do not quote column names. Usage: DT[,sum(colC),by=list(colA,month(colB))] 
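For context, the "type language" and "type symbol" wording in these messages refers to unevaluated R objects. The following is a minimal sketch of what substitute() returns for an unquoted argument; capture_by() is a hypothetical helper used only for illustration and is not part of the code above.

```r
# Hypothetical helper: return the argument exactly as typed, unevaluated.
capture_by <- function(by) substitute(by)

sym <- capture_by(Type)
typeof(sym)        # "symbol"

cl <- capture_by(.(Type, Treatment))
typeof(cl)         # "language"

# The pieces of a captured call can be pulled apart like a list;
# the first element is the function name (here `.`).
parts <- as.list(cl)[-1]
length(parts)      # 2
```

So whatever reaches inner_fun() is a symbol or a call, not column data, and it has to be turned into either column names or a complete by expression before data.table can group on it.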

As I understand the problem, I have to combine the two lists correctly in by inside inner_fun(). Another attempt is this:

outer_fun <- function(DT, by) {
    # some other stuff
    by <- substitute(by)
    inner_fun(DT, by)
}

inner_fun <- function(DT, by) {
    .by <- append(by, substitute(Plant), after = 0L)
    DT[, .(mean = mean(uptake)),
       by = eval(as.expression(list(.by)))]
}

outer_fun(CO2, by = Type)

This throws a similar error:

Error in `[.data.table`(DT, , .(mean = mean(uptake)), by = eval(as.expression(list(.by)))) :
  column or expression 1 of 'by' or 'keyby' is type symbol. Do not quote column names. Usage: DT[,sum(colC),by=list(colA,month(colB))]

I have been struggling with this for two days now and would greatly appreciate any help!

Edit: It seems I did not make clear what the solution should be able to do. The result should work for every form of by that data.table allows.

mni*_*ist 2

I would ultimately solve this by passing the grouping information on as character strings:

library(data.table)

data("CO2")
setDT(CO2)

outer_fun <- function(DT, by) {
  # some other stuff
  by <- substitute(by)
  inner_fun(DT, by)
}

inner_fun <- function(DT, by) {

  # add Plant to the by Information
  byFun <- c("Plant", as.character(by))

  # drop the function names left over from .(), list() or c() calls
  byFun <- byFun[!byFun %in% c(".", "list", "c")] 

  DT[, .(mean = mean(uptake)),
     by = byFun]
}

# single column name unquoted
outer_fun(CO2, by = Type)[1:3]
#>    Plant   Type     mean
#> 1:   Qn1 Quebec 33.22857
#> 2:   Qn2 Quebec 35.15714
#> 3:   Qn3 Quebec 37.61429


# list of column names unquoted
outer_fun(CO2, by = .(Type, Treatment))[1:3]
#>    Plant   Type  Treatment     mean
#> 1:   Qn1 Quebec nonchilled 33.22857
#> 2:   Qn2 Quebec nonchilled 35.15714
#> 3:   Qn3 Quebec nonchilled 37.61429

outer_fun(CO2, by = list(Type, Treatment))[1:3]
#>    Plant   Type  Treatment     mean
#> 1:   Qn1 Quebec nonchilled 33.22857
#> 2:   Qn2 Quebec nonchilled 35.15714
#> 3:   Qn3 Quebec nonchilled 37.61429


# single column name as string
outer_fun(CO2, by = "Type")[1:3]
#>    Plant   Type     mean
#> 1:   Qn1 Quebec 33.22857
#> 2:   Qn2 Quebec 35.15714
#> 3:   Qn3 Quebec 37.61429


# multiple column names as string
outer_fun(CO2, by = c("Type", "Treatment"))[1:3]
#>    Plant   Type  Treatment     mean
#> 1:   Qn1 Quebec nonchilled 33.22857
#> 2:   Qn2 Quebec nonchilled 35.15714
#> 3:   Qn3 Quebec nonchilled 37.61429

outer_fun(CO2, by = list("Type", "Treatment"))[1:3]
#>    Plant   Type  Treatment     mean
#> 1:   Qn1 Quebec nonchilled 33.22857
#> 2:   Qn2 Quebec nonchilled 35.15714
#> 3:   Qn3 Quebec nonchilled 37.61429
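The character approach above covers plain column names. If by should also accept expressions such as conc > 500, one possible alternative is to keep the captured language object and splice it into the data.table call before evaluating it. The following is only a sketch under stated assumptions: by is always passed unquoted (a symbol, a single expression, or a .() / list() call), character input is not handled, and build_by() is a hypothetical helper introduced here for illustration.

```r
library(data.table)

data("CO2")
setDT(CO2)

outer_fun <- function(DT, by) {
  by <- substitute(by)   # capture the argument unevaluated
  inner_fun(DT, by)
}

# Hypothetical helper: prepend Plant to the captured argument and
# return a call of the form list(Plant, ...).
build_by <- function(by) {
  is_list_call <- is.call(by) &&
    as.character(by[[1L]])[1L] %in% c(".", "list")
  parts <- if (is_list_call) as.list(by)[-1L] else list(by)
  as.call(c(list(quote(list), quote(Plant)), parts))
}

inner_fun <- function(DT, by) {
  by_call <- build_by(by)
  # Insert the finished list(...) call into the data.table call,
  # then evaluate the completed call in this function's frame.
  eval(substitute(
    DT[, list(mean = mean(uptake)), by = BY],
    list(BY = by_call)
  ))
}

res1 <- outer_fun(CO2, by = Type)
res2 <- outer_fun(CO2, by = .(Type, Treatment))
res3 <- outer_fun(CO2, by = list(Type, conc > 500))
```

The j expression uses list() rather than .() only to keep the template easy to read next to the substituted by. The trade-off against the character version is that strings such as by = "Type" would need separate handling.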