Tem*_*Rex 5 parsing r dplyr tidyeval
I want to write a generic weighted_summarise() function that automatically parses and transforms a user's call of the form:
data %>% weighted_summarise(weights, a = sum(b), c = mean(d))
into the actual call that is delegated to dplyr::summarise:
data %>% dplyr::summarise(a = sum(weights * b), c = mean(weights * d))
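Here, a and c are the new columns to create in the reduced data, and b, d, and weights are existing columns in data.
Ideally, I would like to call my function just like the native dplyr::summarise, but with an extra weights argument that is spread into each aggregation function.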
weighted_summarise <- function(data, weights, ...) {
  data %>% dplyr::summarise(
    # how to manipulate the ... and inject the weights in each name-value pair?
  )
}
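Question: how can I manipulate the ellipsis so that weights is injected at the appropriate place in each name-value pair? I would like to somehow capture the AST, walk it systematically, and modify it.
Here is one option that inserts 'weights' into the expressions passed via ... by converting the expressions into a single string and parsing that string for evaluation: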
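weighted_summarise <- function(data, weights, ...) {
  # Name of the weights column as a string
  weights <- rlang::as_string(rlang::ensym(weights))

  # Turn each captured expression into text and rewrite the first "("
  # as "(<weights>*", e.g. "sum(b)" becomes "sum(weights*b)"
  v1 <- purrr::map_chr(rlang::enexprs(...),
    ~ stringr::str_replace(rlang::as_label(.x), "\\(",
      function(x) stringr::str_c("(", weights, "*")))

  # Build the full summarise() call as one string, then parse and evaluate it
  # (summarise and %>% are unqualified here, so dplyr must be attached)
  eval(rlang::parse_expr(stringr::str_c("data %>%
    summarise(", stringr::str_c(names(v1), v1, sep = "=",
      collapse = ", "), ")")))
}

- Testing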
> data %>%
    weighted_summarise(weights, a = sum(b), c = mean(d))
# A tibble: 1 × 2
      a     c
  <dbl> <dbl>
1 -2.95  1.13

# testing with the original summarise code outside the function
> data %>%
    dplyr::summarise(a = sum(weights * b), c = mean(weights * d))
# A tibble: 1 × 2
      a     c
  <dbl> <dbl>
1 -2.95  1.13

data <- structure(list(b = c(-0.545880758366027, 0.536585304107612, 0.419623148618683,
-0.583627199210279, 0.847460017311944, 0.266021979364892, 0.444585270360416,
-0.466495123565759, -0.848370043948898, 0.00231194241576697),
    d = c(-1.31690812429962, 0.598269112694685, -0.7622143703459,
    -1.42909030324076, 0.332244449013422, -0.469060687608488,
    -0.334986793584065, 1.53625215550584, 0.609994533253692,
    0.51633569843567), weights = 1:10), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -10L))
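Since the question asks about capturing and modifying the AST rather than going through strings, here is a minimal sketch of that idea. It is not part of the answer above: the name weighted_summarise2 is hypothetical, and the sketch assumes every expression in ... is a call whose first positional argument is the column to be weighted (as in sum(b) or mean(d)). It only rewrites that top-level argument and does not walk nested calls.

```r
library(dplyr)
library(rlang)

# Hypothetical variant: edit the captured calls directly instead of
# round-tripping them through strings.
weighted_summarise2 <- function(data, weights, ...) {
  w <- ensym(weights)   # weights column as a symbol
  quos <- enquos(...)   # named list of quosures, one per output column

  inject <- function(q) {
    e <- quo_get_expr(q)
    # Assumption: each expression is a call with at least one argument
    stopifnot(is_call(e), length(e) >= 2)
    # Replace the first argument x with the call `weights * x`
    e[[2]] <- call2("*", w, e[[2]])
    quo_set_expr(q, e)
  }

  # Splice the modified quosures back into summarise(); names are kept
  summarise(data, !!!lapply(quos, inject))
}
```

With the data defined above, data %>% weighted_summarise2(weights, a = sum(b), c = mean(d)) should give the same result as the direct dplyr::summarise call. Working on quosures keeps the caller's environment attached to each expression, which the string-based version does not.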