How can I efficiently apply multiple functions to the same dataframe at once and save the results as a list of dataframes?

And*_*own 2 r apply purrr

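I want to apply several different functions to one dataframe at the same time and put the results into a list of dataframes. For example, I might arrange by one column and save the output as a new dataframe, or filter some of the data and save that as another new dataframe (and so on). I feel there must be a simple way to do this with purrr or apply, but I'm not sure how to proceed. So I'm wondering whether there is a way to supply a list of functions and get back a list of dataframes. Here are some example functions that I apply to mtcars: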

library(tidyverse)

filter_df <- function(x, word) {
  x %>% 
    tibble::rownames_to_column("ID") %>% 
    filter(str_detect(ID, word))
}
a <- filter_df(mtcars, "Merc")


mean_n_df <- function(x, grp, mean2) {
  x %>%
    group_by({{grp}}) %>%
    summarise(mean = mean({{mean2}}), n = n())
}
b <- mean_n_df(mtcars, grp = cyl, mean2 = wt)


rating <- function(x, a, b, c) {
  x %>% 
    rowwise %>% 
    mutate(rating = ({{a}}*2) + ({{b}}-5) * abs({{c}} - 30))
}
c <- rating(mtcars, a = cyl, b = drat, c = qsec)


pct <- function(data, var, round = 4){
  var_expr <- rlang::enquo(var)
  colnm_expr <- paste(rlang::get_expr(var_expr), "pct", sep = "_")
  
  data %>%
    mutate(!! colnm_expr := !!var_expr/sum(!!var_expr) %>%
             round(round))
}
d <- pct(mtcars, mpg)

I know I can run the code above and then bind each dataframe into a list.

df_list <- list(mtcars, a, b, c, d)

str(df_list, 1)[[1]]

List of 5
 $ :'data.frame':   32 obs. of  11 variables:
 $ :'data.frame':   7 obs. of  12 variables:
 $ : tibble [3 × 3] (S3: tbl_df/tbl/data.frame)
 $ : rowwise_df [32 × 12] (S3: rowwise_df/tbl_df/tbl/data.frame)
  ..- attr(*, "groups")= tibble [32 × 1] (S3: tbl_df/tbl/data.frame)
 $ :'data.frame':   32 obs. of  12 variables:

r2e*_*ans 6

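This seems a bit bespoke (since each function needs different arguments), but I would use Map (or purrr::map2 or purrr::pmap), passing each function together with its arguments: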

filter_df <- function(x, word) {
  x %>% 
    tibble::rownames_to_column("ID") %>% 
    filter(str_detect(ID, word))
}
mean_n_df <- function(x, grp, mean2) {
  x %>%
    group_by({{grp}}) %>%
    summarise(mean = mean({{mean2}}), n = n())
}
rating <- function(x, a, b, c) {
  x %>% 
    rowwise %>% 
    mutate(rating = ({{a}}*2) + ({{b}}-5) * abs({{c}} - 30))
}
pct <- function(data, var, round = 4){
  var_expr <- rlang::enquo(var)
  colnm_expr <- paste(rlang::get_expr(var_expr), "pct", sep = "_")
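  data %>%
    # Note: %>% binds tighter than /, so round() is applied to sum(...),
    # not to the ratio; the mpg_pct values in the output are therefore unrounded.
    mutate(!! colnm_expr := !!var_expr/sum(!!var_expr) %>%
             round(round))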
}

The call:

out <- Map(
  function(fun, args) do.call(fun, c(list(mtcars), args)),
  list(filter_df, mean_n_df, rating, pct),
  list(list("Merc"), list(grp = quo(cyl), mean2 = quo(wt)),
       list(a = quo(cyl), b = quo(drat), c = quo(qsec)),
       list(quo(mpg)))
)

lapply(out, head, 3)
# [[1]]
#          ID  mpg cyl  disp  hp drat   wt qsec vs am gear carb
# 1 Merc 240D 24.4   4 146.7  62 3.69 3.19 20.0  1  0    4    2
# 2  Merc 230 22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2
# 3  Merc 280 19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
# [[2]]
# # A tibble: 3 x 3
#     cyl  mean     n
#   <dbl> <dbl> <int>
# 1     4  2.29    11
# 2     6  3.12     7
# 3     8  4.00    14
# [[3]]
# # A tibble: 3 x 12
# # Rowwise: 
#     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb rating
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
# 1  21       6   160   110  3.9   2.62  16.5     0     1     4     4  -2.89
# 2  21       6   160   110  3.9   2.88  17.0     0     1     4     4  -2.28
# 3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1  -5.10
# [[4]]
#                mpg cyl disp  hp drat    wt  qsec vs am gear carb    mpg_pct
# Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 0.03266449
# Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 0.03266449
# Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 0.03546430
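
The answer mentions purrr::pmap as an alternative to Map without showing it. A minimal sketch of what that could look like follows, assuming dplyr, purrr, tibble and stringr are installed. It repeats two of the question's functions so it runs on its own, and the name out_pmap is only illustrative.

```r
library(dplyr)   # also re-exports quo() from rlang
library(purrr)

# Two of the question's functions, repeated so the sketch is self-contained.
filter_df <- function(x, word) {
  x %>%
    tibble::rownames_to_column("ID") %>%
    filter(stringr::str_detect(ID, word))
}
mean_n_df <- function(x, grp, mean2) {
  x %>%
    group_by({{ grp }}) %>%
    summarise(mean = mean({{ mean2 }}), n = n())
}

# pmap() walks the two lists in parallel; the element names (fun, args)
# are matched to the arguments of the anonymous function.
out_pmap <- pmap(
  list(
    fun  = list(filter_df, mean_n_df),
    args = list(list("Merc"), list(grp = quo(cyl), mean2 = quo(wt)))
  ),
  function(fun, args) do.call(fun, c(list(mtcars), args))
)

lapply(out_pmap, head, 3)
```

The shape is the same as the Map call: one list of functions, one list of argument lists, and a small wrapper that prepends the data before calling do.call.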

Some notes:

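  • Because you demonstrated calling the functions with unevaluated symbols (grp=cyl), we have to wrap them in quo first; otherwise they would be evaluated before they reach the function.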

  • You can generalize this to arbitrary data by not hard-coding the data in the anonymous function passed to Map, like this:

    out <- Map(
      function(x, fun, args) do.call(fun, c(list(x), args)),
      list(mtcars),
      list(filter_df, mean_n_df, rating, pct),
      list(list("Merc"), list(grp = quo(cyl), mean2 = quo(wt)),
           list(a = quo(cyl), b = quo(drat), c = quo(qsec)),
           list(quo(mpg)))
    )
    

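    where the list(.) around mtcars is intentional: Map sees it as length 1, so it is recycled to match the other arguments (each of length 4). Without list, Map would fail, because the first function would see the first column (as a vector), the second function would see the second column, and so on (and/or you would get the warning "longer argument not a multiple of length of shorter" ... I really wish misaligned recycling in R would fail harder than that).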

    This generalization lets you apply the same series of functions to each of several datasets separately:

    out2 <- lapply(list(mtcars[1:10,], mtcars[11:32,]), function(XYZ) {
      Map(
        function(x, fun, args) do.call(fun, c(list(x), args)),
        list(XYZ),
        list(filter_df, mean_n_df, rating, pct),
        list(list("Merc"), list(grp = quo(cyl), mean2 = quo(wt)),
             list(a = quo(cyl), b = quo(drat), c = quo(qsec)),
             list(quo(mpg)))
      )
    })
    

    Not sure whether you intended to go as far as applying a list of functions to a list of datasets ...
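
Both notes above can be checked with a small sketch, assuming dplyr is installed. The object names (bare, res, seen_bare, seen_wrapped) are only illustrative, and mean_n_df is repeated from the question so the block runs on its own.

```r
library(dplyr)   # re-exports quo() from rlang

mean_n_df <- function(x, grp, mean2) {
  x %>%
    group_by({{ grp }}) %>%
    summarise(mean = mean({{ mean2 }}), n = n())
}

# Note 1: a bare symbol inside list() is evaluated immediately, before any
# function is called. With no object named cyl in scope this errors; the
# message is captured here instead of stopping the script.
bare <- tryCatch(list(grp = cyl), error = function(e) conditionMessage(e))

# quo() captures the expression instead, so the function can resolve it
# against the data frame later.
args <- list(grp = quo(cyl), mean2 = quo(wt))
res  <- do.call(mean_n_df, c(list(mtcars), args))

# Note 2: Map() iterates over the elements of each argument. A data frame
# is a list of columns, so passing it bare hands out one column per call.
seen_bare    <- Map(function(x, i) class(x), mtcars[1:2], 1:2)

# Wrapped in list(), the data frame is a single element that gets recycled,
# so every call receives the whole data frame.
seen_wrapped <- Map(function(x, i) class(x), list(mtcars[1:2]), 1:2)
```

Here seen_bare records the class of individual columns, while seen_wrapped records the class of the whole data frame on each call, which is the behaviour the Map call with list(mtcars) relies on.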