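I want to apply several different functions to one data frame at once and collect the results in a list of data frames. For example, I might arrange by one column and save the output as a new data frame, or filter some of the data and save that as another new data frame (and so on). I feel there must be a simple way to do this with purrr or apply, but I am not sure how to proceed. So I am wondering whether there is a way to supply a list of functions and get back a list of data frames. Here are some example functions that I apply to mtcars: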
library(tidyverse)

filter_df <- function(x, word) {
  x %>%
    tibble::rownames_to_column("ID") %>%
    filter(str_detect(ID, word))
}
a <- filter_df(mtcars, "Merc")

mean_n_df <- function(x, grp, mean2) {
  x %>%
    group_by({{grp}}) %>%
    summarise(mean = mean({{mean2}}), n = n())
}
b <- mean_n_df(mtcars, grp = cyl, mean2 = wt)

rating <- function(x, a, b, c) {
  x %>%
    rowwise %>%
    mutate(rating = ({{a}}*2) + ({{b}}-5) * abs({{c}} - 30))
}
c <- rating(mtcars, a = cyl, b = drat, c = qsec)

pct <- function(data, var, round = 4){
  var_expr <- rlang::enquo(var)
  colnm_expr <- paste(rlang::get_expr(var_expr), "pct", sep = "_")

  data %>%
    mutate(!! colnm_expr := !!var_expr/sum(!!var_expr) %>%
             round(round))
}
d <- pct(mtcars, mpg)

I know I can run the code above and then bind each data frame into a list.
df_list <- list(mtcars, a, b, c, d)

str(df_list, 1)[[1]]

List of 5
 $ :'data.frame': 32 obs. of 11 variables:
 $ :'data.frame': 7 obs. of 12 variables:
 $ : tibble [3 × 3] (S3: tbl_df/tbl/data.frame)
 $ : rowwise_df [32 × 12] (S3: rowwise_df/tbl_df/tbl/data.frame)
  ..- attr(*, "groups")= tibble [32 × 1] (S3: tbl_df/tbl/data.frame)
 $ :'data.frame': 32 obs. of 12 variables:
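This seems a bit bespoke (since each function needs different arguments), but I would use Map (or purrr::map2 or purrr::pmap), passing each function together with its arguments: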
filter_df <- function(x, word) {
  x %>%
    tibble::rownames_to_column("ID") %>%
    filter(str_detect(ID, word))
}
mean_n_df <- function(x, grp, mean2) {
  x %>%
    group_by({{grp}}) %>%
    summarise(mean = mean({{mean2}}), n = n())
}
rating <- function(x, a, b, c) {
  x %>%
    rowwise %>%
    mutate(rating = ({{a}}*2) + ({{b}}-5) * abs({{c}} - 30))
}
pct <- function(data, var, round = 4){
  var_expr <- rlang::enquo(var)
  colnm_expr <- paste(rlang::get_expr(var_expr), "pct", sep = "_")
  data %>%
    # Note: %>% binds tighter than /, so round() is applied to sum(...) only,
    # not to the ratio (which is why the output below is not rounded).
    mutate(!! colnm_expr := !!var_expr/sum(!!var_expr) %>%
             round(round))
}
The call:
out <- Map(
  function(fun, args) do.call(fun, c(list(mtcars), args)),
  list(filter_df, mean_n_df, rating, pct),
  list(list("Merc"), list(grp = quo(cyl), mean2 = quo(wt)),
       list(a = quo(cyl), b = quo(drat), c = quo(qsec)),
       list(quo(mpg)))
)
lapply(out, head, 3)
# [[1]]
# ID mpg cyl disp hp drat wt qsec vs am gear carb
# 1 Merc 240D 24.4 4 146.7 62 3.69 3.19 20.0 1 0 4 2
# 2 Merc 230 22.8 4 140.8 95 3.92 3.15 22.9 1 0 4 2
# 3 Merc 280 19.2 6 167.6 123 3.92 3.44 18.3 1 0 4 4
# [[2]]
# # A tibble: 3 x 3
# cyl mean n
# <dbl> <dbl> <int>
# 1 4 2.29 11
# 2 6 3.12 7
# 3 8 4.00 14
# [[3]]
# # A tibble: 3 x 12
# # Rowwise:
# mpg cyl disp hp drat wt qsec vs am gear carb rating
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 -2.89
# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 -2.28
# 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 -5.10
# [[4]]
# mpg cyl disp hp drat wt qsec vs am gear carb mpg_pct
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 0.03266449
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 0.03266449
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 0.03546430
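The purrr alternative mentioned above could look roughly like the sketch below. It uses purrr::map2 with the same do.call pattern. The helpers count_rows and mean_of are hypothetical stand-ins (not the functions defined above), kept small so that the block runs on its own.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in functions, only for illustration
count_rows <- function(x, min_cyl) {
  x %>% filter(cyl >= min_cyl) %>% nrow()
}
mean_of <- function(x, col) {
  x %>% summarise(m = mean({{ col }})) %>% pull(m)
}

funs <- list(count_rows, mean_of)
args <- list(list(min_cyl = 8), list(col = rlang::quo(wt)))

# map2 walks both lists in parallel, just like Map does
out_map2 <- map2(
  funs, args,
  function(fun, args) do.call(fun, c(list(mtcars), args))
)
```

purrr::pmap would take the same two lists wrapped in one list, e.g. pmap(list(fun = funs, args = args), ...), which may be more convenient once a third parallel list (such as the data) is added.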
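A few notes:
Because you demonstrated the functions with unevaluated symbols (grp = cyl), we have to wrap them in quo first; otherwise they would be evaluated before they reach the function.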
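To illustrate the point about quo, here is a minimal sketch. It assumes a fresh session in which no object named cyl or wt exists, and mean_by is a hypothetical, cut-down version of mean_n_df.

```r
library(dplyr)
library(rlang)

# Hypothetical, simplified variant of mean_n_df
mean_by <- function(x, grp, val) {
  x %>%
    group_by({{ grp }}) %>%
    summarise(mean = mean({{ val }}))
}

# Bare symbols: list() evaluates cyl right away, and since no such object
# exists in the calling environment, this errors before mean_by() is called.
bare <- tryCatch(
  do.call(mean_by, list(mtcars, grp = cyl, val = wt)),
  error = function(e) "failed before reaching mean_by()"
)

# Quosures: quo() delays evaluation until dplyr resolves the names in the data.
quoted <- do.call(mean_by, list(mtcars, grp = quo(cyl), val = quo(wt)))
```

Here bare ends up holding the fallback string from the error handler, while quoted is a three-row summary (one row per value of cyl).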
You can generalize this to arbitrary data by not hard-coding the data set in the anonymous function passed to Map, like so:
out <- Map(
  function(x, fun, args) do.call(fun, c(list(x), args)),
  list(mtcars),
  list(filter_df, mean_n_df, rating, pct),
  list(list("Merc"), list(grp = quo(cyl), mean2 = quo(wt)),
       list(a = quo(cyl), b = quo(drat), c = quo(qsec)),
       list(quo(mpg)))
)
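The list(.) around mtcars is deliberate: Map sees it as a length-1 argument, so it is recycled to match the other arguments (each of length 4). Without list, Map would fail, because the first function would see the first column (as a vector), the second function would see the second column, and so on (and/or you would get the warning "longer argument not a multiple of length of shorter" ... I really wish misaligned recycling in R failed more loudly than this).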
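The recycling behaviour can be seen with base R alone. The sketch below uses a plain numeric vector instead of a data frame, purely as an illustration; a data frame behaves the same way because Map iterates over its columns.

```r
# Wrapped in list(): the whole vector is a single element,
# which Map recycles for each function.
wrapped <- Map(function(x, fun) fun(x), list(c(1, 2, 3, 4)), list(sum, mean))
# wrapped[[1]] is sum(1:4), wrapped[[2]] is mean(1:4)

# Not wrapped: Map pairs elements position by position and recycles the
# functions instead, so each call only receives a single number.
unwrapped <- Map(function(x, fun) fun(x), c(1, 2, 3, 4), list(sum, mean))
# calls are sum(1), mean(2), sum(3), mean(4)
```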
This generalization makes it possible to apply the same series of functions to each of several data sets:
out2 <- lapply(list(mtcars[1:10,], mtcars[11:32,]), function(XYZ) {
  Map(
    function(x, fun, args) do.call(fun, c(list(x), args)),
    list(XYZ),
    list(filter_df, mean_n_df, rating, pct),
    list(list("Merc"), list(grp = quo(cyl), mean2 = quo(wt)),
         list(a = quo(cyl), b = quo(drat), c = quo(qsec)),
         list(quo(mpg)))
  )
})
I am not sure whether you intend to go as far as applying a list of functions to a list of data sets ...