R dplyr: summarise selected variables with multiple functions

Question

I have a dataset that I want to summarise by mean, but I also want to calculate the max of just one of the variables.

Let me start with an example of what I can achieve at the moment:

iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  # vars() selects the column range; current dplyr reads a quoted "a:b" string as a single column name
  summarise_at(vars(Sepal.Length:Petal.Width), mean)

This gives me the following result:

# A tibble: 3 × 5
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
      <fctr>        <dbl>       <dbl>        <dbl>       <dbl>
1     setosa          5.8         4.4          1.9         0.5
2 versicolor          7.0         3.4          5.1         1.8
3  virginica          7.9         3.8          6.9         2.5

Is there an easy way to also add, for example, max(Petal.Width) to the summary?

So far I have tried the following:

iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarise_at(vars(Sepal.Length:Petal.Width), mean) %>%
  mutate(Max.Petal.Width = max(iris$Petal.Width))

But with this approach I lose both the group_by and the filter from the code above, and it gives the wrong results.
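To make the failure concrete, here is a minimal sketch (assuming a current dplyr where vars() is available; the name `wrong` is only for illustration). Because max(iris$Petal.Width) reaches back to the original, unfiltered and ungrouped data frame, every row ends up with the same overall maximum:

```r
library(dplyr)

wrong <- iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarise_at(vars(Sepal.Length:Petal.Width), mean) %>%
  # iris$Petal.Width bypasses the pipeline: no grouping, no filter
  mutate(Max.Petal.Width = max(iris$Petal.Width))

# Counts the distinct values in the new column; a per-group maximum
# would normally give more than one
length(unique(wrong$Max.Petal.Width))
```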

The only solution I have been able to come up with is the following:

iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  # a named list yields the _mean/_max column suffixes (funs() is deprecated)
  summarise_at(vars(Sepal.Length:Petal.Width), list(mean = mean, max = max)) %>%
  select(Species:Petal.Width_mean, Petal.Width_max) %>%
  rename(Max.Petal.Width = Petal.Width_max) %>%
  # strip the remaining _mean suffixes (rename_() is deprecated)
  rename_with(~ gsub("_.*$", "", .x))

This is rather convoluted and requires a lot of typing just to add one column with a different summary.

Thanks

Answer 1

tor*_*art 6

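Although this is an old question, it remains an interesting one, and I have two solutions that I believe should be available to anyone who finds this page.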

Solution one

My own take:

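library(dplyr)   # summarise_at(), group_by(), filter(), %>%
library(purrr)   # reduce()
library(tibble)  # lst()

mapply(summarise_at,
       .vars = lst(names(iris)[!names(iris) %in% "Species"], "Petal.Width"),
       .funs = lst(mean, max),
       MoreArgs = list(.tbl = iris %>% group_by(Species) %>% filter(Sepal.Length > 5)),
       SIMPLIFY = FALSE) %>%  # keep the result as a plain list of tibbles
  reduce(merge, by = "Species")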

    #         Species Sepal.Length Sepal.Width Petal.Length Petal.Width.x Petal.Width.y
    #    1     setosa        5.314       3.714        1.509        0.2773           0.5
    #    2 versicolor        5.998       2.804        4.317        1.3468           1.8
    #    3  virginica        6.622       2.984        5.573        2.0327           2.5

Solution two

An elegant solution using purrr, a package from the tidyverse itself, inspired by this discussion:

library(dplyr)   # summarise_at(), inner_join(), %>%
library(purrr)   # pmap(), reduce()
library(tibble)  # lst()

list(.vars = lst(names(iris)[!names(iris) %in% "Species"], "Petal.Width"),
     .funs = lst("mean" = mean, "max" = max)) %>%
  pmap(~ iris %>% group_by(Species) %>% filter(Sepal.Length > 5) %>% summarise_at(.x, .y)) %>%
  reduce(inner_join, by = "Species")

# A tibble: 3 x 6
  Species    Sepal.Length Sepal.Width Petal.Length Petal.Width.x Petal.Width.y
  <fct>             <dbl>       <dbl>        <dbl>         <dbl>         <dbl>
1 setosa             5.31        3.71         1.51         0.277           0.5
2 versicolor         6.00        2.80         4.32         1.35            1.8
3 virginica          6.62        2.98         5.57         2.03            2.5

Brief discussion

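The data.frame and the tibble are the desired outcome: the last column is the max of Petal.Width, and the other columns are the means (by group and filter) of all the other columns.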

Both solutions hinge on three realizations:

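summarise_at accepts two lists as arguments, one of n variables and one of m functions, and applies all m functions to all n variables, thereby producing m × n vectors in a tibble. A solution therefore has to force the function to loop over "pairs", each made up of all the variables to which we want one specific function applied together with that function, then another group of variables with their own function, and so on.
Now, what does this in R? What forces an operation onto the corresponding elements of two lists? Functions such as mapply, or the map2 and pmap family of functions from purrr, dplyr's tidyverse companion. Both accept two lists of l elements and perform a given operation on the corresponding elements (matched by position) of the two lists.
Because the product is not a tibble or data.frame but a list, you simply need to combine its elements using reduce with inner_join, or just merge.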
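The three points above can be traced step by step in a small sketch. It assumes a current dplyr and purrr; the names `grouped`, `crossed`, `var_sets`, `fun_set`, `pieces` and `paired` are made up for illustration, and map2 is used in place of pmap only because there are exactly two lists:

```r
library(dplyr)
library(purrr)

grouped <- iris %>% group_by(Species) %>% filter(Sepal.Length > 5)

# Point 1: m functions x n variables, here 2 x 4 = 8 summary columns plus Species
crossed <- grouped %>%
  summarise_at(vars(Sepal.Length:Petal.Width), list(mean = mean, max = max))
ncol(crossed)

# Point 2: pair each set of variables with its own function, matched by position
var_sets <- list(c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
                 "Petal.Width")
fun_set <- list(mean, max)
pieces <- map2(var_sets, fun_set, ~ summarise_at(grouped, .x, .y))

# Point 3: pieces is a list of two tibbles, so join them on the grouping column;
# suffix disambiguates the two Petal.Width columns
paired <- reduce(pieces, inner_join, by = "Species", suffix = c("_mean", "_max"))
names(paired)
```

Passing `suffix` through reduce is a small design choice: it replaces the default `.x`/`.y` endings with names that say which summary each Petal.Width column holds.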

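Note that the means I obtain differ from the OP's, but they are the means I get from his reproducible example (perhaps we have two different versions of the iris dataset?).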
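For readers on dplyr 1.0.0 or later, a shorter route may be possible with across(), which is not part of the original answer. The sketch below assumes across() is available and uses the hypothetical name `result`. The max is computed first on purpose: inside summarise, later expressions see columns created earlier, so computing it after across() would take the max of the already averaged Petal.Width.

```r
library(dplyr)

result <- iris %>%
  group_by(Species) %>%
  filter(Sepal.Length > 5) %>%
  summarise(
    # must come before across(), while Petal.Width still holds the raw values
    Max.Petal.Width = max(Petal.Width),
    across(Sepal.Length:Petal.Width, mean)
  )

names(result)
```

The trade-off is column order: Max.Petal.Width lands right after Species rather than at the end, which a trailing select() or relocate() could change if that matters.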

Archived:	9 years, 2 months ago
Views:	11281
Last recorded:	5 years, 8 months ago