dplyr summarise: creating variables from a named vector

Her*_*sas 8 r dplyr

Here is my problem:

I am using a function that returns a named vector. Here is a toy example:

toy_fn <- function(x) {
    y <- c(mean(x), sum(x), median(x), sd(x))
    names(y) <- c("Right", "Wrong", "Unanswered", "Invalid")
    y
}

I use group_by in dplyr to apply this function to each group (the typical split-apply-combine). Here is my toy data.frame:

set.seed(1234567)
toy_df <- data.frame(id = 1:1000, 
                     group = sample(letters, 1000, replace = TRUE), 
                     value = runif(1000))

And here is the result I am aiming for:

toy_summary <- 
    toy_df %>% 
    group_by(group) %>% 
    summarize(Right = toy_fn(value)["Right"], 
              Wrong = toy_fn(value)["Wrong"], 
              Unanswered = toy_fn(value)["Unanswered"], 
              Invalid = toy_fn(value)["Invalid"])

> toy_summary
Source: local data frame [26 x 5]

   group     Right    Wrong Unanswered   Invalid
1      a 0.5038394 20.15358  0.5905526 0.2846468
2      b 0.5048040 15.64892  0.5163702 0.2994544
3      c 0.5029442 21.62660  0.5072733 0.2465612
4      d 0.5124601 14.86134  0.5382463 0.2681955
5      e 0.4649483 17.66804  0.4426197 0.3075080
6      f 0.5622644 12.36982  0.6330269 0.2850609
7      g 0.4675324 14.96104  0.4692404 0.2746589

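It works! But calling the same function four times is not cool. I would rather have dplyr take the named vector and create a new variable for each element of the vector. Something like this: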

toy_summary <- 
    toy_df %>% 
    group_by(group) %>% 
    summarize(toy_fn(value))

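Unfortunately, this does not work; it fails with "Error: expecting a single value".

So I thought, OK, let's convert the vector to a data.frame using data.frame(as.list(x)). But this does not work either. I tried many things, but I could not trick dplyr into thinking it is actually receiving a single value (observation) for each of 4 different variables. Is there any way to help dplyr realize that?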

Dav*_*urg 6

One possible solution is to use dplyr's SE (standard evaluation) functions. For example, set up your functions as follows

dots <- setNames(list(  ~ mean(value),  
                         ~ sum(value),  
                      ~ median(value), 
                         ~ sd(value)),  
                 c("Right", "Wrong", "Unanswered", "Invalid"))

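Then you can use summarize_ (with a trailing _) as follows. Note that the underscore-suffixed verbs were deprecated in dplyr 0.7.0, so this applies to older versions.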

toy_df %>% 
  group_by(group) %>% 
  summarize_(.dots = dots)
# Source: local data table [26 x 5]
# 
#    group     Right    Wrong Unanswered   Invalid
# 1      o 0.4490776 17.51403  0.4012057 0.2749956
# 2      s 0.5079569 15.23871  0.4663852 0.2555774
# 3      x 0.4620649 14.78608  0.4475117 0.2894502
# 4      a 0.5038394 20.15358  0.5905526 0.2846468
# 5      t 0.5041168 24.19761  0.5330790 0.3171022
# 6      m 0.4806628 21.14917  0.4805273 0.2825026
# 7      c 0.5029442 21.62660  0.5072733 0.2465612
# 8      w 0.4932484 17.75694  0.4891746 0.3309680
# 9      q 0.5350707 22.47297  0.5608505 0.2749941
# 10     g 0.4675324 14.96104  0.4692404 0.2746589
# ..   ...       ...      ...        ...       ...

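While this looks nice, there is a big catch. You have to know a priori which column you are going to operate on (value) when you set up the functions, so unless you set up dots to match, it will not work with other column names.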
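
One way around the hard-coded column name, as a rough sketch: newer dplyr (1.0.0 or later is assumed here) provides across(), which can select the column from a string and apply a named list of functions to it. This is a different technique from the summarize_ approach above, and summarise_col is a hypothetical helper name introduced only for illustration.

```r
library(dplyr)

# Hypothetical helper (not from the original answer): summarise any
# numeric column of df, chosen by name, within each group.
summarise_col <- function(df, col) {
  df %>%
    group_by(group) %>%
    summarise(across(all_of(col),
                     list(Right = mean, Wrong = sum,
                          Unanswered = median, Invalid = sd),
                     .names = "{.fn}"))  # name outputs after the list names only
}

# Same toy data as in the question
set.seed(1234567)
toy_df <- data.frame(id = 1:1000,
                     group = sample(letters, 1000, replace = TRUE),
                     value = runif(1000))

res_across <- summarise_col(toy_df, "value")
```

Because the column name is passed as a string, the same helper can be reused for another column without rebuilding the list of functions.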


As a bonus, here is a simple data.table solution that uses your original function

library(data.table)
setDT(toy_df)[, as.list(toy_fn(value)), by = group]
#     group     Right    Wrong Unanswered   Invalid
#  1:     o 0.4490776 17.51403  0.4012057 0.2749956
#  2:     s 0.5079569 15.23871  0.4663852 0.2555774
#  3:     x 0.4620649 14.78608  0.4475117 0.2894502
#  4:     a 0.5038394 20.15358  0.5905526 0.2846468
#  5:     t 0.5041168 24.19761  0.5330790 0.3171022
#  6:     m 0.4806628 21.14917  0.4805273 0.2825026
#  7:     c 0.5029442 21.62660  0.5072733 0.2465612
#  8:     w 0.4932484 17.75694  0.4891746 0.3309680
#  9:     q 0.5350707 22.47297  0.5608505 0.2749941
# 10:     g 0.4675324 14.96104  0.4692404 0.2746589
#...
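
The same as.list() idea seems to carry over to newer dplyr. As a minimal sketch, assuming dplyr 1.0.0 or later (where summarise() accepts an unnamed expression that returns a data frame and unpacks it into columns), the original function can be called once per group:

```r
library(dplyr)

# The toy function from the question, unchanged
toy_fn <- function(x) {
  y <- c(mean(x), sum(x), median(x), sd(x))
  names(y) <- c("Right", "Wrong", "Unanswered", "Invalid")
  y
}

set.seed(1234567)
toy_df <- data.frame(id = 1:1000,
                     group = sample(letters, 1000, replace = TRUE),
                     value = runif(1000))

# as.list() turns the named vector into a named list; as.data.frame()
# makes it a one-row data frame, which summarise() spreads into columns.
toy_summary <- toy_df %>%
  group_by(group) %>%
  summarise(as.data.frame(as.list(toy_fn(value))))
```

This would not have worked with the dplyr version in the question, which required every summary expression to return a single value.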


Jos*_* W. 3

You could also try the following with do(), which stores each group's named vector in a list column called res

toy_df %>%
  group_by(group) %>%
  do(res = toy_fn(.$value))
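
If separate columns are wanted rather than a list column, a possible variant is to pass do() an unnamed expression that returns a one-row data frame. This is only a sketch: it assumes a dplyr version that still ships do() (it has since been superseded), and the name toy_do is introduced for illustration.

```r
library(dplyr)

# The toy function from the question, unchanged
toy_fn <- function(x) {
  y <- c(mean(x), sum(x), median(x), sd(x))
  names(y) <- c("Right", "Wrong", "Unanswered", "Invalid")
  y
}

set.seed(1234567)
toy_df <- data.frame(id = 1:1000,
                     group = sample(letters, 1000, replace = TRUE),
                     value = runif(1000))

# With an unnamed argument, do() expects a data frame per group and
# row-binds the results, prepending the grouping column.
toy_do <- toy_df %>%
  group_by(group) %>%
  do(as.data.frame(as.list(toy_fn(.$value))))
```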