dplyr group_by and cummean functions

luc*_*ano 6 r dplyr

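I expected the code below to output a data frame with three rows, each row holding the cumulative mean of mpg after the mean has been computed for each cyl group: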

library(dplyr)
mtcars %>%
  arrange(cyl) %>%
  group_by(cyl) %>%
  summarise(running.mean.mpg = cummean(mpg))

This is what I expected to happen:

mean_cyl_4 <- mtcars %>%
  filter(cyl == 4) %>%
  summarise(mean(mpg))

mean_cyl_4_6 <- mtcars %>%
  filter(cyl == 4 | cyl == 6) %>%
  summarise(mean(mpg))

mean_cyl_4_6_8 <- mtcars %>%
  filter(cyl == 4 | cyl == 6 | cyl == 8) %>%
  summarise(mean(mpg))

data.frame(cyl = c(4,6,8), running.mean.mpg = c(mean_cyl_4[1,1], mean_cyl_4_6[1,1], mean_cyl_4_6_8[1,1]))

  cyl running.mean.mpg
1   4     26.66364
2   6     23.97222
3   8     20.09062

Why does dplyr seem to ignore group_by(cyl)?

mar*_*bel 5

require("dplyr")

mtcars %>%
  arrange(cyl) %>%
  group_by(cyl) %>%
  mutate(running.mean.mpg = cummean(mpg)) %>%
  select(cyl, running.mean.mpg)

# Source: local data frame [32 x 2]
# Groups: cyl
# 
# # cyl running.mean.mpg
# # 1    4         22.80000
# # 2    4         23.60000
# # 3    4         23.33333
# # 4    4         25.60000
# # 5    4         26.56000
# # 6    4         27.78333
# # 7    4         26.88571
# # 8    4         26.93750

As an experiment, this also works with data.table. Note that dplyr still has to be loaded so that cummean() is available.

require("data.table")
DT <- as.data.table(mtcars)
DT[,j=list(
  running.mean.mpg = cummean(mpg)
  ), by="cyl"]
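If loading dplyr only for cummean() is undesirable, the same per-group running mean can be written with base R functions inside data.table. This is a sketch rather than part of the original answer: `cumsum(mpg) / seq_along(mpg)` stands in for `cummean(mpg)`, and `res` is an illustrative name.

```r
library(data.table)

DT <- as.data.table(mtcars)

# Running mean within each cyl group, without dplyr:
# cumulative sum of mpg divided by the position within the group
res <- DT[, .(running.mean.mpg = cumsum(mpg) / seq_along(mpg)), by = cyl]

print(head(res))
```

Like the dplyr version, this returns one row per car (32 rows), not one row per group.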

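  • Hmm, it seems you are confusing what `group_by` is supposed to do. It groups rows based on a variable. What you are asking for is a result (a mean) based on different filter conditions. That is why `cummean` and `group_by` do not give you this; they are meant for something else. (2 upvotes)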
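One way to get the three-row result described in the question is to reduce each group to its mpg total and row count, then divide the running totals. This is a sketch and not taken from the thread; the column names `total` and `n` and the object `running` are illustrative, and it assumes only dplyr and the built-in mtcars.

```r
library(dplyr)

running <- mtcars %>%
  group_by(cyl) %>%
  # one row per cyl group: total mpg and number of cars
  summarise(total = sum(mpg), n = n()) %>%
  # make the accumulation order explicit
  arrange(cyl) %>%
  # running total of mpg divided by running count of cars
  mutate(running.mean.mpg = cumsum(total) / cumsum(n)) %>%
  select(cyl, running.mean.mpg)

print(running)
```

Applying `cummean()` directly to the three group means would weight each group equally regardless of how many cars it contains, so it would not reproduce the values from the filter-based calculation in the question.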

G. *_*eck 0

Use mutate instead of summarise.
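The difference is in output shape: summarise() is meant to collapse each group to a single row, whereas mutate() keeps one row per input row, which is what a running statistic such as cummean() produces. A small sketch on the built-in mtcars that compares row counts only; `by_cyl`, `collapsed` and `expanded` are illustrative names.

```r
library(dplyr)

by_cyl <- mtcars %>%
  arrange(cyl) %>%
  group_by(cyl)

# summarise() with a length-one summary: one row per group
collapsed <- by_cyl %>% summarise(mean.mpg = mean(mpg))

# mutate() with a cumulative function: one row per car
expanded <- by_cyl %>% mutate(running.mean.mpg = cummean(mpg))

nrow(collapsed)  # one row for each of the 3 cyl groups
nrow(expanded)   # one row for each of the 32 cars
```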