Why does summarise or mutate not work with group_by when I load `plyr` after `dplyr`?

Ign*_*cio 17 r plyr r-faq dplyr

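Note: The title of this question has been edited to make it the canonical question for problems that arise when plyr functions mask their dplyr counterparts. The rest of the question remains unchanged.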


Suppose I have the following data:

dfx <- data.frame(
  group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)

With the good old plyr I can create a small table summarizing my data using the following code:

require(plyr)
ddply(dfx, .(group, sex), summarize,
      mean = round(mean(age), 2),
      sd = round(sd(age), 2))

The output looks like this:

  group sex  mean    sd
1     A   F 49.68  5.68
2     A   M 32.21  6.27
3     B   F 31.87  9.80
4     B   M 37.54  9.73
5     C   F 40.61 15.21
6     C   M 36.33 11.33

I would like to port my code to dplyr and the %>% operator. My code takes the data frame, groups it by group and sex, and then summarises it. That is:

dfx %>% group_by(group, sex) %>% 
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))

But my output is:

  mean   sd
1 35.56 9.92

What am I doing wrong?

Car*_*lli 23

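The problem here is that you are loading dplyr first and then plyr, so plyr's summarise function masks dplyr's summarise function. plyr's version knows nothing about the grouping created by group_by, which is why you get a single summary row. When this happens you get this warning: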

require(plyr)
    Loading required package: plyr
------------------------------------------------------------------------------------------
You have loaded plyr after dplyr - this is likely to cause problems.
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
library(plyr); library(dplyr)
------------------------------------------------------------------------------------------

Attaching package: ‘plyr’

The following objects are masked from ‘package:dplyr’:

    arrange, desc, failwith, id, mutate, summarise, summarize
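The masking described by that warning can be reproduced without either package. The following is a minimal sketch in base R; `first_pkg`, `second_pkg`, and `summarise_demo` are hypothetical stand-ins, not part of plyr or dplyr. It only illustrates that whatever is attached last is searched first.

```r
# Two stand-in "packages" (hypothetical names) that define the same function name.
first_pkg  <- list(summarise_demo = function() "first_pkg version")
second_pkg <- list(summarise_demo = function() "second_pkg version")

attach(first_pkg)
attach(second_pkg)  # attached last, so it is searched first and masks first_pkg

called_while_masked <- summarise_demo()
lookup_order <- find("summarise_demo")

detach(second_pkg)  # removing the later entry exposes the earlier one again
called_after_detach <- summarise_demo()

detach(first_pkg)   # clean up the search path
```

Here `called_while_masked` comes from `second_pkg`, `lookup_order` lists `second_pkg` before `first_pkg`, and after the `detach()` call the `first_pkg` definition is used again.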

So, for your code to work, either detach plyr with `detach(package:plyr)`, or restart R and load plyr first and then dplyr (or load only dplyr):

library(dplyr)
dfx %>% group_by(group, sex) %>% 
  summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
Source: local data frame [6 x 4]
Groups: group

  group sex  mean    sd
1     A   F 41.51  8.24
2     A   M 32.23 11.85
3     B   F 38.79 11.93
4     B   M 31.00  7.92
5     C   F 24.97  7.46
6     C   M 36.17  9.11

Or you can explicitly call dplyr's summarise in your code, so the right function is called no matter in which order the packages were loaded:

dfx %>% group_by(group, sex) %>% 
  dplyr::summarise(mean = round(mean(age), 2), sd = round(sd(age), 2))
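If it is unclear which package a bare function name currently resolves to, a quick check along these lines may help. This is only a sketch: `which_pkg` is a hypothetical helper name, and it relies on the base R functions `environment()` and `environmentName()`. It is demonstrated on `sd` so that it runs without plyr or dplyr installed.

```r
# Hypothetical helper: name of the package namespace a function was defined in.
which_pkg <- function(fn) environmentName(environment(fn))

# `sd` lives in the stats package, so this reports "stats".
pkg_of_sd <- which_pkg(sd)
print(pkg_of_sd)
```

In a session like the one in the question, `which_pkg(summarise)` should report plyr while plyr is masking dplyr, and dplyr once plyr has been detached. `find("summarise")` lists every attached package that defines the name, in lookup order.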

  • I don't understand why so few people notice this warning :/ (10 upvotes)
  • @hadley `fortunes::fortune(9)` (2 upvotes)