dplyr: handling NAs when computing the mean (summarise_each) on a group_by object

use*_*534 9 r mean na dplyr

I have a data frame md:

md <- data.frame(x = c(3,5,4,5,3,5), y = c(5,5,5,4,4,1), z = c(1,3,4,3,5,5),
      device1 = c("c","a","a","b","c","c"), device2 = c("B","A","A","A","B","B"))
md[2,3] <- NA
md[4,1] <- NA
md

I want to use dplyr to compute the means for each device1/device2 combination:

library(dplyr)
md %>% group_by(device1, device2) %>% summarise_each(funs(mean))

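However, I get some NAs. I would like the NAs to be ignored (na.rm = TRUE). I tried, but the function does not accept this argument. Both of these lines produce an error: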

md %>% group_by(device1, device2) %>% summarise_each(funs(mean), na.rm = TRUE)
md %>% group_by(device1, device2) %>% summarise_each(funs(mean, na.rm = TRUE))

smc*_*mci 11

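The other answers show you the syntax for passing mean(., na.rm = TRUE) to summarize/_each.

Personally, I deal with this so often that I just define the following set of NA-aware basic functions (e.g. in my .Rprofile), so that you can apply them with dplyr as summarize(mean_) without any cumbersome argument passing. It also keeps the source code cleaner and more readable, which is another strong advantage: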

mean_   <- function(...) mean(..., na.rm = TRUE)
median_ <- function(...) median(..., na.rm = TRUE)
sum_    <- function(...) sum(..., na.rm = TRUE)
sd_     <- function(...) sd(..., na.rm = TRUE)  # sample SD, NA-aware like the others
cor_    <- function(...) cor(..., use = 'pairwise.complete.obs')
table_  <- function(...) table(..., useNA = 'ifany')
mode_   <- function(...) {
  tab <- table(...)            # table() drops NA values by default
  names(tab[tab == max(tab)])  # every value tied for the highest count
}
clamp_  <- function(..., minval = 0, maxval = 70) pmax(minval, pmin(maxval, ...))
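As a quick sanity check of how wrappers like these behave, here is a minimal base R sketch. The vector v is made up for illustration, and the two wrappers are redefined so the snippet runs on its own. Note that when every value is missing there is nothing left to average, so the result is NaN rather than NA.

```r
# NA-aware wrappers, redefined here so the snippet is self-contained
mean_ <- function(...) mean(..., na.rm = TRUE)
sum_  <- function(...) sum(..., na.rm = TRUE)

v <- c(3, NA, 5)   # illustrative data, not from the question

mean(v)    # NA: the default propagates missing values
mean_(v)   # 4
sum_(v)    # 8

# With only missing values, na.rm = TRUE leaves an empty vector
all_missing <- mean_(c(NA_real_, NA_real_))   # NaN
```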

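Really, you would want to be able to flick a global switch once and for all, something like na.action/na.pass/na.omit/na.fail, to tell functions what their default behavior should be, instead of having them throw errors or behave inconsistently across packages as they do now.

There used to be a CRAN package called Defaults for setting per-function defaults, but it has not been maintained since 2014, before R 3.x. For more about it, see "Setting Function Defaults R on a Project Specific Basis".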


jer*_*ycg 10

Try:

 library(dplyr)
 md %>% group_by(device1, device2) %>%
        summarise_each(funs(mean(., na.rm = TRUE)))
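In more recent dplyr releases, summarise_each() and funs() are deprecated. A rough equivalent, assuming dplyr 1.0.0 or later is installed, uses across() with a lambda. This is a sketch of the same idea rather than part of the original answer; the data frame is rebuilt from the question so the snippet runs on its own.

```r
library(dplyr)

# Data frame from the question
md <- data.frame(x = c(3, 5, 4, 5, 3, 5), y = c(5, 5, 5, 4, 4, 1), z = c(1, 3, 4, 3, 5, 5),
                 device1 = c("c", "a", "a", "b", "c", "c"),
                 device2 = c("B", "A", "A", "A", "B", "B"))
md[2, 3] <- NA
md[4, 1] <- NA

# across(everything(), ...) applies the lambda to every non-grouping column
res <- md %>%
  group_by(device1, device2) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)), .groups = "drop")

res
```

The b/A group has a single row whose x is NA, so its mean of x comes out as NaN even with na.rm = TRUE.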


zer*_*323 7

It's as simple as:

funs(mean(., na.rm = TRUE))