dplyr: count non-NA values within group_by

Question

Here is my example:

mydf<-data.frame('col_1'=c('A','A','B','B'), 'col_2'=c(100,NA, 90,30))

I want to group by col_1 and count the non-NA elements of col_2.

I would like to do this with dplyr.

Here is what I tried after searching SO:

mydf %>% group_by(col_1) %>% summarise_each(funs(!is.na(col_2)))
mydf %>% group_by(col_1) %>% mutate(non_na_count = length(col_2, na.rm=TRUE))
mydf %>% group_by(col_1) %>% mutate(non_na_count = count(col_2, na.rm=TRUE))

None of these worked. Any suggestions?

Answer 1

Ric*_*ord 33

You can use this:

mydf %>% group_by(col_1) %>% summarise(non_na_count = sum(!is.na(col_2)))

# A tibble: 2 x 2
   col_1 non_na_count
  <fctr>        <int>
1      A            1
2      B            2

To get a summary of all columns, use `summarise_all(funs(sum(!is.na(.))))` (8 upvotes)
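Since `funs()` has been deprecated as of dplyr 0.8.0, the all-columns summary can also be written with `across()`. The following is a minimal sketch, assuming dplyr 1.0.0 or later is installed; the name `res` is only illustrative.

```r
library(dplyr)

mydf <- data.frame(col_1 = c("A", "A", "B", "B"),
                   col_2 = c(100, NA, 90, 30))

# Count the non-NA values of every non-grouping column, per group.
# Inside a grouped summarise(), everything() skips the grouping column.
res <- mydf %>%
  group_by(col_1) %>%
  summarise(across(everything(), ~ sum(!is.na(.x))))

print(res)
```

With the example data this should give a count of 1 for group A and 2 for group B in col_2.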

Answer 2

akr*_*run 5

We can filter out the NA elements in 'col_2' and then do a count on 'col_1':

mydf %>%
     filter(!is.na(col_2))  %>%
      count(col_1)
# A tibble: 2 x 2
#   col_1     n
#  <fctr> <int>
#1      A     1
#2      B     2
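One difference from the summarise approach is worth keeping in mind: because the rows are filtered before counting, a group whose values are all NA disappears from the result instead of showing a count of 0. A small sketch of this, using a hypothetical data frame `df2` with an extra all-NA group "C" that is not part of the original question:

```r
library(dplyr)

# Hypothetical data: group "C" contains only NA in col_2
df2 <- data.frame(col_1 = c("A", "A", "B", "B", "C"),
                  col_2 = c(100, NA, 90, 30, NA))

# filter + count: group "C" has no rows left, so it is absent from the result
dropped <- df2 %>%
  filter(!is.na(col_2)) %>%
  count(col_1)

# group_by + summarise: group "C" is kept with a count of 0
kept <- df2 %>%
  group_by(col_1) %>%
  summarise(n = sum(!is.na(col_2)))

print(dropped)
print(kept)
```

Which behaviour is preferable depends on whether empty groups should be reported.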

Or use data.table:

library(data.table)
setDT(mydf)[, .(non_na_count = sum(!is.na(col_2))), col_1]

Or aggregate from base R:

aggregate(cbind(col_2 = !is.na(col_2))~col_1, mydf, sum)
#  col_1 col_2
#1     A     1
#2     B     2

Or use table:

table(mydf$col_1[!is.na(mydf$col_2)])

Answer 3

Any*_*Sti 5

library(knitr)
library(dplyr)

mydf <- data.frame("col_1" = c("A", "A", "B", "B"), 
                   "col_2" = c(100, NA, 90, 30))

mydf %>%
  group_by(col_1) %>%
  select_if(function(x) any(is.na(x))) %>%
  summarise_all(funs(sum(is.na(.)))) -> NA_mydf

kable(NA_mydf)
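Note that the pipeline above uses `sum(is.na(.))`, so it reports how many values are missing in each group, not how many are present. As a rough sketch of how the two counts relate, the following computes both side by side for col_2; the column names `na_count` and `non_na_count` and the object name `counts` are chosen here only for illustration.

```r
library(dplyr)

mydf <- data.frame(col_1 = c("A", "A", "B", "B"),
                   col_2 = c(100, NA, 90, 30))

# Per group: number of missing values and number of non-missing values
counts <- mydf %>%
  group_by(col_1) %>%
  summarise(na_count     = sum(is.na(col_2)),
            non_na_count = sum(!is.na(col_2)))

print(counts)
```

Negating the `is.na()` test is the only change needed to turn the NA count into the non-NA count asked for in the question.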

Archived: 8 years, 8 months ago
Views: 22829
Last recorded: 6 years, 10 months ago