Ada*_*m_G 2 r dataframe dplyr summarize
I have a data frame like the following:
library(tidyverse)
x <- tibble(
  batch  = rep(c(1, 2), each = 10),
  exp_id = c(rep('a', 3), rep('b', 2), rep('c', 5), rep('d', 6), rep('e', 4))
)
I can run the code below to get the count for each exp_id:
x %>% group_by(batch,exp_id) %>%
summarise(count=n())
Which produces:
  batch exp_id count
  <dbl> <chr>  <dbl>
1     1 a          3
2     1 b          2
3     1 c          5
4     2 d          6
5     2 e          4
A very ugly way to get the mean of these counts per batch is:
x %>% group_by(batch,exp_id) %>%
summarise(count=n()) %>%
ungroup() %>%
group_by(batch) %>%
summarise(avg_exp = mean(count))
Which produces:
  batch avg_exp
  <dbl>   <dbl>
1     1    3.33
2     2    5
Is there a more concise and "tidy" way to produce this?
library(dplyr)
group_by(x, batch) %>%
summarize(avg_exp = mean(table(exp_id)))
# # A tibble: 2 x 2
# batch avg_exp
# <dbl> <dbl>
# 1 1 3.33
# 2 2 5
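For comparison, here is a minimal sketch of another way to get the same number. It is not from the original answer. It relies on the mean count per exp_id within a batch being the number of rows in that batch divided by the number of distinct exp_id values. The result name `res` is only illustrative.

```r
library(dplyr)

# Same example data as in the question
x <- tibble(
  batch  = rep(c(1, 2), each = 10),
  exp_id = c(rep("a", 3), rep("b", 2), rep("c", 5), rep("d", 6), rep("e", 4))
)

# Assumption: exp_id has no NA values, so
# mean count per exp_id = rows in the batch / distinct exp_id values in the batch
res <- x %>%
  group_by(batch) %>%
  summarise(avg_exp = n() / n_distinct(exp_id))

res
```

On this data it should agree with the `mean(table(exp_id))` result. The two approaches can differ when exp_id contains NA: `table()` drops NA by default, while `n()` counts every row and `n_distinct()` treats NA as its own value unless `na.rm = TRUE` is passed.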