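I have a data frame containing the results of several experiments. I want to compute the cumulative number of unique values obtained after each successive experiment.
For example, consider: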
test <- data.frame(exp = c(rep("exp1", 4), rep("exp2", 4), rep("exp3", 4), rep("exp4", 5)),
                   entries = c("abcd", "efgh", "ijkl", "mnop", "qrst", "uvwx", "abcd", "efgh", "ijkl", "qrst", "uvwx",
                               "yzab", "yzab", "cdef", "mnop", "uvwx", "ghij"))
> test
exp entries
1 exp1 abcd
2 exp1 efgh
3 exp1 ijkl
4 exp1 mnop
5 exp2 qrst
6 exp2 uvwx
7 exp2 abcd
8 exp2 efgh
9 exp3 ijkl
10 exp3 qrst
11 exp3 uvwx
12 exp3 yzab
13 exp4 yzab
14 exp4 cdef
15 exp4 mnop
16 exp4 uvwx
17 exp4 ghij
The total number of unique entries is nine. I would like the result to look like this:
exp cum_unique_entries
1 exp1 4
2 exp2 6
3 exp3 7
4 exp4 9
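Finally, I want to plot this as a bar chart. I can do it with a for loop, but it feels like there must be a more elegant way.
Here is another solution, using dplyr: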
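library(dplyr)

test %>%
  # running count of distinct entries over the whole data frame
  mutate(cum_unique_entries = cumsum(!duplicated(entries))) %>%
  group_by(exp) %>%
  # keep only the last row of each experiment
  slice(n()) %>%
  select(-entries)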
Or:
test %>%
  mutate(cum_unique_entries = cumsum(!duplicated(entries))) %>%
  group_by(exp) %>%
  # take the last running count within each experiment
  summarise(cum_unique_entries = last(cum_unique_entries))
Result:
# A tibble: 4 x 2
exp cum_unique_entries
<fctr> <int>
1 exp1 4
2 exp2 6
3 exp3 7
4 exp4 9
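The question also asks for a bar chart, which the dplyr pipelines do not cover. The following is a minimal sketch in base R, not part of the original answer: it rebuilds the example data, computes the same counts with tapply() instead of dplyr, and passes them to barplot(). The names running and res and the axis labels are chosen here for illustration only.

```r
# Rebuild the example data so this block runs on its own.
test <- data.frame(
  exp = rep(c("exp1", "exp2", "exp3", "exp4"), times = c(4, 4, 4, 5)),
  entries = c("abcd", "efgh", "ijkl", "mnop", "qrst", "uvwx", "abcd", "efgh",
              "ijkl", "qrst", "uvwx", "yzab", "yzab", "cdef", "mnop", "uvwx",
              "ghij")
)

# Running count of distinct entries, row by row (hypothetical helper name).
running <- cumsum(!duplicated(test$entries))

# Last running count within each experiment; a base R stand-in for
# group_by(exp) followed by taking the last value.
res <- tapply(running, test$exp, function(x) tail(x, 1))

# `res` carries the experiment names, so barplot() labels the bars with them.
barplot(res, xlab = "Experiment", ylab = "Cumulative unique entries")
```

If the dplyr result is stored in a variable instead, the same call should work on its cum_unique_entries column with names.arg set to the exp column.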
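Note:
First compute the cumulative sum of the non-duplicated entries over the whole data frame (cumsum(!duplicated(entries))). Then group_by exp and take the last cumsum value in each group; that number is the cumulative count of unique entries up to and including that experiment.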
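To make the intermediate steps concrete, here is a small sketch using only the entries of exp1 and exp2 from the example data. The variable names are illustrative and not from the original answer.

```r
# Entries of exp1 (positions 1-4) followed by exp2 (positions 5-8).
entries <- c("abcd", "efgh", "ijkl", "mnop", "qrst", "uvwx", "abcd", "efgh")

# TRUE the first time a value appears, FALSE for every repeat.
is_new <- !duplicated(entries)
# FALSE only at positions 7 and 8, where "abcd" and "efgh" repeat

# Running total of first appearances.
running <- cumsum(is_new)
# 1 2 3 4 5 6 6 6

running[4]  # last row of exp1: 4
running[8]  # last row of exp2: 6
```

The order of the steps matters: duplicated() has to see the whole column, so the mutate() call comes before group_by(exp). On a grouped data frame, mutate() would evaluate duplicated() separately within each experiment and the counts would no longer be cumulative across experiments.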