Weird: cumsum not working with dplyr

Ser*_*gio 2 r cumsum dplyr tibble

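Context: I want to add a cumulative-sum column to a tibble called words_uni. I used library(dplyr) and its mutate function. I am running R version 3.4.1 (64-bit) on Windows 10 with RStudio version 1.0.143.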

> head(words_uni)
# A tibble: 6 x 3
# Groups:   Type [6]
Type   Freq         per
<chr>  <int>       <dbl>
1   the 937839 0.010725848
2     i 918552 0.010505267
3    to 788892 0.009022376
4     a 615082 0.007034551

Then I did the following:

> words_uni1 = words_uni %>%
                      mutate( acum= cumsum(per))
> head(words_uni1)
# A tibble: 6 x 4
# Groups:   Type [6]
Type   Freq         per        acum
<chr>  <int>       <dbl>       <dbl>
1   the 937839 0.010725848 0.010725848
2     i 918552 0.010505267 0.010505267
3    to 788892 0.009022376 0.009022376
4     a 615082 0.007034551 0.007034551

Problem: it does not do what I expected. acum just repeats per instead of accumulating, and I don't understand why.

I'd appreciate your comments. Thanks in advance.

小智 6

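Your tibble must have been grouped by Type at some earlier point (note the "# Groups: Type [6]" line in your output). Because of that, your mutate call is evaluated separately within each Type, and since every Type is its own one-row group, cumsum simply returns per.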

Here is some reproducible code:

require(readr)
require(dplyr)

x <- read_csv("type, freq, per
the, 937839, 0.010725848
i, 918552, 0.010505267
to, 788892, 0.009022376
a, 615082, 0.007034551")


### ungrouped tibble, desired results
x %>% mutate(acum = cumsum(per))

# A tibble: 4 x 4
type   freq         per       acum
<chr>  <int>       <dbl>      <dbl>
1   the 937839 0.010725848 0.01072585
2     i 918552 0.010505267 0.02123112
3    to 788892 0.009022376 0.03025349
4     a 615082 0.007034551 0.03728804

### grouped tibble
x %>% group_by(type) %>% mutate(acum = cumsum(per))

# A tibble: 4 x 4
# Groups:   type [4]
type   freq         per        acum
<chr>  <int>       <dbl>       <dbl>
1   the 937839 0.010725848 0.010725848
2     i 918552 0.010505267 0.010505267
3    to 788892 0.009022376 0.009022376
4     a 615082 0.007034551 0.007034551

You just need to ungroup your data.

words_uni %>% ungroup() %>% mutate(acum = cumsum(per))

That should do the trick.
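If you are not sure whether a tibble is still carrying a grouping, you can check before computing anything cumulative. The following is only a minimal sketch: the words_uni built here is a hypothetical stand-in reconstructed from the four rows visible in the question, not the asker's real data. It uses dplyr's is_grouped_df() and group_vars() to inspect the grouping, then ungroups and accumulates.

```r
library(dplyr)

# Hypothetical stand-in for words_uni (assumption: only the four rows
# shown in the question, grouped by Type as in the asker's output)
words_uni <- tibble(
  Type = c("the", "i", "to", "a"),
  Freq = c(937839L, 918552L, 788892L, 615082L),
  per  = c(0.010725848, 0.010505267, 0.009022376, 0.007034551)
) %>%
  group_by(Type)

# Inspect the grouping before doing any cumulative calculation
is_grouped_df(words_uni)   # TRUE while a grouping is still attached
group_vars(words_uni)      # names of the grouping columns

# Drop the grouping, then accumulate across all rows
words_uni1 <- words_uni %>%
  ungroup() %>%
  mutate(acum = cumsum(per))

# Sanity check: the last running total should match the overall sum of per
all.equal(last(words_uni1$acum), sum(words_uni1$per))
```

Checking group_vars() first is handy when the tibble comes out of an earlier group_by() and summarise() pipeline, where a leftover grouping is easy to miss.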

  • Thanks @Beau, I didn't know I had to ungroup the data. It works perfectly! (2 upvotes)