How to bin and summarise a frequency table using dplyr

sca*_*der 4 r dplyr tidyverse

I have the following data frame:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df <- nycflights13::flights %>% 
  select(distance) %>% 
  group_by(distance) %>% 
  summarise(n = n()) %>% 
  arrange(distance) %>% ungroup() 

df
#> # A tibble: 214 x 2
#>    distance     n
#>       <dbl> <int>
#>  1       17     1
#>  2       80    49
#>  3       94   976
#>  4       96   607
#>  5      116   443
#>  6      143   439
#>  7      160   376
#>  8      169   545
#>  9      173   221
#> 10      184  5504
#> # … with 204 more rows

What I want to do is bin the distance column into bins of size 100 and sum the n column accordingly. How can I do that?

So you would get something like:

bin_distance sum_n
1-100       1633  #(1 + 49 + 976 + 607)
101-200     21344 # (443 + ... + 5327)
#etc

Ron*_*hah 6

The simplest way is to use cut(), creating the groups with seq() in steps of 100, and then sum() the values in each group.

library(dplyr)

df %>%
  group_by(group = cut(distance, breaks = seq(0, max(distance), 100))) %>%
  summarise(n = sum(n))


#   group         n
#   <fct>       <int>
# 1 (0,100]      1633
# 2 (100,200]   21344
# 3 (200,300]   28310
# 4 (300,400]    7748
# 5 (400,500]   21292
# 6 (500,600]   26815
# 7 (600,700]    7846
# 8 (700,800]   48904
# 9 (800,900]    7574
#10 (900,1e+03] 18205
# ... with 17 more rows
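
One thing worth checking with this approach: seq(0, max(distance), 100) stops at the last multiple of 100 that does not exceed the maximum, so if the maximum is not itself a multiple of 100, the largest values fall outside every interval and cut() returns NA for them. A minimal sketch of the effect, using a made-up toy table (not the flights data) and base R only; rounding the upper limit up with ceiling() is one possible workaround, not part of the original answer:

```r
# Toy frequency table (made-up values, not the flights data)
toy <- data.frame(distance = c(17, 80, 116, 200, 250),
                  n        = c(1, 49, 443, 10, 5))

# Breaks built as in the answer: 0, 100, 200 (stops below the maximum of 250)
breaks_short <- seq(0, max(toy$distance), 100)

# Assumed workaround: round the upper limit up to the next multiple of 100
breaks_full <- seq(0, ceiling(max(toy$distance) / 100) * 100, 100)

bins_short <- cut(toy$distance, breaks = breaks_short)
bins_full  <- cut(toy$distance, breaks = breaks_full)

sum(is.na(bins_short))  # 250 lies outside (0,200], so one value is NA
sum(is.na(bins_full))   # every value falls into some interval

# Sum n within each bin
sums <- tapply(toy$n, bins_full, sum)
sums
```

The same ceiling() expression could be dropped into the breaks argument of the dplyr pipeline above if the NA group turns out to be a problem.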

This can be translated to base R using aggregate(), like so:

aggregate(n ~ distance, 
 transform(df, distance = cut(distance, breaks = seq(0, max(distance), 100))), sum)
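
If the "1-100", "101-200" style of label from the question is preferred over the default interval notation, cut() accepts a labels argument. A rough sketch on made-up toy data, assuming the distances are whole numbers (otherwise a label such as "101-200" would be misleading for a value like 100.5); the column names bin_distance and sum_n are taken from the question, everything else is illustrative:

```r
# Toy frequency table (made-up values, not the flights data)
toy <- data.frame(distance = c(17, 80, 116, 200, 250),
                  n        = c(1, 49, 443, 10, 5))

# Breaks at 0, 100, 200, 300 so that the maximum is covered
upper  <- ceiling(max(toy$distance) / 100) * 100
breaks <- seq(0, upper, 100)

# Labels built from each interval: lower bound + 1, then the upper bound
labels <- paste(head(breaks, -1) + 1, tail(breaks, -1), sep = "-")

toy$bin_distance <- cut(toy$distance, breaks = breaks, labels = labels)

# Sum n within each labelled bin
result <- aggregate(n ~ bin_distance, data = toy, FUN = sum)
names(result)[2] <- "sum_n"
result
```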


tmf*_*mnk 6

A different tidyverse solution. It closely follows the logic of @Ronak Shah's code, but uses cut_width() from ggplot2 instead of cut():

library(dplyr)
library(ggplot2)  # provides cut_width()

nycflights13::flights %>%
 select(distance) %>%
 group_by(ints = cut_width(distance, width = 100, boundary = 0)) %>%
 summarise(n = n())

   ints            n
   <fct>       <int>
 1 [0,100]      1633
 2 (100,200]   21344
 3 (200,300]   28310
 4 (300,400]    7748
 5 (400,500]   21292
 6 (500,600]   26815
 7 (600,700]    7846
 8 (700,800]   48904
 9 (800,900]    7574
10 (900,1e+03] 18205
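
Note that this version starts from the raw flights rows and counts them with n(), whereas the question's df already holds one count per distance, which would be summed with sum(n) instead. Both routes give the same totals per bin. A small base R sketch of that equivalence on made-up data; cut(..., include.lowest = TRUE) is used here as a rough stand-in for cut_width(..., boundary = 0), which is an assumption made so the example runs without extra packages:

```r
# Made-up raw observations (one value per row), not the flights data
raw <- c(17, 80, 80, 116, 116, 116, 200, 250)

# Pre-aggregated frequency table, shaped like df in the question
freq <- as.data.frame(table(distance = raw), stringsAsFactors = FALSE)
freq$distance <- as.numeric(freq$distance)
names(freq)[2] <- "n"

breaks <- seq(0, 300, 100)

# Route 1: count raw rows per bin (the role of n() on the raw data)
from_raw <- table(cut(raw, breaks, include.lowest = TRUE))

# Route 2: sum the stored counts per bin (the role of sum(n) on the frequency table)
from_freq <- tapply(freq$n,
                    cut(freq$distance, breaks, include.lowest = TRUE),
                    sum)

from_raw
from_freq
```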