R:按日期范围划分的dplyr组

wos*_*hui 6 group-by r dplyr

我正在尝试根据"2016-04-10"和"2016-04-24"将数据框分组为3个日期范围.

df <- structure(list(date = structure(c(16803, 16810, 16817, 16824, 
16831, 16838, 16845, 16852, 16859, 16866, 16873, 16880, 16887, 
16894, 16901, 16908, 16915, 16922, 16929, 16936, 16943), class = "Date"), 
    new = c(1507L, 2851L, 3550L, 5329L, 7557L, 5546L, 6264L, 
    7160L, 9468L, 5789L, 5928L, 4642L, 8145L, 4867L, 4846L, 5231L, 
    7137L, 3938L, 3741L, 2937L, 194L), resolved = c(21, 27, 15, 
    16, 56, 2773, 8490, 8748, 9325, 7734, 10264, 6739, 6110, 
    9613, 10314, 10349, 7200, 9637, 10831, 11170, 5666), ost = c(1486, 
    2824, 3535, 5313, 7501, 2773, -2226, -1588, 143, -1945, -4336, 
    -2097, 2035, -4746, -5468, -5118, -63, -5699, -7090, -8233, 
    -5472)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-21L), .Names = c("date", "new", "resolved", "ost"))
Run Code Online (Sandbox Code Playgroud)

尝试以下方法:

df1 <- df %>% group_by(dr=cut(date,breaks=as.Date(c("2016-04-10","2016-04-24")))) %>%
                summarise(ost = sum(ost))
Run Code Online (Sandbox Code Playgroud)

这给出了如下错误的结果.

        dr    ost
2016-04-10 -10586
        NA -17885
Run Code Online (Sandbox Code Playgroud)

帮助赞赏!

Sot*_*tos 6

您可以先创建分组变量,

df %>% 
mutate(group = cumsum(grepl('2016-04-10|2016-04-24', date))) %>%
group_by(group) %>% 
summarise(ost = sum(ost))

#Source: local data frame [3 x 2]

#  group    ost
#  (int)  (dbl)
#1     0   8672
#2     1 -10586
#3     2 -26557
Run Code Online (Sandbox Code Playgroud)


akr*_*run 5

我们创建了一个分组变量 'dr' cut。该breaks提到的是range“日期”,即在中minmax由OP指定的日期沿着“日期”的价值观,它串联(c),使用该选项include.lowest,并获得sum基于此分组变量“OST”的。

df %>%
  group_by(dr = cut(date, breaks = c(range(date), 
            as.Date(c("2016-04-10", "2016-04-24"))), include.lowest=TRUE) ) %>% 
  summarise(ost =sum(ost))
#         dr    ost
#     <fctr>  <dbl>
#1 2016-01-03   8672
#2 2016-04-10 -10586
#3 2016-04-24 -26557
Run Code Online (Sandbox Code Playgroud)

或者另一种选择是findIntervalcut.

df %>%
  group_by(dr = findInterval(date, as.Date(c("2016-04-10", "2016-04-24")))) %>% 
  summarise(ost = sum(ost))
#     dr    ost
#  <int>  <dbl>
#1     0   8672
#2     1 -10586
#3     2 -26557
Run Code Online (Sandbox Code Playgroud)

注意:OP 提出了这个问题cut,这个解决方案给出了这个问题。