Conditional cumulative sum based on dates in R

Mat*_*Noh 2 r conditional-statements cumsum dplyr data.table

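I want to calculate, for each group, a cumulative sum of `number` that only covers the last 5 days (rows dated from 5 days before the current date up to the current date).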

library(lubridate)  # for ymd()

df <- data.frame(
  date = ymd( c( "2022-01-02","2022-01-03","2022-01-05","2022-01-07","2022-01-11","2022-01-14","2022-01-17","2022-01-18","2022-01-24","2022-01-27","2022-01-01","2022-01-04","2022-01-04","2022-01-08","2022-01-12","2022-01-14","2022-01-19","2022-01-24","2022-01-25","2022-01-28")),
  group = c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B"),
  number = c(10,30,20,50,30,50,40,50,30,50,55,10,30,20,50,30,40,30,40,30))

Below is a small sample of my data frame, including what the cumulative sum column should return. Any help would be appreciated. Thanks.

date       group number cumsum(s)
2022-01-02 A     10     10
2022-01-03 A     30     40
2022-01-05 A     20     60
2022-01-07 A     50     110
2022-01-11 A     30     80
2022-01-14 A     50     80
2022-01-17 A     40     90
2022-01-18 A     50     140
2022-01-24 A     30     30
2022-01-27 A     50     80
2022-01-01 B     55     55
2022-01-04 B     10     65
2022-01-04 B     30     95
2022-01-08 B     20     60
2022-01-12 B     50     70
2022-01-14 B     30     80
2022-01-19 B     40     70
2022-01-24 B     30     70
2022-01-25 B     40     70
2022-01-28 B     30     100

I tried using map() and cumsum(), but couldn't get it to work.
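Since the question mentions map(): a minimal sketch (not part of the original post) of how that idea can work, assuming dplyr and purrr are installed. map_dbl() walks over each row's date inside a grouped mutate() and sums `number` for the rows of the same group dated between 5 days earlier and that date. It uses as.Date() instead of ymd() so it runs without lubridate, and, like the answers below, it gives 95 for both 2022-01-04 rows in group B.

```r
library(dplyr)
library(purrr)

df <- data.frame(
  date = as.Date(c("2022-01-02", "2022-01-03", "2022-01-05", "2022-01-07", "2022-01-11",
                   "2022-01-14", "2022-01-17", "2022-01-18", "2022-01-24", "2022-01-27",
                   "2022-01-01", "2022-01-04", "2022-01-04", "2022-01-08", "2022-01-12",
                   "2022-01-14", "2022-01-19", "2022-01-24", "2022-01-25", "2022-01-28")),
  group = rep(c("A", "B"), each = 10),
  number = c(10, 30, 20, 50, 30, 50, 40, 50, 30, 50, 55, 10, 30, 20, 50, 30, 40, 30, 40, 30))

res <- df %>%
  group_by(group) %>%
  # for each date x, sum `number` over the rows of this group dated in [x - 5, x]
  mutate(cumsum = map_dbl(date, function(x) sum(number[date >= x - 5 & date <= x]))) %>%
  ungroup()

res
```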

All*_*ron 6

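This gives the expected output, except that both 2022-01-04 rows in group B get 95 (the question expects 65 for the first one), since rows with the same date share the same window: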

library(tidyverse)

df %>%
  group_by(group) %>%
  # for each row, sum `number` over rows of the same group dated within [date - 5, date]
  mutate(cumsum = sapply(date, \(x) sum(number[date >= (x - 5) & date <= x])))
#> # A tibble: 20 x 4
#> # Groups:   group [2]
#>    date       group number cumsum
#>    <date>     <chr>  <dbl>  <dbl>
#>  1 2022-01-02 A         10     10
#>  2 2022-01-03 A         30     40
#>  3 2022-01-05 A         20     60
#>  4 2022-01-07 A         50    110
#>  5 2022-01-11 A         30     80
#>  6 2022-01-14 A         50     80
#>  7 2022-01-17 A         40     90
#>  8 2022-01-18 A         50    140
#>  9 2022-01-24 A         30     30
#> 10 2022-01-27 A         50     80
#> 11 2022-01-01 B         55     55
#> 12 2022-01-04 B         10     95
#> 13 2022-01-04 B         30     95
#> 14 2022-01-08 B         20     60
#> 15 2022-01-12 B         50     70
#> 16 2022-01-14 B         30     80
#> 17 2022-01-19 B         40     70
#> 18 2022-01-24 B         30     70
#> 19 2022-01-25 B         40     70
#> 20 2022-01-28 B         30    100

Created on 2022-10-30 with reprex v2.0.2
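The question's expected output gives 65 for the first 2022-01-04 row in group B, i.e. among rows sharing a date, only those up to the current row count. A minimal sketch of that reading (not from the original answer), assuming the rows are already sorted by date within each group, as in the sample data: each row's window is limited both to the last 5 days and to rows at or before the current row position.

```r
library(dplyr)

df <- data.frame(
  date = as.Date(c("2022-01-02", "2022-01-03", "2022-01-05", "2022-01-07", "2022-01-11",
                   "2022-01-14", "2022-01-17", "2022-01-18", "2022-01-24", "2022-01-27",
                   "2022-01-01", "2022-01-04", "2022-01-04", "2022-01-08", "2022-01-12",
                   "2022-01-14", "2022-01-19", "2022-01-24", "2022-01-25", "2022-01-28")),
  group = rep(c("A", "B"), each = 10),
  number = c(10, 30, 20, 50, 30, 50, 40, 50, 30, 50, 55, 10, 30, 20, 50, 30, 40, 30, 40, 30))

# Assumes rows are sorted by date within each group; no re-sorting is done here
res <- df %>%
  group_by(group) %>%
  mutate(cumsum = sapply(seq_along(date), function(i) {
    # last 5 days, and only rows up to position i (so tied dates accumulate in row order)
    in_window <- date >= date[i] - 5 & seq_along(date) <= i
    sum(number[in_window])
  })) %>%
  ungroup()

res
```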


lan*_*ang 5

You can use a `data.table` non-equi join to join the data to itself, and use `.EACHI` to compute the sum over each row's window:

library(data.table)

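# lower bound of each row's 5-day window
df = setDT(df)[, d:=date-5][]

# self non-equi join: for each row of i, match rows of x in the same group whose
# date lies in [i.d, i.date]; .EACHI evaluates sum(number) once per row of i
cbind(
  df[df,on=.(group,date<=date, date>=d), .(result = sum(number)), .EACHI][, .(group,date,result)],
  df[, .(number)]
)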

Output:

    group       date result number
 1:     A 2022-01-02     10     10
 2:     A 2022-01-03     40     30
 3:     A 2022-01-05     60     20
 4:     A 2022-01-07    110     50
 5:     A 2022-01-11     80     30
 6:     A 2022-01-14     80     50
 7:     A 2022-01-17     90     40
 8:     A 2022-01-18    140     50
 9:     A 2022-01-24     30     30
10:     A 2022-01-27     80     50
11:     B 2022-01-01     55     55
12:     B 2022-01-04     95     10
13:     B 2022-01-04     95     30
14:     B 2022-01-08     60     20
15:     B 2022-01-12     70     50
16:     B 2022-01-14     80     30
17:     B 2022-01-19     70     40
18:     B 2022-01-24     70     30
19:     B 2022-01-25     70     40
20:     B 2022-01-28    100     30
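A small variant of the join above, sketched under the assumption that you want the result added as a column of the original table rather than rebuilt with cbind(). Because `by = .EACHI` returns one row per row of i, in i's order, the sums line up with the original rows and can be assigned by reference. `dt` and `win` are illustrative names, not from the answer.

```r
library(data.table)

dt <- data.table(
  date = as.Date(c("2022-01-02", "2022-01-03", "2022-01-05", "2022-01-07", "2022-01-11",
                   "2022-01-14", "2022-01-17", "2022-01-18", "2022-01-24", "2022-01-27",
                   "2022-01-01", "2022-01-04", "2022-01-04", "2022-01-08", "2022-01-12",
                   "2022-01-14", "2022-01-19", "2022-01-24", "2022-01-25", "2022-01-28")),
  group = rep(c("A", "B"), each = 10),
  number = c(10, 30, 20, 50, 30, 50, 40, 50, 30, 50, 55, 10, 30, 20, 50, 30, 40, 30, 40, 30))

# lower bound of each row's window, as in the answer
dt[, d := date - 5]

# self non-equi join; by = .EACHI gives one sum per row of i, in i's row order
win <- dt[dt, on = .(group, date <= date, date >= d), .(result = sum(number)), by = .EACHI]

# the sums are aligned with dt's rows, so add them as a column by reference
dt[, result := win$result]
dt
```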

Update: alternative approach

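Suppose we want to do this for several window lengths (shift values). An alternative is to compute each group's overall cumulative sum across every day in its date range (including days that have no rows), and then subtract a lagged copy of it using `shift()`.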
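The identity behind this update, illustrated on group A only: once the series has one value per calendar day (0 on days without rows), the sum over days t-5 through t equals the running total at day t minus the running total at day t-6. A base-R check of that identity for a shift of 5; the variable names are illustrative.

```r
# Group A from the question, expanded to one value per calendar day
a_dates  <- as.Date(c("2022-01-02", "2022-01-03", "2022-01-05", "2022-01-07", "2022-01-11",
                      "2022-01-14", "2022-01-17", "2022-01-18", "2022-01-24", "2022-01-27"))
a_number <- c(10, 30, 20, 50, 30, 50, 40, 50, 30, 50)

days  <- seq(min(a_dates), max(a_dates), by = "day")
daily <- numeric(length(days))
daily[match(a_dates, days)] <- a_number   # group A has one row per date, so match() suffices

s  <- 5
cs <- cumsum(daily)
# running total s + 1 days earlier; 0 where that day is before the first date
lagged <- c(rep(0, s + 1), head(cs, -(s + 1)))
via_cs <- cs - lagged

# direct definition: sum of the daily values over days t - s, ..., t
direct <- sapply(seq_along(days), function(t) sum(daily[max(1, t - s):t]))

stopifnot(all(via_cs == direct))
via_cs[match(a_dates, days)]  # 10 40 60 110 80 80 90 140 30 80
```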

  1. Load data.table and convert the data with `setDT()`
library(data.table)
setDT(df)
  2. Expand each group to every day between its minimum and maximum date, and compute the overall cumulative sum per group
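# one row per day from each group's first to last date (days without data get NA),
# then the running total per group with the filler days counted as 0
df = df[df[, data.table(date=seq(min(date), max(date),1)), group], on=.(group,date)][
  ,cs:=cumsum(fifelse(is.na(number),0,number)),group]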
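  3. Now create a helper function `f` that takes a shift value (e.g. 5 or 10) and returns the windowed sum for that shift. Note that `shift()` moves by rows, not days, so this matches the date window only when each group has at most one row per date; group B has two rows on 2022-01-04, so windows spanning that date are off by one row (e.g. `c7` for B on 2022-01-08 in the output below is 60, while the rows from 2022-01-01 to 2022-01-08 sum to 115).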
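# windowed sum for shift s: cs minus cs lagged by s + 1 rows;
# the first s + 1 rows have no lagged value, so they keep cs itself
f <- function(s,cs) {
  r = cs-shift(cs,s+1)
  r[is.na(r)] <- cs[is.na(r)]
  r
}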
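  4. Define a vector of the desired shifts and apply the function to each shift value; `[!is.na(number)]` then drops the filler days again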
s = c(3,5,7,10)
df[, (paste0("c",s)):=lapply(s, f,cs=cs),group][!is.na(number)]

Output:

          date  group number    cs    c3    c5    c7   c10
        <Date> <char>  <num> <num> <num> <num> <num> <num>
 1: 2022-01-02      A     10    10    10    10    10    10
 2: 2022-01-03      A     30    40    40    40    40    40
 3: 2022-01-05      A     20    60    60    60    60    60
 4: 2022-01-07      A     50   110    70   110   110   110
 5: 2022-01-11      A     30   140    30    80   100   140
 6: 2022-01-14      A     50   190    80    80   130   150
 7: 2022-01-17      A     40   230    90    90   120   170
 8: 2022-01-18      A     50   280    90   140   170   170
 9: 2022-01-24      A     30   310    30    30   120   170
10: 2022-01-27      A     50   360    80    80    80   170
11: 2022-01-01      B     55    55    55    55    55    55
12: 2022-01-04      B     10    65    65    65    65    65
13: 2022-01-04      B     30    95    40    95    95    95
14: 2022-01-08      B     20   115    20    60    60   115
15: 2022-01-12      B     50   165    50    70    70   110
16: 2022-01-14      B     30   195    80    80   100   130
17: 2022-01-19      B     40   235    40    70   120   120
18: 2022-01-24      B     30   265    30    70    70   100
19: 2022-01-25      B     40   305    70    70   110   110
20: 2022-01-28      B     30   335    70   100   100   140
          date  group number    cs    c3    c5    c7   c10
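A sketch of one way to make the update robust to repeated dates (not part of the original answer): first collapse rows that share a group and date into a daily total, so that shifting by rows is the same as shifting by days, then join the windowed sums back onto the original rows. With this, both 2022-01-04 rows in group B get the same value (matching the first two answers for `c5`), and `c7` for B on 2022-01-08 becomes 115. The names `dt`, `daily`, `full` and `res` are illustrative.

```r
library(data.table)

dt <- data.table(
  date = as.Date(c("2022-01-02", "2022-01-03", "2022-01-05", "2022-01-07", "2022-01-11",
                   "2022-01-14", "2022-01-17", "2022-01-18", "2022-01-24", "2022-01-27",
                   "2022-01-01", "2022-01-04", "2022-01-04", "2022-01-08", "2022-01-12",
                   "2022-01-14", "2022-01-19", "2022-01-24", "2022-01-25", "2022-01-28")),
  group = rep(c("A", "B"), each = 10),
  number = c(10, 30, 20, 50, 30, 50, 40, 50, 30, 50, 55, 10, 30, 20, 50, 30, 40, 30, 40, 30))
s <- c(3, 5, 7, 10)

# 1. collapse rows sharing a group and date into one daily total
daily <- dt[, .(number = sum(number)), by = .(group, date)]

# 2. one row per calendar day per group (0 on days without data), then the running total
full <- daily[, .(date = seq(min(date), max(date), by = 1)), by = group]
full <- daily[full, on = .(group, date)]
full[is.na(number), number := 0]
full[, cs := cumsum(number), by = group]

# 3. with exactly one row per day, lagging by k + 1 rows is lagging by k + 1 days
full[, (paste0("c", s)) := lapply(s, function(k) {
  r <- cs - shift(cs, k + 1)
  fifelse(is.na(r), cs, r)
}), by = group]

# 4. join the windowed sums back onto the original rows (tied rows share a value)
res <- full[, c("group", "date", paste0("c", s)), with = FALSE][dt, on = .(group, date)]
res
```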