Cumulative sum within a window (or running window sum) based on a condition in R

wat*_*wer 9 r dplyr data.table

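I am trying to compute a cumulative sum over a given window, based on a condition. I have seen threads with solutions for conditional cumulative sums ("Calculate a conditional running sum in R for every row in data frame") and for rolling sums ("Rolling sum on another variable in R"), but I could not find the two combined. I also read in "R data.table sliding window" that data.table has no rolling window function. So this problem is very challenging for me.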

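Moreover, the solution Mike Grahan posted for the rolling sum is beyond my understanding. I am looking for a data.table-based approach, mainly for speed. However, I am open to other approaches as long as they are understandable.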

Here is my input data:

DFI <- structure(list(FY = c(2011, 2012, 2013, 2015, 2016, 2011, 2011, 
2012, 2013, 2014, 2015, 2010, 2016, 2013, 2014, 2015, 2010), 
    Customer = c(13575, 13575, 13575, 13575, 13575, 13575, 13575, 
    13575, 13575, 13575, 13575, 13578, 13578, 13578, 13578, 13578, 
    13578), Product = c("A", "A", "A", "A", "A", "B", "B", "B", 
    "B", "B", "B", "A", "A", "B", "C", "D", "E"), Rev = c(4, 
    3, 3, 1, 2, 1, 2, 3, 4, 5, 6, 3, 2, 2, 4, 2, 2)), .Names = c("FY", 
"Customer", "Product", "Rev"), row.names = c(NA, 17L), class = "data.frame")

Here is my expected output (created by hand; apologies if it contains any manual errors):

DFO <- structure(list(FY = c(2011, 2012, 2013, 2015, 2016, 2011, 2012, 
2013, 2014, 2015, 2010, 2016, 2013, 2014, 2015, 2010), Customer = c(13575, 
13575, 13575, 13575, 13575, 13575, 13575, 13575, 13575, 13575, 
13578, 13578, 13578, 13578, 13578, 13578), Product = c("A", "A", 
"A", "A", "A", "B", "B", "B", "B", "B", "A", "A", "B", "C", "D", 
"E"), Rev = c(4, 3, 3, 1, 2, 3, 3, 4, 5, 6, 3, 2, 2, 4, 2, 2), 
    cumsum = c(4, 7, 10, 11, 9, 3, 6, 10, 15, 21, 3, 2, 2, 4, 
    2, 2)), .Names = c("FY", "Customer", "Product", "Rev", "cumsum"
), row.names = c(NA, 16L), class = "data.frame")

Some comments on the logic:

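1) I want to find the rolling sum over 5 years. Ideally, I would like this 5-year period to be variable, i.e. something I can specify elsewhere in the code. That way, I can change the window later for analysis.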

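2) The end of the window is based on the maximum year (i.e. `FY` in the example above). In the example above, the max `FY` in `DFI` is 2016. So, for all entries in 2016, the starting year of the window would be 2016 - 5 + 1 = 2012.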

3) The window sum (or running sum) is computed per `Customer` and per specific `Product`.
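The three rules above can be sketched in base R for a single Customer/Product group. This is only an illustrative sketch, not the asker's or an answerer's code: `toy` and `window_sum` are hypothetical names, and the values are those of customer 13575, product A from `DFI`.

```r
# Hypothetical toy data: one Customer/Product group, with no row for 2014.
toy <- data.frame(FY  = c(2011, 2012, 2013, 2015, 2016),
                  Rev = c(4, 3, 3, 1, 2))

k <- 5  # window length in years; change it here to vary the window

# For each year fy, add up Rev over the calendar years fy - k + 1, ..., fy.
toy$window_sum <- sapply(toy$FY, function(fy) {
  sum(toy$Rev[toy$FY >= fy - k + 1 & toy$FY <= fy])
})

toy$window_sum
# 4 7 10 11 9
```

The last value is 9 rather than 13 because the window ending in 2016 starts in 2012, so the 2011 revenue drops out. This matches the `cumsum` column of `DFO` for that group.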

What I tried:

I wanted to try a few things before posting. Here is my code:

  DFI <- data.table::as.data.table(DFI)

  #Sort it first
  DFI<-DFI[order(Customer,FY),]

  #find cumulative sum; remove Rev column; order rows
  DFOTest<-DFI[,cumsum := cumsum(Rev),by=.(Customer,Product)][,.SD[which.max(cumsum)],by=.(FY,Customer,Product)][,("Rev"):=NULL][order(Customer,Product,FY)]

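This code computes the cumulative sum, but I cannot work out how to define the 5-year window and then compute the running sum over it. I have two questions: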

Question 1) How do I compute the 5-year running sum?

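Question 2) Could someone explain Mike's method from that thread? It looks fast, but I am not sure what is going on there. I did see that someone asked for some comments on it, and I am not sure it is self-explanatory.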

Thanks in advance. I have been struggling with this problem for two days.

G. *_*eck 7

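1) rollapply

Create a `Sum` function which takes `FY` and `Rev` as a 2-column matrix (or, if the input is not a matrix, makes it one) and then sums the revenues for those years that are within `k` of the last year. Then convert `DFI` to a data table, sum the rows that have the same Customer/Product/FY, and run `rollapplyr` with `Sum` for each Customer/Product group.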

library(data.table)
library(zoo)

k <- 5
Sum <- function(x) {
  x <- matrix(x,, 2)
  FY <- x[, 1]
  Rev <- x[, 2]
  ok <- FY >= tail(FY, 1) - k + 1
  sum(Rev[ok])
}
DT <- as.data.table(DFI)
DT <- DT[, list(Rev = sum(Rev)), by = c("Customer", "Product", "FY")]
DT[, cumsum := rollapplyr(.SD, k, Sum, by.column = FALSE, partial = TRUE),
       by = c("Customer", "Product"), .SDcols = c("FY", "Rev")]

giving:

 > DT
    Customer Product   FY Rev cumsum
 1:    13575       A 2011   4      4
 2:    13575       A 2012   3      7
 3:    13575       A 2013   3     10
 4:    13575       A 2015   1     11
 5:    13575       A 2016   2      9
 6:    13575       B 2011   3      3
 7:    13575       B 2012   3      6
 8:    13575       B 2013   4     10
 9:    13575       B 2014   5     15
10:    13575       B 2015   6     21
11:    13578       A 2010   3      3
12:    13578       A 2016   2      2
13:    13578       B 2013   2      2
14:    13578       C 2014   4      4
15:    13578       D 2015   2      2
16:    13578       E 2010   2      2
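To see why `Sum` filters on `FY` instead of simply adding up the rows it receives, it may help to call it directly on one window. The snippet below is a small illustration with a hypothetical matrix `win` holding the five rows of customer 13575, product A; it does not need zoo, since it only exercises the helper.

```r
k <- 5

# Same helper as in approach 1: keep only rows within k years of the last row.
Sum <- function(x) {
  x <- matrix(x, , 2)               # coerce the window to a 2-column matrix
  FY <- x[, 1]
  Rev <- x[, 2]
  ok <- FY >= tail(FY, 1) - k + 1   # TRUE for years inside the k-year window
  sum(Rev[ok])
}

# Hypothetical window: five rows that span six calendar years (2014 is absent).
win <- cbind(FY = c(2011, 2012, 2013, 2015, 2016), Rev = c(4, 3, 3, 1, 2))

Sum(win)            # 9: 2011 falls outside 2012..2016 and is excluded
sum(win[, "Rev"])   # 13: a plain row-based sum would wrongly include 2011
```

A window of width `k` passed to `rollapplyr` counts rows, not years, so when years are missing the row window can reach further back than `k` years. The `ok` filter corrects for that.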

2) data.table only

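First sum the rows that have the same Customer/Product/FY. Then, grouping by Customer/Product, for each `FY` value `fy` pick out the `Rev` values whose `FY` lies between `fy-k+1` and `fy`, and sum them.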

library(data.table)

k <- 5
DT <- as.data.table(DFI)
DT <- DT[, list(Rev = sum(Rev)), by = c("Customer", "Product", "FY")]
DT[, cumsum := sapply(FY, function(fy) sum(Rev[between(FY, fy-k+1, fy)])),
       by = c("Customer", "Product")]

giving:

> DT
    Customer Product   FY Rev cumsum
 1:    13575       A 2011   4      4
 2:    13575       A 2012   3      7
 3:    13575       A 2013   3     10
 4:    13575       A 2015   1     11
 5:    13575       A 2016   2      9
 6:    13575       B 2011   3      3
 7:    13575       B 2012   3      6
 8:    13575       B 2013   4     10
 9:    13575       B 2014   5     15
10:    13575       B 2015   6     21
11:    13578       A 2010   3      3
12:    13578       A 2016   2      2
13:    13578       B 2013   2      2
14:    13578       C 2014   4      4
15:    13578       D 2015   2      2
16:    13578       E 2010   2      2
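The question also asks for the window length to be adjustable. One possible way to get that, sketched here as an assumption and not as part of the original answer, is to wrap approach 2 in a function. `window_sum` and the `toy` data frame are hypothetical names introduced for illustration.

```r
library(data.table)

# Hypothetical wrapper around approach 2; k is the window length in years.
window_sum <- function(DF, k) {
  DT <- as.data.table(DF)
  # Collapse duplicate Customer/Product/FY rows into one row each.
  DT <- DT[, list(Rev = sum(Rev)), by = c("Customer", "Product", "FY")]
  # Within each group, sum Rev over the years fy - k + 1, ..., fy.
  DT[, cumsum := sapply(FY, function(fy) sum(Rev[between(FY, fy - k + 1, fy)])),
     by = c("Customer", "Product")]
  DT[]
}

# Hypothetical toy data: product B has two rows for 2011 that get merged.
toy <- data.frame(Customer = 1,
                  Product  = c("A", "A", "A", "A", "A", "B", "B"),
                  FY       = c(2011, 2012, 2013, 2015, 2016, 2011, 2011),
                  Rev      = c(4, 3, 3, 1, 2, 1, 2))

res5 <- window_sum(toy, 5)
res2 <- window_sum(toy, 2)
res5$cumsum   # 4 7 10 11 9 3
res2$cumsum   # 4 7 6 1 3 3
```

With `k = 2` the 2015 row sums to 1 on its own, because the group has no 2014 row to include.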

  • The data.table solution is beautiful! (2 upvotes)