wat*_*wer 9 r dplyr data.table
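I'm trying to compute a cumulative sum over a given window, subject to a condition. I have seen threads with solutions for conditional cumulative sums (Calculating a conditional running sum in R for every row in a data frame) and for rolling sums (Rolling sum on another variable in R), but I couldn't find the two combined. I have also seen in "R data.table sliding window" that data.table has no rolling-window function. So this problem is very challenging for me.
Also, the solution Mike Grahan posted for the rolling sum is beyond my understanding. I'm looking for a data.table-based approach, mainly for speed. However, I'm open to other approaches if they are understandable.
Here's my input data: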
DFI <- structure(list(FY = c(2011, 2012, 2013, 2015, 2016, 2011, 2011,
2012, 2013, 2014, 2015, 2010, 2016, 2013, 2014, 2015, 2010),
Customer = c(13575, 13575, 13575, 13575, 13575, 13575, 13575,
13575, 13575, 13575, 13575, 13578, 13578, 13578, 13578, 13578,
13578), Product = c("A", "A", "A", "A", "A", "B", "B", "B",
"B", "B", "B", "A", "A", "B", "C", "D", "E"), Rev = c(4,
3, 3, 1, 2, 1, 2, 3, 4, 5, 6, 3, 2, 2, 4, 2, 2)), .Names = c("FY",
"Customer", "Product", "Rev"), row.names = c(NA, 17L), class = "data.frame")
Here's my expected output (created manually; I apologize if there is a manual error):
DFO <- structure(list(FY = c(2011, 2012, 2013, 2015, 2016, 2011, 2012,
2013, 2014, 2015, 2010, 2016, 2013, 2014, 2015, 2010), Customer = c(13575,
13575, 13575, 13575, 13575, 13575, 13575, 13575, 13575, 13575,
13578, 13578, 13578, 13578, 13578, 13578), Product = c("A", "A",
"A", "A", "A", "B", "B", "B", "B", "B", "A", "A", "B", "C", "D",
"E"), Rev = c(4, 3, 3, 1, 2, 3, 3, 4, 5, 6, 3, 2, 2, 4, 2, 2),
cumsum = c(4, 7, 10, 11, 9, 3, 6, 10, 15, 21, 3, 2, 2, 4,
2, 2)), .Names = c("FY", "Customer", "Product", "Rev", "cumsum"
), row.names = c(NA, 16L), class = "data.frame")
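A few comments about the logic:
1) I want to find the rolling sum over 5 years. Ideally, I'd like this 5-year span to be variable, i.e. something I can specify elsewhere in the code, so that I can later vary the window for analysis.
2) The end of the window is based on the maximum year (i.e. FY in the example above). In the example above, the max FY in DFI is 2016, so for all 2016 entries the window's start year would be 2016 - 5 + 1 = 2012.
3) The window sum (or running sum) is calculated per Customer and per Product.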
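The three rules above can be sketched in base R as follows. This is only an illustration of the intended logic, not a fast implementation: window_sum and toy are hypothetical names, and the helper compares every row against every other row, so it scales quadratically.

```r
# Hypothetical helper illustrating the three rules (base R only, O(n^2)).
window_sum <- function(df, k = 5) {
  # Collapse duplicate Customer/Product/FY rows into a single total.
  agg <- aggregate(Rev ~ FY + Customer + Product, data = df, FUN = sum)
  agg <- agg[order(agg$Customer, agg$Product, agg$FY), ]
  # For each row, sum Rev over the same Customer/Product
  # where FY lies in [FY - k + 1, FY].
  agg$cumsum <- vapply(seq_len(nrow(agg)), function(i) {
    in_window <- agg$Customer == agg$Customer[i] &
      agg$Product == agg$Product[i] &
      agg$FY >= agg$FY[i] - k + 1 &
      agg$FY <= agg$FY[i]
    sum(agg$Rev[in_window])
  }, numeric(1))
  rownames(agg) <- NULL
  agg
}

# A small subset of the input: Product A skips 2014, Product B has two 2011 rows.
toy <- data.frame(
  FY = c(2011, 2012, 2013, 2015, 2016, 2011, 2011),
  Customer = 13575,
  Product = c("A", "A", "A", "A", "A", "B", "B"),
  Rev = c(4, 3, 3, 1, 2, 1, 2)
)
res <- window_sum(toy, k = 5)
print(res)
```

For Product A in 2016 the window is 2012 to 2016, so the 2011 revenue drops out and the sum is 3 + 3 + 1 + 2 = 9, which agrees with the expected output.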
What I tried:
I wanted to try something before posting. Here's my code:
DFI <- data.table::as.data.table(DFI)
#Sort it first
DFI<-DFI[order(Customer,FY),]
#find cumulative sum; remove Rev column; order rows
DFOTest<-DFI[,cumsum := cumsum(Rev),by=.(Customer,Product)][,.SD[which.max(cumsum)],by=.(FY,Customer,Product)][,("Rev"):=NULL][order(Customer,Product,FY)]
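This code computes the cumulative sum, but I'm unable to define a 5-year window and then compute the running sum. I have two questions:
Question 1) How do I compute the 5-year running sum?
Question 2) Could someone explain Mike's method in this thread? It seems fast, but I'm not sure what is going on there. I did see that someone asked for comments on it, but I'm not sure whether it is self-explanatory.
Thanks in advance. I have been struggling with this problem for two days.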
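1) rollapply
Create a Sum function that takes FY and Rev as a 2-column matrix (or, if it is not a matrix, makes it one) and then sums the revenues for those years that are within k of the last year. Then convert DFI to a data table, sum the rows having the same Customer/Product/FY, and run rollapplyr with Sum for each Customer/Product group.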
library(data.table)
library(zoo)
k <- 5
Sum <- function(x) {
  x <- matrix(x,, 2)               # coerce the window to a 2-column matrix
  FY <- x[, 1]
  Rev <- x[, 2]
  ok <- FY >= tail(FY, 1) - k + 1  # rows within k years of the window's last year
  sum(Rev[ok])
}
DT <- as.data.table(DFI)
DT <- DT[, list(Rev = sum(Rev)), by = c("Customer", "Product", "FY")]
DT[, cumsum := rollapplyr(.SD, k, Sum, by.column = FALSE, partial = TRUE),
by = c("Customer", "Product"), .SDcols = c("FY", "Rev")]
giving:
> DT
Customer Product FY Rev cumsum
1: 13575 A 2011 4 4
2: 13575 A 2012 3 7
3: 13575 A 2013 3 10
4: 13575 A 2015 1 11
5: 13575 A 2016 2 9
6: 13575 B 2011 3 3
7: 13575 B 2012 3 6
8: 13575 B 2013 4 10
9: 13575 B 2014 5 15
10: 13575 B 2015 6 21
11: 13578 A 2010 3 3
12: 13578 A 2016 2 2
13: 13578 B 2013 2 2
14: 13578 C 2014 4 4
15: 13578 D 2015 2 2
16: 13578 E 2010 2 2
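To see why Sum filters on FY instead of simply adding up the k rows it is handed, consider Customer 13575 / Product A, where 2014 is missing. The following base-R sketch (the names naive and masked are illustrative and not part of the answer) compares the two for the window ending in 2016:

```r
k <- 5
# Customer 13575 / Product A after aggregation; note there is no 2014 row.
FY  <- c(2011, 2012, 2013, 2015, 2016)
Rev <- c(4, 3, 3, 1, 2)

# Summing the last k rows reaches back to 2011, which is 6 calendar years.
naive <- sum(tail(Rev, k))

# Keeping only rows whose FY is within k years of the last FY drops 2011.
ok <- FY >= tail(FY, 1) - k + 1
masked <- sum(Rev[ok])

print(c(naive = naive, masked = masked))
```

The masked value is the 9 shown in row 5 of the output above; the naive row-based sum would have given 13.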
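2) data.table only
First sum the rows having the same Customer/Product/FY. Then, grouping by Customer/Product, for each FY value fy pick out the Rev values whose FY values are between fy-k+1 and fy, and sum them.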
library(data.table)
k <- 5
DT <- as.data.table(DFI)
DT <- DT[, list(Rev = sum(Rev)), by = c("Customer", "Product", "FY")]
DT[, cumsum := sapply(FY, function(fy) sum(Rev[between(FY, fy-k+1, fy)])),
by = c("Customer", "Product")]
giving:
> DT
Customer Product FY Rev cumsum
1: 13575 A 2011 4 4
2: 13575 A 2012 3 7
3: 13575 A 2013 3 10
4: 13575 A 2015 1 11
5: 13575 A 2016 2 9
6: 13575 B 2011 3 3
7: 13575 B 2012 3 6
8: 13575 B 2013 4 10
9: 13575 B 2014 5 15
10: 13575 B 2015 6 21
11: 13578 A 2010 3 3
12: 13578 A 2016 2 2
13: 13578 B 2013 2 2
14: 13578 C 2014 4 4
15: 13578 D 2015 2 2
16: 13578 E 2010 2 2
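As a possible variation that is not part of the answer above, the same window can be expressed as a data.table non-equi self-join instead of sapply. This is a minimal sketch assuming a data.table version that supports non-equi joins in on=; the helper column lo and the names toy and win are introduced here purely for illustration, and toy is assumed to be already aggregated to one row per Customer/Product/FY.

```r
library(data.table)

k <- 5
# Already-aggregated toy data: one row per Customer/Product/FY.
toy <- data.table(
  Customer = c(13575, 13575, 13575, 13575, 13575, 13578, 13578),
  Product  = "A",
  FY       = c(2011, 2012, 2013, 2015, 2016, 2010, 2016),
  Rev      = c(4, 3, 3, 1, 2, 3, 2)
)

# Lower bound of each row's window (illustrative helper column).
toy[, lo := FY - k + 1]

# For every row of toy (as i), find the rows of toy (as x) in the same
# Customer/Product whose FY lies in [lo, FY], and sum their Rev.
win <- toy[toy,
           on = .(Customer, Product, FY >= lo, FY <= FY),
           .(total = sum(x.Rev)),
           by = .EACHI]

toy[, cumsum := win$total]
toy[, lo := NULL]
print(toy)
```

Whether this is faster than the sapply version depends on the data, so it is worth timing both on the real table before choosing one.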