Please help me calculate the moving (rolling) weekly sum of Amount ($4) per Distributor ($3) and per rolling date.
I'd like to set these up like variables:
RollingStartDate=01/05/2015, RollingInterval=7, RollingEndDate=08/05/2015
For example:
The rolling 7-day data set for 1 May 2015 would run from 01/05/2015 back to 25/04/2015
The rolling 7-day data set for 2 May 2015 would run from 02/05/2015 back to 26/04/2015
...
The rolling 7-day data set for 7 May 2015 would run from 07/05/2015 back to 01/05/2015
The rolling 7-day data set for 8 May 2015 would run from 08/05/2015 back to 02/05/2015
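Each window above is just the rolling date plus the RollingInterval - 1 days before it, so finding its bounds is plain day arithmetic. The following is a minimal sketch, assuming a POSIX awk and DD/MM/YYYY dates: the hypothetical shell function `window_bounds` converts a date to a day count, subtracts RollingInterval - 1, and converts back. The day-count formulas are the standard proleptic Gregorian conversions, not something taken from the question.

```sh
# Hypothetical helper: print the first and last dates (DD/MM/YYYY) of the
# rolling window that ends on ROLLING_DATE ($1) and spans INTERVAL ($2) days.
window_bounds() {
    awk -v rolling_date="$1" -v interval="$2" '
    # DD/MM/YYYY -> days since 1970-01-01 (Gregorian calendar, years >= 0).
    function days(s,    p, d, m, y, era, yoe, doy, doe) {
        split(s, p, "/"); d = p[1] + 0; m = p[2] + 0; y = p[3] + 0
        if (m <= 2) y--                  # Jan/Feb belong to the previous March-based year
        era = int(y / 400)
        yoe = y - era * 400
        doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
        doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
        return era * 146097 + doe - 719468
    }
    # Inverse of days(): day number -> DD/MM/YYYY.
    function civil(z,    era, doe, yoe, y, doy, mp, d, m) {
        z += 719468
        era = int(z / 146097)
        doe = z - era * 146097
        yoe = int((doe - int(doe / 1460) + int(doe / 36524) - int(doe / 146096)) / 365)
        y = yoe + era * 400
        doy = doe - (365 * yoe + int(yoe / 4) - int(yoe / 100))
        mp = int((5 * doy + 2) / 153)
        d = doy - int((153 * mp + 2) / 5) + 1
        m = (mp < 10 ? mp + 3 : mp - 9)
        if (m <= 2) y++
        return sprintf("%02d/%02d/%04d", d, m, y)
    }
    BEGIN {
        last = days(rolling_date)
        first = last - (interval - 1)    # the date itself plus interval-1 days back
        print civil(first) "," civil(last)
    }'
}

window_bounds 01/05/2015 7    # prints 25/04/2015,01/05/2015
window_bounds 08/05/2015 7    # prints 02/05/2015,08/05/2015
```

Working in day numbers rather than seconds avoids any dependence on time zones or gawk-only time functions.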
Input.csv
Des,Date,Distributor,Amount,Loc
aaa,25/04/2015,abc123,25,bbb
aaa,25/04/2015,xyz456,75,bbb
aaa,26/04/2015,xyz456,50,bbb
aaa,27/04/2015,abc123,250,bbb
aaa,27/04/2015,abc123,100,bbb
aaa,29/04/2015,xyz456,50,bbb
aaa,30/04/2015,abc123,25,bbb
aaa,01/05/2015,xyz456,75,bbb
aaa,01/05/2015,abc123,50,bbb
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb
Example: the rolling 7-day data set for 8 May 2015 runs from 08/05/2015 back to 02/05/2015:
aaa,02/05/2015,abc123,25,bbb
aaa,02/05/2015,xyz456,75,bbb
aaa,04/05/2015,abc123,30,bbb
aaa,04/05/2015,xyz456,35,bbb
aaa,05/05/2015,xyz456,12,bbb
aaa,06/05/2015,abc123,32,bbb
aaa,06/05/2015,xyz456,43,bbb
aaa,07/05/2015,xyz456,87,bbb
aaa,08/05/2015,abc123,58,bbb
aaa,08/05/2015,xyz456,98,bbb
Output for the 8 May 2015 rolling 7-day data set:
RollingDate,Distributor,Amount
08/05/2015,abc123,145
08/05/2015,xyz456,350
I was able to get the above output with this command:
awk -F, '{key=$3;b[key]=b[key]+$4} END {for(i in b) print i","b[i]}'
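The command above sums a file that already contains only one week. As a sketch of doing that selection inside awk, assuming the window bounds are passed in explicitly, the hypothetical `window_sum` function below rewrites DD/MM/YYYY as YYYYMMDD so a plain string comparison follows date order, skips the header line, and sums Amount per Distributor only for rows inside the bounds.

```sh
# Hypothetical helper: sum Amount ($4) per Distributor ($3) over the rows whose
# Date ($2, DD/MM/YYYY) lies between FROM ($1) and TO ($2) inclusive, labeled
# with TO. Reads the CSV, header line first, on stdin.
window_sum() {
    awk -F, -v from="$1" -v to="$2" '
    # DD/MM/YYYY -> YYYYMMDD, so string comparison follows date order
    # (assumes zero-padded day and month, as in Input.csv).
    function ymd(s,    p) { split(s, p, "/"); return p[3] p[2] p[1] }
    BEGIN { lo = ymd(from); hi = ymd(to) }
    FNR == 1 { next }                        # skip the header line
    {
        k = ymd($2)
        if (k >= lo && k <= hi) sum[$3] += $4
    }
    END { for (d in sum) print to "," d "," sum[d] }
    ' | sort                                 # for (d in sum) order is unspecified
}
```

For example, `window_sum 02/05/2015 08/05/2015 < Input.csv` should print the same two lines as the 8 May output above. Calling it once per rolling date would produce the full table, at the cost of re-reading the file each time.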
Please suggest how to derive each weekly (rolling) subset of the data and then sum it.
Expected output:
RollingDate,Distributor,Amount
01/05/2015,abc123,450
01/05/2015,xyz456,250
02/05/2015,abc123,450
02/05/2015,xyz456,250
03/05/2015,abc123,450
03/05/2015,xyz456,200
04/05/2015,abc123,130
04/05/2015,xyz456,235
05/05/2015,abc123,130
05/05/2015,xyz456,247
06/05/2015,abc123,162
06/05/2015,xyz456,240
07/05/2015,abc123,137
07/05/2015,xyz456,327
08/05/2015,abc123,145
08/05/2015,xyz456,350
Edit #1
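1. The logic is to sum the amounts charged to each distributor within a 7-day range. For example, to calculate the amount for 1 May, I need to consider the line items for 1 May, 30 April, 29 April, 28 April, 27 April, 26 April and 25 April, i.e. 1st May minus 6 days back. Likewise, rolling date 2 May covers 26 April to 2 May (2nd May minus 6 days back).
2. The date format is DD/MM/YYYY, so 02/05/2015 is 2 May.
"RollingStartDate" sets the date from which data should be considered, and "RollingInterval" allows a 7-day, 14-day or 30-day (monthly) look-back analysis.
"RollingEndDate" excludes future-dated data if the actual file contains any; in that case, line items dated e.g. the 9th or the 15th may need to be excluded.

Here's a solution that just excludes the dates that don't have 7 days of data before them, without needing a specific start/stop range: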
$ cat tst.awk
BEGIN { FS=OFS=","; window=(window?window:7); secsPerDay=24*60*60 }
NR==1 { print "RollingDate", $3, $4; next }
{
    # DD/MM/YYYY -> "YYYY MM DD 0 0 0" -> seconds since the epoch (local midnight)
    endSecs = mktime(gensub(/(..)\/(..)\/(....)/,"\\3 \\2 \\1 0 0 0",1,$2))
    if (begSecs=="") {
        # first rolling date = first date in the file + (window-1) days
        begSecs = endSecs + ((window-1) * secsPerDay)
    }
    amount[endSecs][$3] += $4
    dists[$3]
}
END {
    # endSecs now holds the date of the last line read
    for (currSecs=begSecs; currSecs<=endSecs; currSecs+=secsPerDay) {
        # add up the window days ending on currSecs, per distributor
        for (dayNr=1; dayNr<=window; dayNr++) {
            rollSecs = currSecs - ((dayNr-1) * secsPerDay)
            for (dist in dists) {
                sum[dist] += (rollSecs in amount ? amount[rollSecs][dist] : 0)
            }
        }
        for (dist in dists) {
            print strftime("%d/%m/%Y",currSecs), dist, sum[dist]
            delete sum[dist]
        }
    }
}
$ awk -f tst.awk file
RollingDate,Distributor,Amount
01/05/2015,xyz456,250
01/05/2015,abc123,450
02/05/2015,xyz456,250
02/05/2015,abc123,450
03/05/2015,xyz456,200
03/05/2015,abc123,450
04/05/2015,xyz456,235
04/05/2015,abc123,130
05/05/2015,xyz456,247
05/05/2015,abc123,130
06/05/2015,xyz456,240
06/05/2015,abc123,162
07/05/2015,xyz456,327
07/05/2015,abc123,137
08/05/2015,xyz456,350
08/05/2015,abc123,145
To use a window size other than 7 days, just set it on the command line:
$ awk -v window=5 -f tst.awk file
RollingDate,Distributor,Amount
29/04/2015,xyz456,175
29/04/2015,abc123,375
30/04/2015,xyz456,100
30/04/2015,abc123,375
01/05/2015,xyz456,125
01/05/2015,abc123,425
02/05/2015,xyz456,200
02/05/2015,abc123,100
03/05/2015,xyz456,200
03/05/2015,abc123,100
04/05/2015,xyz456,185
04/05/2015,abc123,130
05/05/2015,xyz456,197
05/05/2015,abc123,105
06/05/2015,xyz456,165
06/05/2015,abc123,87
07/05/2015,xyz456,177
07/05/2015,abc123,62
08/05/2015,xyz456,275
08/05/2015,abc123,120
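The above uses GNU awk for true 2D arrays and time functions. It also assumes the input is sorted by date, since the first data line determines the first rolling date and the last line determines the final one. Hopefully it's clear how to modify it to include or exclude specific date ranges.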
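As one possible modification, here is a sketch of the RollingStartDate / RollingInterval / RollingEndDate variant from the question that avoids gawk-only features. It is an alternative under stated assumptions, not the answer's script: the hypothetical `rolling_sum` shell function takes the three values as arguments, turns dates into day numbers with plain arithmetic instead of `mktime`/`strftime`, emulates the 2D array with `amount[day, dist]` (SUBSEP) keys, and ignores rows outside the overall look-back range, which also drops future-dated rows after RollingEndDate.

```sh
# Hypothetical helper: rolling per-distributor sums of Amount ($4) for each date
# from ROLLING_START ($1) to ROLLING_END ($3) inclusive, each over the INTERVAL
# ($2) days ending on that date. Reads the CSV (header first) on stdin. Plain
# POSIX awk; input does not need to be sorted.
rolling_sum() {
    awk -F, -v start="$1" -v interval="$2" -v stop="$3" '
    # DD/MM/YYYY -> days since 1970-01-01 (Gregorian calendar, years >= 0).
    function days(s,    p, d, m, y, era, yoe, doy, doe) {
        split(s, p, "/"); d = p[1] + 0; m = p[2] + 0; y = p[3] + 0
        if (m <= 2) y--
        era = int(y / 400)
        yoe = y - era * 400
        doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
        doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
        return era * 146097 + doe - 719468
    }
    # Inverse of days(): day number -> DD/MM/YYYY.
    function civil(z,    era, doe, yoe, y, doy, mp, d, m) {
        z += 719468
        era = int(z / 146097)
        doe = z - era * 146097
        yoe = int((doe - int(doe / 1460) + int(doe / 36524) - int(doe / 146096)) / 365)
        y = yoe + era * 400
        doy = doe - (365 * yoe + int(yoe / 4) - int(yoe / 100))
        mp = int((5 * doy + 2) / 153)
        d = doy - int((153 * mp + 2) / 5) + 1
        m = (mp < 10 ? mp + 3 : mp - 9)
        if (m <= 2) y++
        return sprintf("%02d/%02d/%04d", d, m, y)
    }
    BEGIN {
        OFS = ","
        first = days(start); last = days(stop)
        print "RollingDate", "Distributor", "Amount"
    }
    FNR == 1 { next }                          # skip the CSV header
    {
        day = days($2)
        # Drop rows no window reaches: before the first look-back or after ROLLING_END.
        if (day < first - (interval - 1) || day > last) next
        amount[day, $3] += $4
        dists[$3]
    }
    END {
        # Sort distributor names (insertion sort) so the output order is stable.
        n = 0
        for (dist in dists) names[++n] = dist
        for (i = 2; i <= n; i++)
            for (j = i; j > 1 && names[j - 1] > names[j]; j--) {
                t = names[j]; names[j] = names[j - 1]; names[j - 1] = t
            }
        # For each rolling date, add up the INTERVAL days ending on it.
        for (cur = first; cur <= last; cur++)
            for (i = 1; i <= n; i++) {
                total = 0
                for (day = cur - (interval - 1); day <= cur; day++)
                    if ((day, names[i]) in amount) total += amount[day, names[i]]
                print civil(cur), names[i], total
            }
    }'
}
```

With the question's values, `rolling_sum 01/05/2015 7 08/05/2015 < Input.csv` should reproduce the expected output table, with distributors sorted by name within each date. Passing 14 or 30 as the interval gives the longer look-back analyses.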