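Tags: mongodb, moving-average, aggregation-framework
If you had 50 years of temperature weather data (for example), how would you calculate moving averages over 3-month intervals for that time period? Can you do it with one query, or do you need multiple queries?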
Example Data
01/01/2014 = 40 degrees
12/31/2013 = 38 degrees
12/30/2013 = 29 degrees
12/29/2013 = 31 degrees
12/28/2013 = 34 degrees
12/27/2013 = 36 degrees
12/26/2013 = 38 degrees
.....
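The aggregation framework now has $map, $reduce, and $range built in, so array processing is much more straightforward. Below is an example of calculating a moving average over a set of data that you want to filter by some predicate. The basic setup is that each document contains filterable criteria and a value, e.g.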
{sym: "A", d: ISODate("2018-01-01"), val: 10}
{sym: "A", d: ISODate("2018-01-02"), val: 30}
Here it is:
// This controls the number of observations in the moving average:
var days = 4;
var c = db.foo.aggregate([
    // Filter down to what you want. This can be anything or nothing at all.
    {$match: {"sym": "S1"}}
    // Ensure dates are going earliest to latest:
    ,{$sort: {d: 1}}
    // Turn docs into a single doc with a big vector of observations, e.g.
    //   {sym: "A", d: d1, val: 10}
    //   {sym: "A", d: d2, val: 11}
    //   {sym: "A", d: d3, val: 13}
    // becomes
    //   {_id: "A", prx: [ {v:10,d:d1}, {v:11,d:d2}, {v:13,d:d3} ] }
    //
    // This will set us up to take advantage of array processing functions!
    // Note the date field in the source docs is "d", so we push "$d".
    ,{$group: {_id: "$sym", prx: {$push: {v: "$val", d: "$d"}} }}
    // Nice additional info. Note use of dot notation on array to get
    // just scalar date at elem 0, not the object {v:val,d:date}:
    ,{$addFields: {numDays: days, startDate: {$arrayElemAt: [ "$prx.d", 0 ]}} }
    // The Juice! Assume we have a variable "days" which is the desired number
    // of days of moving average.
    // The complex expression below does this in python pseudocode:
    //
    //   for z in range(0, len(vector) - (days-1)):
    //       seg = vector[z:z+days]
    //       values = seg.v
    //       dates = seg.d
    //       tot = sum(values)
    //       avg = tot/len(seg)
    //
    // Note that it is possible to overrun the segment at the end of the "walk"
    // along the vector, i.e. not enough date-values. So we only run the
    // vector to (len(vector) - (days-1)).
    // Also, for extra info, each result carries the as-of date, which is the
    // tail date of the segment (numDays was added in the previous stage).
    //
    // Again we take advantage of dot notation to turn the vector of
    // object {v:val, d:date} into two vectors of simple scalars [v1,v2,...]
    // and [d1,d2,...] with $prx.v and $prx.d
    //
    ,{$addFields: {"prx": {$map: {
        input: {$range: [0, {$subtract: [{$size: "$prx"}, (days-1)]}]},
        as: "z",
        in: {
            avg: {$avg: {$slice: [ "$prx.v", "$$z", days ] } },
            d: {$arrayElemAt: [ "$prx.d", {$add: ["$$z", (days-1)] } ]}
        }
    }}
    }}
]);
This might produce the following output:
{
"_id" : "S1",
"prx" : [
{
"avg" : 11.738793632512115,
"d" : ISODate("2018-09-05T16:10:30.259Z")
},
{
"avg" : 12.420766702631376,
"d" : ISODate("2018-09-06T16:10:30.259Z")
},
...
],
"numDays" : 4,
"startDate" : ISODate("2018-09-02T16:10:30.259Z")
}
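To make the windowing logic easier to follow, here is a minimal plain-JavaScript sketch of what the $map/$range/$slice stage computes. It does not touch MongoDB; the function name `movingAverage` and the sample values are assumptions made up for illustration, and the dates are kept as plain strings for brevity.

```javascript
// Illustrative mirror of the $map/$range/$slice stage, outside MongoDB.
// `prx` is assumed to look like the array built by $group:
// [{v: value, d: date}, ...], already sorted earliest to latest.
function movingAverage(prx, days) {
  const out = [];
  // Same bound as {$range: [0, size - (days - 1)]}:
  // stop before the window would run past the end of the array.
  for (let z = 0; z + days <= prx.length; z++) {
    const seg = prx.slice(z, z + days);
    const total = seg.reduce((sum, p) => sum + p.v, 0);
    // The as-of date is the tail date of the segment.
    out.push({ avg: total / seg.length, d: seg[seg.length - 1].d });
  }
  return out;
}

// Made-up sample values, not taken from the question's data.
const sample = [
  { v: 10, d: "2018-01-01" },
  { v: 30, d: "2018-01-02" },
  { v: 20, d: "2018-01-03" },
  { v: 40, d: "2018-01-04" },
  { v: 60, d: "2018-01-05" },
];
const result = movingAverage(sample, 4);
console.log(result);
```

With five observations and a 4-day window this yields two entries, one per complete window, which is why the pipeline shortens the range by (days-1).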
The way I would tend to do this in MongoDB is to maintain, in the document for each day's value, a running sum of the past 90 days, e.g.
{"day": 1, "tempMax": 40, "tempMaxSum90": 2232}
{"day": 2, "tempMax": 38, "tempMaxSum90": 2230}
{"day": 3, "tempMax": 36, "tempMaxSum90": 2231}
{"day": 4, "tempMax": 37, "tempMaxSum90": 2233}
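Whenever a new data point needs to be added to the collection, instead of reading and summing 90 values, you can efficiently calculate the next sum with two simple queries, one addition and one subtraction (pseudo-code):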
tempMaxSum90(day) = tempMaxSum90(day-1) + tempMax(day) - tempMax(day-90)
The 90-day moving average for each day is then just the 90-day sum divided by 90.
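The update rule can be sketched in plain JavaScript as follows. This is only an illustration of the arithmetic, not the database queries themselves: the helper name `nextRollingSum` is hypothetical, and a 3-day window stands in for 90 days to keep the numbers readable.

```javascript
// Hypothetical helper: derive the next rolling sum from the previous one
// with one addition and one subtraction.
function nextRollingSum(prevSum, newValue, droppedValue) {
  return prevSum + newValue - droppedValue;
}

// Made-up temperatures; windowSize is 3 here purely for readability.
const temps = [40, 38, 36, 37, 41];
const windowSize = 3;
const sums = [];
let sum = 0;
temps.forEach((t, i) => {
  // Once the window is full, the value that falls out of it is subtracted.
  // Before that, nothing drops out.
  const dropped = i >= windowSize ? temps[i - windowSize] : 0;
  sum = nextRollingSum(sum, t, dropped);
  sums.push(sum);
});

// Divide by the number of values actually in the window
// (fewer than windowSize during warm-up).
const averages = sums.map((s, i) => s / Math.min(i + 1, windowSize));
console.log(sums, averages);
```

How to treat the first days, before a full window exists, is a design choice; this sketch divides by the count seen so far, but storing no average until the window fills would be equally reasonable.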
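If you also want to offer moving averages over different time scales (e.g. 1 week, 30 days, 90 days, 1 year), you can simply maintain an array of sums with each document instead of a single sum, one sum for each time scale required.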
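A rough sketch of keeping several sums per document, again in plain JavaScript with an in-memory array standing in for the collection. The function `appendDay`, the `sums` field, and the window sizes used in the demo are all assumptions for illustration.

```javascript
// Hypothetical sketch: each stored day carries one rolling sum per window size.
// `history` stands in for the collection, ordered by day.
function appendDay(history, tempMax, windows) {
  const prev = history.length ? history[history.length - 1].sums : {};
  const sums = {};
  for (const w of windows) {
    // Index of the value that leaves a window of size w when the new day arrives.
    const dropIdx = history.length - w;
    const dropped = dropIdx >= 0 ? history[dropIdx].tempMax : 0;
    sums[w] = (prev[w] || 0) + tempMax - dropped;
  }
  history.push({ day: history.length + 1, tempMax: tempMax, sums: sums });
  return history;
}

// Demo with small windows (2 and 3 days); in practice these might be
// something like [7, 30, 90, 365].
const history = [];
for (const t of [40, 38, 36, 37]) {
  appendDay(history, t, [2, 3]);
}
console.log(JSON.stringify(history, null, 2));
```

Each insert then costs one lookup of the dropped value per window size, which is the extra processing the approach trades for fast reads.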
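This approach costs additional storage space and additional processing to insert new data, but it is appropriate in most time-series charting scenarios, where new data is collected relatively slowly and fast retrieval is desirable.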