wol*_*tat · score 7 · tags: mongodb, aggregation-framework
Given the following dataset:
{ "_id" : 1, "city" : "Yuma", "cat": "roads", "Q1" : 0, "Q2" : 25, "Q3" : 0, "Q4" : 0 }
{ "_id" : 2, "city" : "Reno", "cat": "roads", "Q1" : 30, "Q2" : 0, "Q3" : 0, "Q4" : 60 }
{ "_id" : 3, "city" : "Yuma", "cat": "parks", "Q1" : 0, "Q2" : 0, "Q3" : 45, "Q4" : 0 }
{ "_id" : 4, "city" : "Reno", "cat": "parks", "Q1" : 35, "Q2" : 0, "Q3" : 0, "Q4" : 0 }
{ "_id" : 5, "city" : "Yuma", "cat": "roads", "Q1" : 0, "Q2" : 15, "Q3" : 0, "Q4" : 20 }
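I'm struggling to achieve the following result. Ideally only totals greater than zero would be returned, with each city, cat and Qx total condensed into a single record.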
{
"city" : "Yuma",
"cat" : "roads",
"Q2total" : 40
},
{
"city" : "Reno",
"cat" : "roads",
"Q1total" : 30
},
{
"city" : "Reno",
"cat" : "roads",
"Q4total" : 60
},
{
"city" : "Yuma",
"cat" : "parks",
"Q3total" : 45
},
{
"city" : "Reno",
"cat" : "parks",
"Q1total" : 35
},
{
"city" : "Yuma",
"cat" : "roads",
"Q4total" : 20
}
Is this possible?
Answer by 小智 · score 6
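One has to ask what the purpose of this is. Your documents already have a nice, consistent object structure, which is what is recommended. Having objects with varying keys is not a great idea: data is "data" and should not be the name of a key.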
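With that in mind, the aggregation framework actually follows this principle and does not allow generating arbitrary key names from data contained in the documents (at least in the releases this answer targets; MongoDB 3.4.4 later added the $arrayToObject operator, which can build keys from values). You can, however, get a similar result by emitting the quarter as a data point: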
db.junk.aggregate([
    // Aggregate first to reduce the pipeline documents somewhat
    { "$group": {
        "_id": {
            "city": "$city",
            "cat": "$cat"
        },
        "Q1": { "$sum": "$Q1" },
        "Q2": { "$sum": "$Q2" },
        "Q3": { "$sum": "$Q3" },
        "Q4": { "$sum": "$Q4" }
    }},
    // Convert the "quarter" elements to array entries with the same keys
    { "$project": {
        "totals": {
            "$map": {
                "input": { "$literal": [ "Q1", "Q2", "Q3", "Q4" ] },
                "as": "el",
                "in": { "$cond": [
                    { "$eq": [ "$$el", "Q1" ] },
                    { "quarter": "$$el", "total": "$Q1" },
                    { "$cond": [
                        { "$eq": [ "$$el", "Q2" ] },
                        { "quarter": "$$el", "total": "$Q2" },
                        { "$cond": [
                            { "$eq": [ "$$el", "Q3" ] },
                            { "quarter": "$$el", "total": "$Q3" },
                            { "quarter": "$$el", "total": "$Q4" }
                        ]}
                    ]}
                ]}
            }
        }
    }},
    // Unwind the array produced
    { "$unwind": "$totals" },
    // Filter out any "0" results
    { "$match": { "totals.total": { "$ne": 0 } } },
    // Optionally project a prettier, "flatter" output
    { "$project": {
        "_id": 0,
        "city": "$_id.city",
        "cat": "$_id.cat",
        "quarter": "$totals.quarter",
        "total": "$totals.total"
    }}
])
This gives you results like these:
{ "city" : "Reno", "cat" : "parks", "quarter" : "Q1", "total" : 35 }
{ "city" : "Yuma", "cat" : "parks", "quarter" : "Q3", "total" : 45 }
{ "city" : "Reno", "cat" : "roads", "quarter" : "Q1", "total" : 30 }
{ "city" : "Reno", "cat" : "roads", "quarter" : "Q4", "total" : 60 }
{ "city" : "Yuma", "cat" : "roads", "quarter" : "Q2", "total" : 40 }
{ "city" : "Yuma", "cat" : "roads", "quarter" : "Q4", "total" : 20 }
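To make the effect of each stage concrete, here is a rough plain-JavaScript equivalent that runs outside MongoDB on the five sample documents. It is an illustration under the assumption that the data fits in memory, not a description of how the server executes the pipeline; simulatePipeline is a hypothetical name. The g[q] lookup stands in for the nested $cond chain that picks each quarter's field.

```javascript
// Sample documents from the question (the "junk" collection).
const junk = [
  { _id: 1, city: "Yuma", cat: "roads", Q1: 0, Q2: 25, Q3: 0, Q4: 0 },
  { _id: 2, city: "Reno", cat: "roads", Q1: 30, Q2: 0, Q3: 0, Q4: 60 },
  { _id: 3, city: "Yuma", cat: "parks", Q1: 0, Q2: 0, Q3: 45, Q4: 0 },
  { _id: 4, city: "Reno", cat: "parks", Q1: 35, Q2: 0, Q3: 0, Q4: 0 },
  { _id: 5, city: "Yuma", cat: "roads", Q1: 0, Q2: 15, Q3: 0, Q4: 20 },
];

const QUARTERS = ["Q1", "Q2", "Q3", "Q4"];

// Hypothetical in-memory stand-in for the pipeline above; not a MongoDB API.
function simulatePipeline(docs) {
  // $group on { city, cat }, summing each quarter field.
  const groups = new Map();
  for (const doc of docs) {
    const key = JSON.stringify([doc.city, doc.cat]);
    if (!groups.has(key)) {
      groups.set(key, { _id: { city: doc.city, cat: doc.cat }, Q1: 0, Q2: 0, Q3: 0, Q4: 0 });
    }
    const g = groups.get(key);
    for (const q of QUARTERS) g[q] += doc[q];
  }

  const out = [];
  for (const g of groups.values()) {
    // $project + $map: one { quarter, total } entry per quarter name.
    const totals = QUARTERS.map(q => ({ quarter: q, total: g[q] }));
    // $unwind, then $match on total != 0, then the flattening $project.
    for (const t of totals) {
      if (t.total !== 0) {
        out.push({ city: g._id.city, cat: g._id.cat, quarter: t.quarter, total: t.total });
      }
    }
  }
  return out;
}

const rows = simulatePipeline(junk);
console.log(rows);
```

As with the real $group, the order of the rows is not something to rely on; only their contents match the list above.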
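You could also use mapReduce, which allows "some" flexibility with key names. The catch is that your aggregation is still per "quarter", so you would need the quarter as part of the primary key, and that key cannot be changed once it has been emitted.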
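Also, you cannot "filter" out any aggregated results of "0" without a second pass after outputting to a collection, so it is not really much use for what you want to do, unless you can live with a second mapReduce operation or a "transform" query on the output collection.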
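The two-pass problem is easier to see with a small in-memory imitation of the map/emit/reduce contract. This is a sketch in plain JavaScript, not MongoDB's mapReduce: runMapReduce, mapQuarters and sumValues are hypothetical names, and emit is passed as an argument here, whereas in MongoDB it is provided by the server. The quarter must be part of the emitted key, and zero totals can only be dropped in a separate pass afterwards.

```javascript
// Sample documents from the question.
const docs = [
  { _id: 1, city: "Yuma", cat: "roads", Q1: 0, Q2: 25, Q3: 0, Q4: 0 },
  { _id: 2, city: "Reno", cat: "roads", Q1: 30, Q2: 0, Q3: 0, Q4: 60 },
  { _id: 3, city: "Yuma", cat: "parks", Q1: 0, Q2: 0, Q3: 45, Q4: 0 },
  { _id: 4, city: "Reno", cat: "parks", Q1: 35, Q2: 0, Q3: 0, Q4: 0 },
  { _id: 5, city: "Yuma", cat: "roads", Q1: 0, Q2: 15, Q3: 0, Q4: 20 },
];

// Hypothetical stand-in for mapReduce: map runs with `this` bound to each
// document, but emit is supplied as an argument instead of a global.
function runMapReduce(input, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    const k = JSON.stringify(key);
    if (!groups.has(k)) groups.set(k, { key, values: [] });
    groups.get(k).values.push(value);
  };
  for (const doc of input) map.call(doc, emit);
  // Like MongoDB, reduce is skipped for keys that received a single value.
  return [...groups.values()].map(({ key, values }) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

// The quarter has to be part of the emitted key.
function mapQuarters(emit) {
  for (const q of ["Q1", "Q2", "Q3", "Q4"]) {
    emit({ city: this.city, cat: this.cat, quarter: q }, this[q]);
  }
}

function sumValues(key, values) {
  return values.reduce((a, b) => a + b, 0);
}

// First pass: the equivalent of the mapReduce output collection.
const mrOutput = runMapReduce(docs, mapQuarters, sumValues);
// Second pass: zero totals can only be removed after the fact.
const nonZero = mrOutput.filter(r => r.value !== 0);
console.log(nonZero);
```

Note that map must be a regular function, not an arrow function, so that `this` refers to the current document.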
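It's also worth noting that if you look at what the "second" pipeline stage here is doing with $project and $map, you will see that the document structure is essentially being altered into something like what you could have structured your documents as in the first place, like this: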
{
    "city" : "Reno",
    "cat" : "parks",
    "totals" : [
        { "quarter" : "Q1", "total" : 35 },
        { "quarter" : "Q2", "total" : 0 },
        { "quarter" : "Q3", "total" : 0 },
        { "quarter" : "Q4", "total" : 0 }
    ]
},
{
    "city" : "Yuma",
    "cat" : "parks",
    "totals" : [
        { "quarter" : "Q1", "total" : 0 },
        { "quarter" : "Q2", "total" : 0 },
        { "quarter" : "Q3", "total" : 45 },
        { "quarter" : "Q4", "total" : 0 }
    ]
}
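If existing documents had to be migrated to that shape, each one could be reshaped on the client before being written back. A minimal sketch in plain JavaScript, assuming the Q1 to Q4 field names from the question; toTotalsShape is a hypothetical helper, not part of any MongoDB driver.

```javascript
// Hypothetical conversion from the flat Q1..Q4 fields to a "totals" array.
function toTotalsShape(doc) {
  const { _id, city, cat } = doc;
  const totals = ["Q1", "Q2", "Q3", "Q4"].map(quarter => ({ quarter, total: doc[quarter] }));
  return { _id, city, cat, totals };
}

// Usage on document 4 from the question.
const converted = toTotalsShape({ _id: 4, city: "Reno", cat: "parks", Q1: 35, Q2: 0, Q3: 0, Q4: 0 });
console.log(JSON.stringify(converted));
```

Keeping the zero entries preserves the fixed four-element structure; the $match stage in the query is what removes them from results.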
The aggregation operation then becomes simple, and documents in that shape produce the same results as shown above:
db.collection.aggregate([
    // One document per quarter entry
    { "$unwind": "$totals" },
    // Sum per city, cat and quarter
    { "$group": {
        "_id": {
            "city": "$city",
            "cat": "$cat",
            "quarter": "$totals.quarter"
        },
        "ttotal": { "$sum": "$totals.total" }
    }},
    // Drop zero totals
    { "$match": { "ttotal": { "$ne": 0 } } },
    // Flatten the output
    { "$project": {
        "_id": 0,
        "city": "$_id.city",
        "cat": "$_id.cat",
        "quarter": "$_id.quarter",
        "total": "$ttotal"
    }}
])
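As a sanity check that the simpler pipeline really yields the same six rows, here is a plain-JavaScript sketch of its four stages over the sample data written in the totals-array shape. It runs outside MongoDB and assumes everything fits in memory; q and unwindGroupMatch are hypothetical helper names.

```javascript
// Sample data in the suggested "totals" shape (same numbers as the question).
const q = (quarter, total) => ({ quarter, total });
const docs = [
  { _id: 1, city: "Yuma", cat: "roads", totals: [q("Q1", 0), q("Q2", 25), q("Q3", 0), q("Q4", 0)] },
  { _id: 2, city: "Reno", cat: "roads", totals: [q("Q1", 30), q("Q2", 0), q("Q3", 0), q("Q4", 60)] },
  { _id: 3, city: "Yuma", cat: "parks", totals: [q("Q1", 0), q("Q2", 0), q("Q3", 45), q("Q4", 0)] },
  { _id: 4, city: "Reno", cat: "parks", totals: [q("Q1", 35), q("Q2", 0), q("Q3", 0), q("Q4", 0)] },
  { _id: 5, city: "Yuma", cat: "roads", totals: [q("Q1", 0), q("Q2", 15), q("Q3", 0), q("Q4", 20)] },
];

// Hypothetical in-memory equivalent of $unwind -> $group -> $match -> $project.
function unwindGroupMatch(input) {
  const sums = new Map();
  for (const doc of input) {
    for (const t of doc.totals) { // $unwind
      const id = { city: doc.city, cat: doc.cat, quarter: t.quarter };
      const k = JSON.stringify(id);
      const entry = sums.get(k) ?? { id, ttotal: 0 };
      entry.ttotal += t.total; // $sum
      sums.set(k, entry);
    }
  }
  return [...sums.values()]
    .filter(e => e.ttotal !== 0) // $match
    .map(({ id, ttotal }) => ({ city: id.city, cat: id.cat, quarter: id.quarter, total: ttotal })); // $project
}

const results = unwindGroupMatch(docs);
console.log(results);
```

No per-quarter $cond logic is needed here, because the quarter is already a value rather than a key name.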
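So it may make more sense to consider structuring your documents this way from the start, and avoid any of the overhead that the document transformation requires.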
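I think you will find that consistent key names make for a better programming object model, in which you read the data points from key values rather than from key names. If you really do need the other form, just read the data from each object and transform the keys of each aggregated result in post-processing.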
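If the "Q2total"-style keys from the question are really needed, that post-processing step might look like this on the client. A minimal sketch in plain JavaScript using a computed property name; toDynamicKey is a hypothetical helper, and the input rows are two of the aggregation results listed earlier.

```javascript
// Hypothetical post-processing: turn { quarter, total } into a "<quarter>total" key.
function toDynamicKey({ city, cat, quarter, total }) {
  return { city, cat, [`${quarter}total`]: total };
}

// Two rows as returned by the aggregation above.
const aggregated = [
  { city: "Yuma", cat: "roads", quarter: "Q2", total: 40 },
  { city: "Reno", cat: "roads", quarter: "Q1", total: 30 },
];
const reshaped = aggregated.map(toDynamicKey);
console.log(reshaped);
```

Doing this last, on results that are already small, keeps the stored documents and the query itself on consistent key names.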