Mongo Aggregate: Sum the count field of duplicate array values

Question

Tac*_*cat 3 mongodb aggregation-framework

I have a large collection of documents like these:

{
  _id: '1',
  colors: [
    { value: 'red', count: 2 },
    { value: 'blue', count: 3 }
  ],
  shapes: [
    { value: 'cube', type: '3d' },
    { value: 'square', type: '2d' }
  ]
},
{
  _id: '2',
  colors: [
    { value: 'red', count: 7 },
    { value: 'blue', count: 34 },
    { value: 'yellow', count: 12 }
  ],
  shapes: [
    { value: 'prism', type: '3d' },
    { value: 'triangle', type: '2d' }
  ]
}


By using $unwind and $addToSet, like this:

db.getCollection('coll').aggregate([
  {$unwind: "$colors"},
  {$unwind: "$shapes"},
  {$group: {_id: null, colors: {$addToSet: "$colors"}, shapes: {$addToSet: "$shapes"}}}
])


I can get the following:

{
    "_id" : null,
    "colors" : [ 
        { "value" : "red", "count" : 2 }, 
        { "value" : "blue", "count" : 3 }, 
        { "value" : "red", "count" : 7 }, 
        { "value" : "blue", "count" : 34 }, 
        { "value" : "yellow", "count" : 12 }
    ],
    "shapes" : [
        { value: 'cube', type: '3d' },
        { value: 'square', type: '2d'},
        { value: 'prism', type: '3d' },
        { value: 'triangle', type: '2d'}
    ]
}


However, what I want is to identify duplicates by the "value" field alone and to sum the "count" field of the duplicates, i.e.:

{
    "_id" : null,
    "colors" : [ 
        { "value" : "red", "count" : 9 }, 
        { "value" : "blue", "count" : 37 },  
        { "value" : "yellow", "count" : 12 }
    ],
    "shapes" : [
        { value: 'cube', type: '3d' },
        { value: 'square', type: '2d'},
        { value: 'prism', type: '3d' },
        { value: 'triangle', type: '2d'}
    ]
}


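This question suggests that I can use $colors.value as the _id field and $sum the count. However, since I have a second array that I also $unwind and aggregate with $group, I'm not sure of the best way to do this.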

Answer 1

chr*_*dam 5

Try running the following aggregation pipeline:

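var pipeline = [
    // One document per element of the colors array.
    {"$unwind": "$colors"},
    // Group by color value and sum the counts; $first carries shapes along.
    {
        "$group": {
            "_id": "$colors.value",
            "count": { "$sum": "$colors.count" },
            "shapes": { "$first": "$shapes" }
        }
    },
    // One document per element of the shapes array.
    {"$unwind": "$shapes"},
    // Collapse everything into a single document with de-duplicated arrays.
    {
        "$group": {
            "_id": null,
            "colors": {
                "$addToSet": {
                    "value": "$_id",
                    "count": "$count"
                }
            },
            "shapes": { "$addToSet": "$shapes" }
        }
    }
];
db.getCollection('coll').aggregate(pipeline)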


Sample output

{
    "result" : [ 
        {
            "_id" : null,
            "colors" : [ 
                {
                    "value" : "red",
                    "count" : 9
                }, 
                {
                    "value" : "blue",
                    "count" : 37
                }, 
                {
                    "value" : "yellow",
                    "count" : 12
                }
            ],
            "shapes" : [ 
                {
                    "value" : "square",
                    "type" : "2d"
                }, 
                {
                    "value" : "cube",
                    "type" : "3d"
                }, 
                {
                    "value" : "triangle",
                    "type" : "2d"
                }, 
                {
                    "value" : "prism",
                    "type" : "3d"
                }
            ]
        }
    ],
    "ok" : 1
}


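Note that if a document's count is stored as a string, as in { value: 'yellow', count: '12' }, it contributes 0 to the total: the $sum operator only aggregates numeric values and ignores non-numeric ones such as strings.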
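If some counts really are stored as strings, one option is to convert them before summing. The following is only a sketch of a replacement for the first $group stage, assuming MongoDB 4.0 or later (where $convert is available). The name groupWithConversion and the fallback of 0 for unparseable or missing values are my own choices, not part of the original pipeline.

```javascript
// Possible replacement for the first $group stage (assumes MongoDB 4.0+).
const groupWithConversion = {
  $group: {
    _id: "$colors.value",
    count: {
      $sum: {
        // Convert string counts such as "12" to integers. Falling back to 0
        // on errors or nulls is an assumed policy; pick what suits your data.
        $convert: { input: "$colors.count", to: "int", onError: 0, onNull: 0 }
      }
    },
    shapes: { $first: "$shapes" }
  }
};
```

With this stage in place, a count of '12' would be added as 12 rather than being ignored.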

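Inside the first $group stage, you group the flattened colors documents by the $colors.value field and then use accumulators to return the desired aggregation over each group. The $first accumulator is used in this grouping because it returns the value from the first document of each group, given that the documents arrive in a defined order; here you want the shapes field to come through unchanged while the documents are grouped. It is more of a trick for carrying the field through the pipeline.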
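One caveat with $first: each color group keeps the shapes of only the first document it sees. The sample data happens to work because 'yellow' appears only in document 2, but a document whose colors all appear in earlier documents would have its shapes dropped. A possible variation, offered as a sketch rather than a tested solution, is to collect every document's shapes with $push and unwind twice:

```javascript
// Variation on the pipeline above: $push instead of $first, so that no
// document's shapes are lost when all of its colors were seen earlier.
const pipeline = [
  { $unwind: "$colors" },
  {
    $group: {
      _id: "$colors.value",
      count: { $sum: "$colors.count" },
      shapes: { $push: "$shapes" } // array of arrays, one entry per source document
    }
  },
  { $unwind: "$shapes" }, // first unwind: one shapes array per document
  { $unwind: "$shapes" }, // second unwind: one individual shape per document
  {
    $group: {
      _id: null,
      colors: { $addToSet: { value: "$_id", count: "$count" } },
      shapes: { $addToSet: "$shapes" }
    }
  }
];
// In the mongo shell: db.getCollection('coll').aggregate(pipeline)
```

This duplicates the shapes arrays across color groups, so it may be costly on large collections. As with the original pipeline, documents with an empty or missing colors array are dropped by the first $unwind.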

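One thing to note here is that when executing a pipeline, MongoDB pipes the operators into each other. "Pipe" here takes its Linux meaning: the output of one operator becomes the input of the next. The result of each operator is a new collection of documents. So Mongo executes the pipeline above as follows: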

collection | $unwind | $group | $unwind | $group => result


Hence $first is needed to carry the shapes field from the first $group stage through to the next stage.
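To make the piping concrete, here is a small in-memory imitation of the four stages in plain JavaScript, run on the two sample documents from the question. It is not MongoDB code: unwind, groupByColor and collectAll are hypothetical stand-ins written only for this illustration, and the de-duplication is a simplification of $addToSet.

```javascript
// Sample documents from the question.
const docs = [
  { _id: "1",
    colors: [{ value: "red", count: 2 }, { value: "blue", count: 3 }],
    shapes: [{ value: "cube", type: "3d" }, { value: "square", type: "2d" }] },
  { _id: "2",
    colors: [{ value: "red", count: 7 }, { value: "blue", count: 34 },
             { value: "yellow", count: 12 }],
    shapes: [{ value: "prism", type: "3d" }, { value: "triangle", type: "2d" }] },
];

// Stand-in for $unwind: one output document per array element.
const unwind = (field) => (input) =>
  input.flatMap((doc) => doc[field].map((item) => ({ ...doc, [field]: item })));

// Stand-in for the first $group: key on colors.value, sum the counts,
// and keep the shapes of the first document seen (like $first).
const groupByColor = (input) => {
  const groups = new Map();
  for (const doc of input) {
    const key = doc.colors.value;
    if (!groups.has(key)) {
      groups.set(key, { _id: key, count: 0, shapes: doc.shapes });
    }
    groups.get(key).count += doc.colors.count;
  }
  return [...groups.values()];
};

// Stand-in for the final $group on _id: null. De-duplicating shapes by
// their value field is a simplification; $addToSet compares whole documents.
const collectAll = (input) => {
  const colors = new Map();
  const shapes = new Map();
  for (const doc of input) {
    colors.set(doc._id, { value: doc._id, count: doc.count });
    shapes.set(doc.shapes.value, doc.shapes);
  }
  return [{ _id: null, colors: [...colors.values()], shapes: [...shapes.values()] }];
};

// collection | $unwind | $group | $unwind | $group => result
const stages = [unwind("colors"), groupByColor, unwind("shapes"), collectAll];
const result = stages.reduce((current, stage) => stage(current), docs);

console.log(JSON.stringify(result, null, 2));
```

Each stage only ever sees the output of the stage before it, which is why the first group has to pass shapes along explicitly.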

Asked:	9 years, 10 months ago
Viewed:	3597 times
Last active:	8 years, 6 months ago