Multiple aggregations in a single operation in MongoDB

Question

Poo*_*rna 7 mongodb aggregation-framework

I have an items collection with the following documents.

{ "item" : "i1", "category" : "c1", "brand" : "b1" }  
{ "item" : "i2", "category" : "c2", "brand" : "b1" }  
{ "item" : "i3", "category" : "c1", "brand" : "b2" }  
{ "item" : "i4", "category" : "c2", "brand" : "b1" }  
{ "item" : "i5", "category" : "c1", "brand" : "b2" }

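I want to aggregate the results separately: a count by category and a count by brand. Note that I do not want a count by the (category, brand) pair.

I can do this with map-reduce using the following code.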

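// Emit one key per category and one key per brand for every document.
var map = function () {
    emit({ type: "category", category: this.category }, 1);
    emit({ type: "brand", brand: this.brand }, 1);
};
// Sum the emitted 1s for each key.
var reduce = function (key, values) {
    return Array.sum(values);
};
db.item.mapReduce(map, reduce, { out: { inline: 1 } });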

The result is:

{
        "results" : [
                {
                        "_id" : {
                                "type" : "brand",
                                "brand" : "b1"
                        },
                        "value" : 3
                },
                {
                        "_id" : {
                                "type" : "brand",
                                "brand" : "b2"
                        },
                        "value" : 2
                },
                {
                        "_id" : {
                                "type" : "category",
                                "category" : "c1"
                        },
                        "value" : 3
                },
                {
                        "_id" : {
                                "type" : "category",
                                "category" : "c2"
                        },
                        "value" : 2
                }
        ],
        "timeMillis" : 21,
        "counts" : {
                "input" : 5,
                "emit" : 10,
                "reduce" : 4,
                "output" : 4
        },
        "ok" : 1,
}

I can get the same result by issuing two separate aggregation commands, as follows:

db.item.aggregate({$group:{_id:"$category",count:{$sum:1}}})
db.item.aggregate({$group:{_id:"$brand",count:{$sum:1}}})

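Is there any way to do the same thing with a single aggregation command using the aggregation framework?

I have simplified my case here; in reality I need to group on fields from an array of subdocuments. Assume the structure above is what I have after the $unwind.

This is a real-time query (someone is waiting for the response), although it runs over a smaller data set, so execution time matters.

I am using MongoDB 2.4.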

Answer 1

Xav*_*hot 6

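Starting in Mongo 3.4, the $facet aggregation stage greatly simplifies this type of use case by processing multiple aggregation pipelines within a single stage on the same set of input documents: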

// { "item" : "i1", "category" : "c1", "brand" : "b1" }
// { "item" : "i2", "category" : "c2", "brand" : "b1" }
// { "item" : "i3", "category" : "c1", "brand" : "b2" }
// { "item" : "i4", "category" : "c2", "brand" : "b1" }
// { "item" : "i5", "category" : "c1", "brand" : "b2" }
db.collection.aggregate([
  { $facet: {
      categories: [{ $group: { _id: "$category", count: { "$sum": 1 } } }],
      brands:     [{ $group: { _id: "$brand",    count: { "$sum": 1 } } }]
  }}
])
// {
//   "categories" : [
//     { "_id" : "c1", "count" : 3 },
//     { "_id" : "c2", "count" : 2 }
//   ],
//   "brands" : [
//     { "_id" : "b1", "count" : 3 },
//     { "_id" : "b2", "count" : 2 }
//   ]
// }
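
If the flat shape of the mapReduce output from the question is still needed, the $facet result can be reshaped on the client. The following is only a minimal sketch in plain JavaScript: `flattenFacets` is a hypothetical helper (not a MongoDB API), and the mapping from facet name to field name is an assumption based on the sample data above.

```javascript
// Hypothetical helper (not part of MongoDB): flattens a $facet result
// document into mapReduce-style { _id: { type, <field> }, value } rows.
// fieldNames maps each facet name to the field it was grouped on.
function flattenFacets(facetDoc, fieldNames) {
  const rows = [];
  for (const [facet, field] of Object.entries(fieldNames)) {
    for (const bucket of facetDoc[facet] || []) {
      rows.push({
        _id: { type: field, [field]: bucket._id },
        value: bucket.count
      });
    }
  }
  return rows;
}

// The single document returned by the $facet stage shown above.
const facetResult = {
  categories: [{ _id: "c1", count: 3 }, { _id: "c2", count: 2 }],
  brands: [{ _id: "b1", count: 3 }, { _id: "b2", count: 2 }]
};

const rows = flattenFacets(facetResult, { categories: "category", brands: "brand" });
console.log(JSON.stringify(rows, null, 2));
```

Doing the reshaping on the client keeps the pipeline itself short; whether that is preferable to adding further stages depends on where the result is consumed.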

Answer 2

Nei*_*unn 5

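On a large data set, I would say your current mapReduce approach is the best one, because this aggregation technique does not work well with large data. But at a reasonably small size it may be just what you need: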

db.items.aggregate([
    { "$group": {
        "_id": null,
        "categories": { "$push": "$category" },
        "brands": { "$push": "$brand" }
    }},
    { "$project": {
        "_id": {
            "categories": "$categories",
            "brands": "$brands"
        },
        "categories": 1
    }},
    { "$unwind": "$categories" },
    { "$group": {
        "_id": {
            "brands": "$_id.brands",
            "category": "$categories"
        },
        "count": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id.brands",
        "categories": { "$push": {
            "category": "$_id.category",
            "count": "$count"
        }},
        }}
    { "$project": {
        "_id": "$categories",
        "brands": "$_id"
    }},
    { "$unwind": "$brands" },
    { "$group": {
        "_id": {
            "categories": "$_id",
            "brand": "$brands"
        },
        "count": { "$sum": 1 }
    }},
    { "$group": {
        "_id": null,
        "categories": { "$first": "$_id.categories" },
        "brands": { "$push": {
            "brand": "$_id.brand",
            "count": "$count"
        }}
    }}
])

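The output is not exactly the same as the mapReduce output, and you could add more stages to change the output format, but this should be usable: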

{
    "_id" : null,
    "categories" : [
            {
                    "category" : "c2",
                    "count" : 2
            },
            {
                    "category" : "c1",
                    "count" : 3
            }
    ],
    "brands" : [
            {
                    "brand" : "b2",
                    "count" : 2
            },
            {
                    "brand" : "b1",
                    "count" : 3
            }
    ]
}

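As you can see, this involves quite a bit of shuffling between arrays in order to group each set of "categories" or "brands" within the same pipeline. Again, this will not do well with large data, but for something like "items in an order" it would probably do nicely.

Of course, as you say, you have simplified somewhat, so the first grouping key of null will either be something else, or the input will be narrowed down for that null case by an early $match stage, which is probably what you want to do.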
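
To illustrate the early $match idea, here is a rough sketch of how the front of the pipeline might look for the "items in an order" case. Everything specific in it is an assumption and not from the question: the `items` array field, matching on `_id`, and the helper name `buildPrefixStages` are all hypothetical. Only the first three stages are built; the remaining stages of the pipeline above would follow unchanged.

```javascript
// Hypothetical helper: builds the leading stages for one order.
// Assumes each order document has an "items" array of subdocuments
// with "category" and "brand" fields (an assumption, not from the question).
function buildPrefixStages(orderId) {
  return [
    // Narrow the input first so the later $push arrays stay small.
    { $match: { _id: orderId } },
    // One document per array element, as described in the question.
    { $unwind: "$items" },
    // Same first $group as above, reading from the unwound subdocument.
    { $group: {
        _id: null,
        categories: { $push: "$items.category" },
        brands: { $push: "$items.brand" }
    }}
  ];
}

const pipeline = buildPrefixStages(42);
console.log(JSON.stringify(pipeline, null, 2));
// In the shell, the remaining stages from the pipeline above would be
// appended with pipeline.concat(...) before passing the array to aggregate().
```

Matching before unwinding means only the selected documents are expanded, which matters for a query that someone is waiting on.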

Archived: 11 years, 8 months ago
Views: 4461
Last activity: 11 years, 8 months ago