按ID分组Mongo文档,并按时间戳获取最新文档

Sha*_*ark 5 mongodb mongodb-query aggregation-framework

想象一下,我们在mongodb中存储了以下一组文档:

{ "fooId" : "1", "status" : "A", "timestamp" : ISODate("2016-01-01T00:00:00.000Z") "otherInfo" : "BAR", ... }
{ "fooId" : "1", "status" : "B", "timestamp" : ISODate("2016-01-02T00:00:00.000Z") "otherInfo" : "BAR", ... }
{ "fooId" : "1", "status" : "C", "timestamp" : ISODate("2016-01-03T00:00:00.000Z") "otherInfo" : "BAR", ... }
{ "fooId" : "2", "status" : "A", "timestamp" : ISODate("2016-01-01T00:00:00.000Z") "otherInfo" : "BAR", ... }
{ "fooId" : "2", "status" : "B", "timestamp" : ISODate("2016-01-02T00:00:00.000Z") "otherInfo" : "BAR", ... }
{ "fooId" : "3", "status" : "A", "timestamp" : ISODate("2016-01-01T00:00:00.000Z") "otherInfo" : "BAR", ... }
{ "fooId" : "3", "status" : "B", "timestamp" : ISODate("2016-01-02T00:00:00.000Z") "otherInfo" : "BAR", ... }
{ "fooId" : "3", "status" : "C", "timestamp" : ISODate("2016-01-03T00:00:00.000Z") "otherInfo" : "BAR", ... }
{ "fooId" : "3", "status" : "D", "timestamp" : ISODate("2016-01-04T00:00:00.000Z") "otherInfo" : "BAR", ... }
Run Code Online (Sandbox Code Playgroud)

我想基于时间戳获取每个fooId的最新状态.因此,我的回报看起来像:

{ "fooId" : "1", "status" : "C", "timestamp" : ISODate("2016-01-03T00:00:00.000Z") "otherInfo" : "BAR", ... }
{ "fooId" : "2", "status" : "B", "timestamp" : ISODate("2016-01-02T00:00:00.000Z") "otherInfo" : "BAR", ... }
{ "fooId" : "3", "status" : "D", "timestamp" : ISODate("2016-01-04T00:00:00.000Z") "otherInfo" : "BAR", ... }
Run Code Online (Sandbox Code Playgroud)

我一直试图通过使用group运算符使用聚合来解决这个问题,但我想知道的部分有一种简单的方法可以从聚合中获取整个文档,所以它看起来像我使用了查找查询一样?看来你必须在分组时指定所有字段,如果文档上可以包含我可能不知道的可选字段,那么这似乎是不可扩展的.我当前的查询看起来像这样:

db.collectionName.aggregate(
   [
     { $sort: { timestamp: 1 } },
     {
       $group:
         {
           _id: "$fooId",
           timestamp: { $last: "$timestamp" },
           status: { "$last": "$status" },
           otherInfo: { "$last": "$otherInfo" },
         }
     }
   ]
)
Run Code Online (Sandbox Code Playgroud)

sty*_*ane 3

您可以将$$ROOT系统变量与$last运算符一起使用来返回最后一个文档。

db.collectionName.aggregate([      
    { "$sort": { "timestamp": 1 } },     
    { "$group": { 
        "_id": "$fooId",   
        "last_doc": { "$last": "$$ROOT" } 
    }}
])
Run Code Online (Sandbox Code Playgroud)

当然,这将作为字段值的每个组的最后一个文档。

{
        "_id" : "2",
        "doc" : {
                "_id" : ObjectId("570e6df92f5bb4fcc8bb177e"),
                "fooId" : "2",
                "status" : "B",
                "timestamp" : ISODate("2016-01-02T00:00:00Z")
        }
}
Run Code Online (Sandbox Code Playgroud)

如果您对该输出不满意,那么最好的选择是$group在使用累加器运算符返回这些文档的数组时向管道添加另一个阶段$push

db.collectionName.aggregate([      
    { "$sort": { "timestamp": 1 } },     
    { "$group": { 
        "_id": "$fooId",   
        "last_doc": { "$last": "$$ROOT" } 
    }},
    { "$group": { 
        "_id": null, 
        "result": { "$push": "$last_doc" } 
    }}

])
Run Code Online (Sandbox Code Playgroud)