Calculate the average of a field in an embedded document/array

ret*_*guy 6 average mongodb mongodb-query aggregation-framework

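I want to calculate the rating_average field of this object from the rating fields in the ratings array. Can you help me understand how to use $avg in an aggregation?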

{
    "title": "The Hobbit",
    "rating_average": "???",
    "ratings": [
        {
            "title": "best book ever",
            "rating": 5
        },
        {
            "title": "good book",
            "rating": 3.5
        }
    ]
}

chr*_*dam 11

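In MongoDB 3.4 and newer, the aggregation framework's $reduce operator lets you compute the total efficiently without extra pipeline stages. Use it as an expression that returns the total of the ratings, and get the number of ratings with $size. Combined with $addFields, the average can then be computed with the arithmetic operator $divide, using the formula average = total ratings / number of ratings: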

db.collection.aggregate([
    { 
        "$addFields": { 
            "rating_average": {
                "$divide": [
                    { // expression returns total
                        "$reduce": {
                            "input": "$ratings",
                            "initialValue": 0,
                            "in": { "$add": ["$$value", "$$this.rating"] }
                        }
                    },
                    { // expression returns ratings count (1 when the array is empty, so $divide never gets 0)
                        "$cond": [
                            { "$ne": [ { "$size": "$ratings" }, 0 ] },
                            { "$size": "$ratings" }, 
                            1
                        ]
                    }
                ]
            }
        }
    }           
])

Sample output

{
    "_id" : ObjectId("58ab48556da32ab5198623f4"),
    "title" : "The Hobbit",
    "ratings" : [ 
        {
            "title" : "best book ever",
            "rating" : 5.0
        }, 
        {
            "title" : "good book",
            "rating" : 3.5
        }
    ],
    "rating_average" : 4.25
}
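To see exactly what the $reduce / $size / $cond / $divide expression computes for one document, here is a plain-JavaScript mirror of it. This is only a sketch for illustration, not MongoDB code; `reduceAverage` is a hypothetical helper name, and the sample data is the document from the question.

```javascript
// Plain-JavaScript mirror of the $reduce / $size / $cond / $divide expression.
// reduceAverage is a hypothetical helper used only for illustration.
function reduceAverage(ratings) {
  // "$reduce": start from initialValue 0 ($$value) and add each $$this.rating
  const total = ratings.reduce((value, item) => value + item.rating, 0);
  // "$cond": use the array size, or 1 when the array is empty
  const count = ratings.length !== 0 ? ratings.length : 1;
  // "$divide": total / count
  return total / count;
}

const hobbit = {
  title: "The Hobbit",
  ratings: [
    { title: "best book ever", rating: 5 },
    { title: "good book", rating: 3.5 },
  ],
};

console.log(reduceAverage(hobbit.ratings)); // 4.25
console.log(reduceAverage([]));             // 0
```

The $cond branch is what keeps an empty ratings array from reaching $divide with a zero divisor; such a document gets an average of 0 instead.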

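For older versions, you first need to apply the $unwind operator to the ratings array field as the initial aggregation pipeline stage. This deconstructs the ratings array field of each input document and outputs one document per element, in which the array is replaced by that element's value.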

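The second pipeline stage is the $group operator, which groups the input documents by a compound key built from _id and title, and applies the $avg accumulator expression to each group to compute the average. A second accumulator, $push, preserves the original ratings array by collecting the value of an expression (here, each unwound rating) from every document in the group.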

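The final pipeline stage is the $project operator, which reshapes each document in the stream: it promotes title out of the compound _id, drops _id, and keeps the ratings and ratings_average fields produced by $group.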

So, for example, if your collection has a sample document like the one above:

db.collection.insert({
    "title": "The Hobbit",

    "ratings": [
        {
            "title": "best book ever",
            "rating": 5
        },
        {
            "title": "good book",
            "rating": 3.5
        }
    ]
})

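To compute the average of the ratings array and project the value into another field, ratings_average, you can apply the following aggregation pipeline: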

db.collection.aggregate([
    {
        "$unwind": "$ratings"
    },
    {
        "$group": {
            "_id": {
                "_id": "$_id",
                "title": "$title"
            },
            "ratings":{
                "$push": "$ratings"
            },
            "ratings_average": {
                "$avg": "$ratings.rating"
            }
        }
    },
    {
        "$project": {
            "_id": 0,
            "title": "$_id.title",
            "ratings_average": 1,
            "ratings": 1
        }
    }
])

Result:

/* 1 */
{
    "result" : [ 
        {
            "ratings" : [ 
                {
                    "title" : "best book ever",
                    "rating" : 5
                }, 
                {
                    "title" : "good book",
                    "rating" : 3.5
                }
            ],
            "ratings_average" : 4.25,
            "title" : "The Hobbit"
        }
    ],
    "ok" : 1
}
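The three stages can be mimicked in plain JavaScript to show how the documents flow through them. This is a sketch under simplifying assumptions: `unwind` and `groupAndProject` are hypothetical helpers, `_id: 1` stands in for an ObjectId, every rating is assumed to be numeric, and the unwound field is assumed to be an array or missing.

```javascript
// Plain-JavaScript sketch of the $unwind -> $group -> $project pipeline.
// unwind and groupAndProject are hypothetical helpers that only mimic each stage.

// $unwind: one document per array element, with the array replaced by that element.
// Assumes the field is an array or missing; missing or empty arrays emit nothing,
// matching $unwind's default behaviour.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] ?? []).map((element) => ({ ...doc, [field]: element }))
  );
}

// $group by { _id, title } with $push and $avg, then $project into the final shape.
function groupAndProject(unwound) {
  const groups = new Map();
  for (const doc of unwound) {
    const key = JSON.stringify({ _id: doc._id, title: doc.title });
    if (!groups.has(key)) {
      groups.set(key, { title: doc.title, ratings: [] });
    }
    groups.get(key).ratings.push(doc.ratings); // "$push": "$ratings"
  }
  return [...groups.values()].map((group) => ({
    ratings: group.ratings,
    // "$avg": "$ratings.rating" (assumes every rating is numeric)
    ratings_average:
      group.ratings.reduce((sum, r) => sum + r.rating, 0) / group.ratings.length,
    title: group.title, // "$_id.title"; the compound _id itself is dropped
  }));
}

// _id: 1 is a placeholder standing in for an ObjectId.
const docs = [
  {
    _id: 1,
    title: "The Hobbit",
    ratings: [
      { title: "best book ever", rating: 5 },
      { title: "good book", rating: 3.5 },
    ],
  },
];

console.log(groupAndProject(unwind(docs, "ratings")));
```

One caveat this makes visible: a book whose ratings array is empty or missing disappears from the output entirely, because $unwind emits nothing for it unless the preserveNullAndEmptyArrays option is used.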


Nei*_*unn 5

This really could be written so much shorter, and this was even true at the time of writing. If you want an "average" simply use $avg:

db.collection.aggregate([
  { "$addFields": {
    "rating_average": { "$avg": "$ratings.rating" }
  }}
])
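In expression form, $avg takes the array produced by "$ratings.rating" and averages its numeric elements. Per the MongoDB documentation, non-numeric values are ignored and null is returned when no numeric values remain. The following is a simplified plain-JavaScript sketch of that behaviour; `avgExpression` is a hypothetical name, and only JavaScript numbers are treated as numeric here.

```javascript
// Simplified sketch of the $avg expression applied to an array of values:
// non-numeric entries are skipped, and null is returned if nothing numeric is left.
function avgExpression(values) {
  const nums = values.filter((v) => typeof v === "number");
  if (nums.length === 0) return null;
  return nums.reduce((sum, v) => sum + v, 0) / nums.length;
}

console.log(avgExpression([5, 3.5]));        // 4.25
console.log(avgExpression([5, "n/a", 3.5])); // 4.25
console.log(avgExpression([]));              // null
```

Note the difference from the $reduce answer: an empty ratings array yields null here, not 0.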

This works because, as of MongoDB 3.2, the $avg operator gained two things:

  1. The ability to process an array of arguments in expression form, rather than only as an accumulator in $group.

  2. Support for the MongoDB 3.2 "shorthand" notation for array expressions, either when composing an array from several fields:

    { "array": [ "$fielda", "$fieldb" ] }

    or when a single property path on an array of embedded documents resolves to an array of that property's values:

    { "$avg": "$ratings.rating" } // equal to { "$avg": [ 5, 3.5 ] }

In earlier releases you would have to use $map in order to access the "rating" property inside each array element. Now you don't.
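The shorthand amounts to mapping each array element to its rating property, which is what an explicit { "$map": { "input": "$ratings", "as": "r", "in": "$$r.rating" } } would produce when every element has that property. A plain-JavaScript sketch of that equivalence follows; `resolveArrayPath` is a hypothetical helper, and it assumes every element carries the property.

```javascript
// Sketch: "$ratings.rating" on an array of embedded documents resolves to the
// array of "rating" values, the same result as mapping over the array.
// Simplified: assumes every element of the array has the property.
function resolveArrayPath(doc, arrayField, prop) {
  return doc[arrayField].map((element) => element[prop]);
}

const hobbit = {
  title: "The Hobbit",
  ratings: [
    { title: "best book ever", rating: 5 },
    { title: "good book", rating: 3.5 },
  ],
};

console.log(resolveArrayPath(hobbit, "ratings", "rating")); // [ 5, 3.5 ]
```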


For the record, even the $reduce usage can be simplified:

db.collection.aggregate([
  { "$addFields": {
    "rating_average": {
      "$reduce": {
        "input": "$ratings",
        "initialValue": 0,
        "in": {
          "$add": [ 
            "$$value",
            { "$divide": [ 
              "$$this.rating", 
              { "$size": { "$ifNull": [ "$ratings", [] ] } }
            ]}
          ]
        }
      }
    }
  }}
])
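The simplified $reduce divides each rating by the array size before adding it, so the running total ends up as the mean. A plain-JavaScript sketch of that logic, assuming the documented $reduce behaviour (null for a null or missing input, initialValue for an empty array); `reduceMean` is a hypothetical name.

```javascript
// Plain-JavaScript mirror of the simplified $reduce expression.
function reduceMean(ratings) {
  // $reduce returns null when its input is null or missing
  if (ratings === null || ratings === undefined) return null;
  const size = ratings.length; // { "$size": { "$ifNull": [ "$ratings", [] ] } }
  // For an empty array the callback never runs, so initialValue 0 is returned
  return ratings.reduce((value, item) => value + item.rating / size, 0);
}

const hobbit = {
  title: "The Hobbit",
  ratings: [
    { title: "best book ever", rating: 5 },
    { title: "good book", rating: 3.5 },
  ],
};

console.log(reduceMean(hobbit.ratings)); // 4.25
console.log(reduceMean([]));             // 0
console.log(reduceMean(undefined));      // null
```

Because each rating is divided before it is summed, the result can differ from $avg in the last floating-point digits for some inputs.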

As stated, though, this just re-implements the existing $avg functionality, so since that operator is available, it is the one you should use.