Mongodb聚合框架| 分组多个值?

Oli*_*oyd 26 mongodb aggregation-framework

我想使用mongoDB的聚合框架来运行SQL中看起来有点像:

SELECT SUM(A), B, C from myTable GROUP BY B, C;
Run Code Online (Sandbox Code Playgroud)

文档说明:

您可以从管道中的文档指定单个字段,先前计算的值,或者从多个传入字段组成的聚合键.

但目前还不清楚"几个传入领域的聚合密钥"实际上是什么?

我的数据集有点像这样:

[{ "timeStamp" : 1341834988666, "label" : "sharon", "responseCode" : "200", "value" : 10, "success" : "true"},
{ "timeStamp" : 1341834988676, "label" : "paul", "responseCode" : "200", "value" : 60, "success" : "true"},
{ "timeStamp" : 1341834988686, "label" : "paul", "responseCode" : "404", "value" : 15, "success" : "true"},
{ "timeStamp" : 1341834988696, "label" : "sharon", "responseCode" : "200", "value" : 35, "success" : "false"},
{ "timeStamp" : 1341834988166, "label" : "paul", "responseCode" : "200", "value" : 40, "success" : "true"},
{ "timeStamp" : 1341834988266, "label" : "paul", "responseCode" : "404", "value" : 99, "success" : "false"}]
Run Code Online (Sandbox Code Playgroud)

我的查询如下所示:

resultsCollection.aggregate(
    { $match : { testid : testid} },
    { $skip : alreadyRead },
    { $project : {
            timeStamp : 1 ,
            label : 1,
            responseCode : 1 ,
            value : 1,
            success : 1
        }},
    { $group : {
            _id : "$label",
            max_timeStamp : { $timeStamp : 1 },
            count_responseCode : { $sum : 1 },
            avg_value : { $sum : "$value" },
            count_success : { $sum : 1 }
        }},
    { $group : {
            ?
        }}
);
Run Code Online (Sandbox Code Playgroud)

我的直觉是尝试将结果传递给第二组,我知道你可以做到这一点,但它不会起作用,因为第一组已经过多地减少了数据集,并且所需的细节水平也丢失了.

我想要做的是分组使用label,responseCodesuccess从结果中获取值的总和.看起来应该有点像:

label   | code | success | sum_of_values | count
sharon  | 200  |  true   |      10       |   1
sharon  | 200  |  false  |      35       |   1
paul    | 200  |  true   |      100      |   2
paul    | 404  |  true   |      15       |   1
paul    | 404  |  false  |      99       |   1
Run Code Online (Sandbox Code Playgroud)

哪里有五组:

1. { "timeStamp" : 1341834988666, "label" : "sharon", "responseCode" : "200", "value" : 10, "success" : "true"}

2. { "timeStamp" : 1341834988696, "label" : "sharon", "responseCode" : "200", "value" : 35, "success" : "false"}

3. { "timeStamp" : 1341834988676, "label" : "paul", "responseCode" : "200", "value" : 60, "success" : "true"}
   { "timeStamp" : 1341834988166, "label" : "paul", "responseCode" : "200", "value" : 40, "success" : "true"}

4. { "timeStamp" : 1341834988686, "label" : "paul", "responseCode" : "404", "value" : 15, "success" : "true"}

5. { "timeStamp" : 1341834988266, "label" : "paul", "responseCode" : "404", "value" : 99, "success" : "false"}
Run Code Online (Sandbox Code Playgroud)

Oli*_*oyd 37

好的,所以解决方案是为_id值指定聚合键.这此记录为:

您可以从管道中的文档指定单个字段,先前计算的值,或者从多个传入字段组成的聚合键.

但它实际上并没有定义聚合键的格式.阅读前面的文档,我看到以前的collection.group方法可以采用多个字段,并且在新框架中使用相同的结构.

因此,要对可以使用的多个字段进行分组 _id : { success:'$success', responseCode:'$responseCode', label:'$label'}

如:

resultsCollection.aggregate(
{ $match : { testid : testid} },
{ $skip : alreadyRead },
{ $project : {
        timeStamp : 1 ,
        label : 1,
        responseCode : 1 ,
        value : 1,
        success : 1
    }},
{ $group : {
        _id :  { success:'$success', responseCode:'$responseCode', label:'$label'},
        max_timeStamp : { $timeStamp : 1 },
        count_responseCode : { $sum : 1 },
        avg_value : { $sum : "$value" },
        count_success : { $sum : 1 }
    }}
);
Run Code Online (Sandbox Code Playgroud)

  • 注意:从2.2.0-rc1开始,对多个字段进行分组略有变化:_id:{success:1,responseCode:1,label:1}将变为_id:{success:"$ success",responseCode:" $ responseCode",标签: "$标签"}.https://groups.google.com/forum/?fromgroups#!topic/mongodb-user/1cYch580h0w (9认同)