ElasticSearch: filtering a top_hits aggregation

Sti*_*tne 5 elasticsearch

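I have a set of articles, where each article is made up of several posts. In ES, a post is a document. Each post has a postId, an articleId, a timestamp and a status (simplified). The status of an article is the status of the most recently recorded post belonging to it. I want to query for articles with a specific status and return only the articleId as the result. This means I have to group by articleId, sort by timestamp, and finally filter the results by status.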

I have managed to do the grouping and sorting, but I am stuck on the last part.

Our data looks something like this:

postid  articleId   timestamp               status

1       1           01.01.2016 00:00:01     Success
2       1           01.01.2016 00:00:03     Success
3       1           01.01.2016 00:00:02     Error

4       2           01.01.2016 00:00:01     Success
5       2           01.01.2016 00:00:03     Error
6       2           01.01.2016 00:00:02     Success

With my current query, I get this:

articleId   latestStatus

1           Success
2           Error

I would like to write a query where I ask for (for example) the articleId of all articles whose status is Error. This query should return:

articleId

2
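
To pin down the intended semantics independently of Elasticsearch, here is a minimal Python sketch of "group by articleId, take the latest post, filter by status" run over the sample rows above. The function names are hypothetical, and the timestamp format string is an assumption based on how the sample data is written.

```python
from datetime import datetime

# Sample posts copied from the table above: (postId, articleId, timestamp, status)
POSTS = [
    (1, 1, "01.01.2016 00:00:01", "Success"),
    (2, 1, "01.01.2016 00:00:03", "Success"),
    (3, 1, "01.01.2016 00:00:02", "Error"),
    (4, 2, "01.01.2016 00:00:01", "Success"),
    (5, 2, "01.01.2016 00:00:03", "Error"),
    (6, 2, "01.01.2016 00:00:02", "Success"),
]


def parse(ts):
    # Assumed format of the sample timestamps.
    return datetime.strptime(ts, "%d.%m.%Y %H:%M:%S")


def latest_status_by_article(posts):
    """Map each articleId to the status of its most recent post."""
    latest = {}
    for _post_id, article_id, ts, status in posts:
        when = parse(ts)
        if article_id not in latest or when > latest[article_id][0]:
            latest[article_id] = (when, status)
    return {article_id: status for article_id, (_, status) in latest.items()}


def articles_with_status(posts, wanted):
    """Return the articleIds whose latest post has the wanted status."""
    by_article = latest_status_by_article(posts)
    return sorted(a for a, s in by_article.items() if s == wanted)


print(latest_status_by_article(POSTS))       # {1: 'Success', 2: 'Error'}
print(articles_with_status(POSTS, "Error"))  # [2]
```

The important detail is the order of operations: the status filter is applied after the latest post per article has been chosen, not before.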

This is what I have so far:

GET /_search
{    
    "size": 0,     
    "aggs": {
        "message_status": {
            "terms": {
                "field": "articleId"
            },            
            "aggs": {
                "group_docs": {
                    "top_hits": {
                        "size": 1,
                        "sort": [
                            {
                                "processed": {
                                    "order": "desc"
                                }
                            }
                        ]
                    }
                }
            }            
        }
    }
}

I have tried using post_filter and bucket_selector with a script, but could not get it to work.

The query above returns:

{
   "took": 6,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 6,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "message_status": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "1337",
               "doc_count": 3,
               "group_docs": {
                  "hits": {
                     "total": 3,
                     "max_score": null,
                     "hits": [
                        {
                           "_index": "article",
                           "_type": "post",
                           "_id": "3",
                           "_score": null,
                           "_source": {
                              "postId": 3,
                              "articleId": "1337",
                              "processed": "2016-10-10T12:47:25.570852+02:00",
                              "statusId": 6
                           },
                           "sort": [
                              1476096445570
                           ]
                        }
                     ]
                  }
               }
            },
            {
               "key": "42",
               "doc_count": 3,
               "group_docs": {
                  "hits": {
                     "total": 3,
                     "max_score": null,
                     "hits": [
                        {
                           "_index": "article",
                           "_type": "post",
                           "_id": "6",
                           "_score": null,
                           "_source": {
                              "postId": 6,
                              "articleId": "42",
                              "processed": "2016-10-10T13:02:59.399726+02:00",
                              "statusId": 5
                           },
                           "sort": [
                              1476097379399
                           ]
                        }
                     ]
                  }
               }
            }
         ]
      }
   }
}

What I want to achieve now is to filter this response on a specific statusId and return only the articleIds.
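
As a fallback, the filtering can be done on the client once the response has arrived. The sketch below is a hypothetical helper (not part of any Elasticsearch client library) that walks the bucket structure shown above; the `response` dict is a trimmed copy of that output.

```python
def article_ids_with_latest_status(response, status_id,
                                   terms_agg="message_status",
                                   top_hits_agg="group_docs"):
    """Return bucket keys (articleIds) whose top hit has the given statusId."""
    ids = []
    for bucket in response["aggregations"][terms_agg]["buckets"]:
        hits = bucket[top_hits_agg]["hits"]["hits"]
        # With "size": 1 and a descending sort, hits[0] is the latest post.
        if hits and hits[0]["_source"]["statusId"] == status_id:
            ids.append(bucket["key"])
    return ids


# Trimmed copy of the response above; only the fields the helper reads.
response = {
    "aggregations": {
        "message_status": {
            "buckets": [
                {"key": "1337", "doc_count": 3, "group_docs": {"hits": {"hits": [
                    {"_id": "3", "_source": {"articleId": "1337", "statusId": 6}}
                ]}}},
                {"key": "42", "doc_count": 3, "group_docs": {"hits": {"hits": [
                    {"_id": "6", "_source": {"articleId": "42", "statusId": 5}}
                ]}}},
            ]
        }
    }
}

print(article_ids_with_latest_status(response, 5))  # ['42']
```

This only sees the buckets the terms aggregation actually returned (10 by default), so it is a workaround for small result sets rather than a server-side filter.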

I would really appreciate any help!

Update:

Here is my mapping:

{
   "article": {
      "mappings": {
         "post": {
            "properties": {               
               "articleId": {
                  "type": "string"
               },              
               "postId": {
                  "type": "integer"
               },
               "processed": {
                  "type": "date",
                  "format": "strict_date_optional_time||epoch_millis"
               },
               "statusId": {
                  "type": "integer"
               }
            }
         }
      }
   }
}

Ric*_*cha 0

Please try the following query:

GET article/_search
{
  "size": 0,
  "query": {
    "term": {
      "status": {
        "value": "error"
      }
    }
  },
  "aggs": {
    "group by articles": {
      "terms": {
        "field": "articleId"
      },
      "aggs": {
        "top hits": {
          "top_hits": {
            "size": 1,
            "_source": ["articleId"],
            "sort": [
              {
                "timestamp": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}

The result will look like this:

 "buckets": [
        {
           "key": 2,
           "doc_count": 1,
           "top hits": {
              "hits": {
                 "total": 1,
                 "max_score": null,
                 "hits": [
                    {
                       "_index": "article",
                       "_type": "article",
                       "_id": "3",
                       "_score": null,
                       "_source": {
                          "articleId": 2
                       },
                       "sort": [
                          1444435200000
                       ]
                    }
                 ]
              }
           }
        }
     ]

Hope this helps!!
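
One caveat with a query-level `term` filter is that it selects posts before they are grouped, so each bucket's top hit is the latest post with that status, not necessarily the article's latest post. A possible alternative, sketched here in Python as a request-body builder, compares two `max` sub-aggregations inside a `bucket_selector`. The field names come from the mapping in the question; the function name, aggregation names and `size` parameter are hypothetical. The script is written in the Painless `params.` style, and older Groovy-based versions would likely reference the variables directly. This has not been run against a cluster.

```python
import json


def latest_status_query(status_id, size=10):
    """Build a search body that keeps only articles whose latest post has status_id."""
    return {
        "size": 0,
        "aggs": {
            "by_article": {
                "terms": {"field": "articleId", "size": size},
                "aggs": {
                    # Timestamp of the latest post of any status.
                    "latest_any": {"max": {"field": "processed"}},
                    # Timestamp of the latest post with the wanted status.
                    "with_status": {
                        "filter": {"term": {"statusId": status_id}},
                        "aggs": {"latest": {"max": {"field": "processed"}}},
                    },
                    # Keep the bucket only when both timestamps coincide.
                    "keep_if_latest_matches": {
                        "bucket_selector": {
                            "buckets_path": {
                                "latestAny": "latest_any",
                                "latestWithStatus": "with_status>latest",
                            },
                            "script": "params.latestAny == params.latestWithStatus",
                        }
                    },
                },
            }
        },
    }


print(json.dumps(latest_status_query(5), indent=2))
```

The remaining bucket keys are then the articleIds. Two posts of the same article with identical `processed` values but different statuses would both count as "latest" under this comparison, so it assumes timestamps are unique within an article.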