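I have a collection of articles, where each article is made up of several posts. In ES, each post is a document. Every post has a postId, an articleId, a timestamp, and a status (simplified). The status of an article is the status of the most recently recorded post for that article. I want to query for articles with a specific status and return only the articleId. That means I have to group by articleId, sort by timestamp, and finally filter the result by status.
I've managed the grouping and sorting, but I'm stuck on the last part.
Our data looks roughly like this: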
postId  articleId  timestamp            status
1       1          01.01.2016 00:00:01  Success
2       1          01.01.2016 00:00:03  Success
3       1          01.01.2016 00:00:02  Error
4       2          01.01.2016 00:00:01  Success
5       2          01.01.2016 00:00:03  Error
6       2          01.01.2016 00:00:02  Success
With my current query, I get this:
articleId  latestStatus
1          Success
2          Error
I want to write a query that asks for (for example) the articleId of every article whose status is Error. That query should return:
articleId
2
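The intended semantics (group by articleId, take the post with the newest timestamp, then filter on its status) can be sketched in plain Python over the sample rows above. This only illustrates the desired result and is not an Elasticsearch query; the helper names `latest_status_by_article` and `articles_with_status` are made up for this sketch.

```python
from datetime import datetime

# Sample rows from the table above: (postId, articleId, timestamp, status)
POSTS = [
    (1, 1, "01.01.2016 00:00:01", "Success"),
    (2, 1, "01.01.2016 00:00:03", "Success"),
    (3, 1, "01.01.2016 00:00:02", "Error"),
    (4, 2, "01.01.2016 00:00:01", "Success"),
    (5, 2, "01.01.2016 00:00:03", "Error"),
    (6, 2, "01.01.2016 00:00:02", "Success"),
]


def latest_status_by_article(posts):
    """Return {articleId: status of the post with the newest timestamp}."""
    latest = {}  # articleId -> (timestamp, status)
    for _post_id, article_id, ts, status in posts:
        when = datetime.strptime(ts, "%d.%m.%Y %H:%M:%S")
        if article_id not in latest or when > latest[article_id][0]:
            latest[article_id] = (when, status)
    return {article_id: status for article_id, (_, status) in latest.items()}


def articles_with_status(posts, wanted):
    """Return the sorted articleIds whose latest post has the wanted status."""
    by_article = latest_status_by_article(posts)
    return sorted(a for a, status in by_article.items() if status == wanted)


print(latest_status_by_article(POSTS))       # {1: 'Success', 2: 'Error'}
print(articles_with_status(POSTS, "Error"))  # [2]
```

Article 1 has an Error post (postId 3), but it is not its latest post, so only article 2 should match.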
Here is what I have so far:
GET /_search
{
  "size": 0,
  "aggs": {
    "message_status": {
      "terms": {
        "field": "articleId"
      },
      "aggs": {
        "group_docs": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "processed": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
I tried using post_filter and bucket_selector with a script, but couldn't get either to work.
The query above returns:
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "message_status": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "1337",
          "doc_count": 3,
          "group_docs": {
            "hits": {
              "total": 3,
              "max_score": null,
              "hits": [
                {
                  "_index": "article",
                  "_type": "post",
                  "_id": "3",
                  "_score": null,
                  "_source": {
                    "postId": 3,
                    "articleId": "1337",
                    "processed": "2016-10-10T12:47:25.570852+02:00",
                    "statusId": 6
                  },
                  "sort": [
                    1476096445570
                  ]
                }
              ]
            }
          }
        },
        {
          "key": "42",
          "doc_count": 3,
          "group_docs": {
            "hits": {
              "total": 3,
              "max_score": null,
              "hits": [
                {
                  "_index": "article",
                  "_type": "post",
                  "_id": "6",
                  "_score": null,
                  "_source": {
                    "postId": 6,
                    "articleId": "42",
                    "processed": "2016-10-10T13:02:59.399726+02:00",
                    "statusId": 5
                  },
                  "sort": [
                    1476097379399
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}
What I want to achieve now is to filter this response on a specific statusId and return only the articleIds.
I'd really appreciate any help!
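Until the filtering can be done inside the query, one workaround is to filter the buckets client-side. A minimal sketch, assuming the response has been parsed into a Python dict with the shape shown above; `article_ids_with_status` is a hypothetical helper, and `RESPONSE` is a trimmed copy that keeps only the fields the helper reads.

```python
# Trimmed copy of the aggregation response above (only the fields read below).
RESPONSE = {
    "aggregations": {
        "message_status": {
            "buckets": [
                {"key": "1337", "group_docs": {"hits": {"hits": [
                    {"_source": {"postId": 3, "articleId": "1337", "statusId": 6}},
                ]}}},
                {"key": "42", "group_docs": {"hits": {"hits": [
                    {"_source": {"postId": 6, "articleId": "42", "statusId": 5}},
                ]}}},
            ]
        }
    }
}


def article_ids_with_status(response, status_id):
    """Return bucket keys (articleIds) whose top hit has the given statusId."""
    buckets = response["aggregations"]["message_status"]["buckets"]
    ids = []
    for bucket in buckets:
        hits = bucket["group_docs"]["hits"]["hits"]
        # top_hits with size 1 sorted by "processed" desc: hits[0] is the latest post.
        if hits and hits[0]["_source"]["statusId"] == status_id:
            ids.append(bucket["key"])
    return ids


print(article_ids_with_status(RESPONSE, 5))  # ['42']
```

Because a terms aggregation returns only 10 buckets by default, its `size` would probably need to be raised for this to cover every article.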
Update:
Here is my mapping:
{
  "article": {
    "mappings": {
      "post": {
        "properties": {
          "articleId": {
            "type": "string"
          },
          "postId": {
            "type": "integer"
          },
          "processed": {
            "type": "date",
            "format": "strict_date_optional_time||epoch_millis"
          },
          "statusId": {
            "type": "integer"
          }
        }
      }
    }
  }
}
Please try the following query:
GET article/_search
{
  "size": 0,
  "query": {
    "term": {
      "status": {
        "value": "error"
      }
    }
  },
  "aggs": {
    "group by articles": {
      "terms": {
        "field": "articleId"
      },
      "aggs": {
        "top hits": {
          "top_hits": {
            "size": 1,
            "_source": ["articleId"],
            "sort": [
              {
                "timestamp": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
The result will look like this:
"buckets": [
  {
    "key": 2,
    "doc_count": 1,
    "top hits": {
      "hits": {
        "total": 1,
        "max_score": null,
        "hits": [
          {
            "_index": "article",
            "_type": "article",
            "_id": "3",
            "_score": null,
            "_source": {
              "articleId": 2
            },
            "sort": [
              1444435200000
            ]
          }
        ]
      }
    }
  }
]
Hope this helps!
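One caveat: a query-level `term` filter selects posts before they are grouped, so an article can match through an older post even when its latest post has a different status. A possible alternative, sketched here as a Python dict and not run against a cluster, compares the newest timestamp overall with the newest timestamp among posts of the wanted status, and keeps a bucket only when the two coincide. It uses the field names from the mapping in the question (`articleId`, `processed`, `statusId`); the aggregation names and the `latest_status_query` helper are made up for illustration. The script uses Painless-style `params.` access; on older versions with Groovy scripting the variables are referenced without the `params.` prefix.

```python
import json


def latest_status_query(status_id):
    """Build a search body keeping only articles whose newest post has status_id."""
    return {
        "size": 0,
        "aggs": {
            "by_article": {
                "terms": {"field": "articleId"},
                "aggs": {
                    # Newest timestamp among all posts of the article.
                    "latest": {"max": {"field": "processed"}},
                    # Newest timestamp among posts with the wanted status.
                    "with_status": {
                        "filter": {"term": {"statusId": status_id}},
                        "aggs": {"latest": {"max": {"field": "processed"}}},
                    },
                    # Keep the bucket only when both timestamps coincide.
                    "only_matching": {
                        "bucket_selector": {
                            "buckets_path": {
                                "latest": "latest",
                                "latest_with_status": "with_status>latest",
                            },
                            "script": "params.latest == params.latest_with_status",
                        }
                    },
                },
            }
        },
    }


print(json.dumps(latest_status_query(5), indent=2))
```

The remaining bucket keys are then the articleIds. If two posts of one article share the exact same `processed` value but differ in status, this comparison cannot tell them apart.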