use*_*131 5 grouping filter aggregation elasticsearch
I have data like this:
Id  GroupId  UpdateDate
1   1        2013-11-15T12:00:00
2   1        2013-11-20T12:00:00
3   2        2013-12-01T12:00:00
4   2        2013-13-01T12:00:00
5   2        2013-11-01T12:00:00
6   3        2013-10-01T12:00:00
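How can I write a query that returns a filtered/grouped list containing, for each group, the record with the maximum UpdateDate? The final list should be sorted by UpdateDate in descending order.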
I want this output:
Id  GroupId  UpdateDate
4   2        2013-13-01T12:00:00
2   1        2013-11-20T12:00:00
6   3        2013-10-01T12:00:00
Thank you :)
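Yes, this is possible with Elasticsearch, but the result comes back as JSON and has to be flattened into the format shown above. Here is how I did it using Marvel Sense.
Bulk load the data: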
POST myindex/mytype/_bulk
{"index":{}}
{"id":1,"GroupId":1,"UpdateDate":"2013-11-15T12:00:00"}
{"index":{}}
{"id":2,"GroupId":1,"UpdateDate":"2013-11-20T12:00:00"}
{"index":{}}
{"id":3,"GroupId":2,"UpdateDate":"2013-12-01T12:00:00"}
{"index":{}}
{"id":4,"GroupId":2,"UpdateDate":"2013-12-01T12:00:00"}
{"index":{}}
{"id":5,"GroupId":2,"UpdateDate":"2013-11-01T12:00:00"}
{"index":{}}
{"id":6,"GroupId":3,"UpdateDate":"2013-10-01T12:00:00"}
Get the maximum per group:
GET myindex/mytype/_search?search_type=count
{
  "aggs": {
    "NAME": {
      "terms": {
        "field": "GroupId"
      },
      "aggs": {
        "NAME": {
          "max": {
            "field": "UpdateDate"
          }
        }
      }
    }
  }
}
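The request above returns only the maximum date per group, in the default bucket order (by document count). As a hedged variation that is not part of the original answer, the sketch below builds a request body that also orders the buckets by the maximum date and uses a `top_hits` sub-aggregation to recover the newest document (and so its id) in each group. The function name and the aggregation names `by_group`, `max_date`, and `latest_doc` are made up for illustration, and `"size": 0` is assumed as a replacement for `search_type=count`.

```python
import json


def build_latest_per_group_query(group_field="GroupId", date_field="UpdateDate"):
    """Build a search body: one bucket per group, newest document in each."""
    return {
        # Skip the top-level hits; only the aggregations are needed.
        "size": 0,
        "aggs": {
            "by_group": {
                "terms": {
                    "field": group_field,
                    # Order buckets by the max_date sub-aggregation, newest first.
                    "order": {"max_date": "desc"},
                },
                "aggs": {
                    "max_date": {"max": {"field": date_field}},
                    # Newest document of the group, to recover its id.
                    "latest_doc": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{date_field: {"order": "desc"}}],
                        }
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into a console such as Sense.
    print(json.dumps(build_latest_per_group_query(), indent=2))
```

The output shown next comes from the original request above, not from this variation.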
Output:
{
  ...
  "aggregations": {
    "NAME": {
      "buckets": [
        {
          "key": 2,
          "doc_count": 3,
          "NAME": {
            "value": 1385899200000
          }
        },
        {
          "key": 1,
          "doc_count": 2,
          "NAME": {
            "value": 1384948800000
          }
        },
        {
          "key": 3,
          "doc_count": 1,
          "NAME": {
            "value": 1380628800000
          }
        }
      ]
    }
  }
  ...
}
The maximum date is returned as Unix epoch time in milliseconds and needs to be converted back to a readable date format.
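A minimal sketch of that client-side step, assuming the response has already been parsed into a Python dict and that the timestamps are UTC: it flattens the buckets into (GroupId, UpdateDate) rows, sorts them by date descending, and converts the epoch milliseconds to readable strings. The function names are hypothetical, and the sample response is copied from the output above with the elided parts left out.

```python
from datetime import datetime, timezone


def epoch_millis_to_iso(millis):
    """Convert epoch milliseconds (assumed UTC) to a readable timestamp."""
    dt = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S")


def flatten_max_per_group(response, agg_name="NAME"):
    """Turn the nested buckets into (GroupId, UpdateDate) rows, newest first."""
    buckets = response["aggregations"][agg_name]["buckets"]
    # The answer's query uses the same name for the outer and inner aggregation.
    rows = [(b["key"], b[agg_name]["value"]) for b in buckets]
    rows.sort(key=lambda row: row[1], reverse=True)
    return [(group_id, epoch_millis_to_iso(ms)) for group_id, ms in rows]


# Sample response copied from the output above (elided parts omitted).
response = {
    "aggregations": {
        "NAME": {
            "buckets": [
                {"key": 2, "doc_count": 3, "NAME": {"value": 1385899200000}},
                {"key": 1, "doc_count": 2, "NAME": {"value": 1384948800000}},
                {"key": 3, "doc_count": 1, "NAME": {"value": 1380628800000}},
            ]
        }
    }
}

if __name__ == "__main__":
    for group_id, update_date in flatten_max_per_group(response):
        print(group_id, update_date)
```

This yields GroupId and UpdateDate only; the max aggregation does not carry the Id of the matching document, so recovering it needs an extra step on the query side.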