Aggregating over top hits in ElasticSearch

Hay*_*ych 6 elasticsearch

My documents are structured as follows:

{
   "chefInfo": {
      "id": int,
      "employed": String
      ... Some more recipe information ...
   },
   "recipe": {
      ... Some recipe information ...
   }
}

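If a chef has several recipes, the nested chefInfo block is identical in each of those documents. My problem is that I want to aggregate on fields in the chefInfo part of the document, but a plain aggregation doesn't take into account that the chefInfo block is duplicated.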

So if the chef with ID 1 has 5 recipes and I aggregate on the employed field, that chef counts 5 times in the aggregation, whereas I want them to count only once.
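To make the mismatch concrete, here is a small in-memory sketch (plain Python, no Elasticsearch involved) contrasting a per-document count, which is what a plain terms aggregation on `employed` reports, with the per-chef count the question asks for. The sample documents and the `"yes"`/`"no"` values are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: chef 1 has three recipes, chef 2 has one.
# The chefInfo block is copied into every recipe document.
docs = [
    {"chefInfo": {"id": 1, "employed": "yes"}, "recipe": {"name": "soup"}},
    {"chefInfo": {"id": 1, "employed": "yes"}, "recipe": {"name": "stew"}},
    {"chefInfo": {"id": 1, "employed": "yes"}, "recipe": {"name": "pie"}},
    {"chefInfo": {"id": 2, "employed": "no"}, "recipe": {"name": "cake"}},
]

# Per-document count: every recipe adds one to its chef's employed value.
per_document = Counter(d["chefInfo"]["employed"] for d in docs)

# Per-chef count: collect distinct chef ids per employed value, then count them.
chefs_by_value = defaultdict(set)
for d in docs:
    chefs_by_value[d["chefInfo"]["employed"]].add(d["chefInfo"]["id"])
per_chef = {value: len(ids) for value, ids in chefs_by_value.items()}

print(dict(per_document))  # {'yes': 3, 'no': 1}
print(per_chef)            # {'yes': 1, 'no': 1}
```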

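I considered aggregating on chef_id with top_hits and then running a sub-aggregation over all the buckets, but I couldn't work out how to count the results across all the buckets.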

Is what I'm trying to do possible?
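The bucket-per-chef idea can be finished on the client side. The sketch below assumes a hypothetical request with a terms aggregation named `by_chef` on `chefInfo.id` and a `top_hits` sub-aggregation named `one_doc` with `size: 1`, and assumes chefInfo is mapped as a plain (non-nested) object so each hit's `_source` is the whole document. It then tallies `employed` once per chef bucket. The response below is hand-written in the shape Elasticsearch returns, trimmed to the fields used.

```python
from collections import Counter


def count_employed_per_chef(response: dict) -> Counter:
    """Tally chefInfo.employed once per chef bucket.

    Assumes hypothetical aggregation names: terms agg "by_chef" on
    chefInfo.id with a top_hits (size 1) sub-agg "one_doc".
    """
    counts = Counter()
    for bucket in response["aggregations"]["by_chef"]["buckets"]:
        hits = bucket["one_doc"]["hits"]["hits"]
        if hits:
            # Every document in a chef's bucket carries the same chefInfo
            # block, so the first hit represents that chef.
            counts[hits[0]["_source"]["chefInfo"]["employed"]] += 1
    return counts


# Hand-written, trimmed response fragment (not real output).
sample = {
    "aggregations": {
        "by_chef": {
            "buckets": [
                {"key": 1, "doc_count": 3,
                 "one_doc": {"hits": {"hits": [
                     {"_source": {"chefInfo": {"id": 1, "employed": "yes"}}}]}}},
                {"key": 2, "doc_count": 1,
                 "one_doc": {"hits": {"hits": [
                     {"_source": {"chefInfo": {"id": 2, "employed": "no"}}}]}}},
            ]
        }
    }
}

print(dict(count_employed_per_chef(sample)))  # {'yes': 1, 'no': 1}
```

One drawback of this route: a terms aggregation returns only `size` buckets (10 by default), so every chef must fit into the returned buckets for the tally to be complete. A cardinality sub-aggregation keeps the counting on the server instead.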

Nis*_*ini 5

To Elasticsearch, every document is unique in itself. In your case you want to define uniqueness by a different field, here chefInfo.id. To get a unique count based on that field, use the cardinality aggregation.

You can apply the aggregation as follows:

{
  "aggs": {
    "employed": {
      "nested": {
        "path": "chefInfo"
      },
      "aggs": {
        "employed": {
          "terms": {
            "field": "chefInfo.employed.keyword"
          },
          "aggs": {
            "employed_unique": {
              "cardinality": {
                "field": "chefInfo.id"
              }
            }
          }
        }
      }
    }
  }
}

In the result, employed_unique gives you the expected count (distinct chefs) for each employed value.
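Reading those counts back out follows the nesting of the query: nested aggregation `employed`, then terms aggregation `employed`, then cardinality `employed_unique`. A minimal sketch, using a hand-written and trimmed response (real terms responses carry extra fields such as `sum_other_doc_count`). Note that `doc_count` is still per document, while `employed_unique.value` is per chef.

```python
def unique_chefs_per_employed(response: dict) -> dict:
    """Map each chefInfo.employed value to its employed_unique count.

    The path mirrors the query: nested "employed" -> terms "employed"
    -> cardinality "employed_unique".
    """
    buckets = response["aggregations"]["employed"]["employed"]["buckets"]
    return {b["key"]: b["employed_unique"]["value"] for b in buckets}


# Hand-written, trimmed response fragment (not real output).
sample = {
    "aggregations": {
        "employed": {
            "doc_count": 4,
            "employed": {
                "buckets": [
                    {"key": "yes", "doc_count": 3, "employed_unique": {"value": 1}},
                    {"key": "no", "doc_count": 1, "employed_unique": {"value": 1}},
                ]
            },
        }
    }
}

print(unique_chefs_per_employed(sample))  # {'yes': 1, 'no': 1}
```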

  • Also be aware that if `chefInfo.id` has high cardinality, [the counts may be approximate]( Aggregation.html#_counts_are_approximate) (2 upvotes)
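If the approximation matters, the cardinality aggregation accepts a `precision_threshold` option (default 3000, maximum 40000; distinct counts below the threshold are expected to be close to exact, at the cost of more memory). The sketch below builds the answer's query as a Python dict with that option set; `employed_query` is a hypothetical helper name, and `"size": 0` is added only to skip returning search hits.

```python
import json


def employed_query(precision_threshold: int = 3000) -> dict:
    """Build the answer's aggregation with an explicit precision_threshold."""
    # Values above 40000 behave like 40000, so clamp for clarity.
    threshold = min(precision_threshold, 40000)
    return {
        "size": 0,  # only aggregations are needed, not the hits themselves
        "aggs": {
            "employed": {
                "nested": {"path": "chefInfo"},
                "aggs": {
                    "employed": {
                        "terms": {"field": "chefInfo.employed.keyword"},
                        "aggs": {
                            "employed_unique": {
                                "cardinality": {
                                    "field": "chefInfo.id",
                                    "precision_threshold": threshold,
                                }
                            }
                        },
                    }
                },
            }
        },
    }


print(json.dumps(employed_query(40000), indent=2))
```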