"Filter then aggregation" or "filter aggregation"?

Question

Hea*_*ren 1 elasticsearch elasticsearch-aggregation

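I've been studying ES recently, and I found that I can get almost the same result in two ways, but the difference between the two is not clear to me.

"Filter then aggregation"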

POST kibana_sample_data_flights/_search
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "DestCountry": "CA"
        }
      }
    }
  },
  "aggs": {
    "ca_weathers": {
      "terms": { "field": "DestWeather" }
    }
  }
}

"Filter aggregation"

POST kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "ca": {
      "filter": {
        "term": {
          "DestCountry": "CA"
        }
      },
      "aggs": {
        "_weathers": {
           "terms": { "field": "DestWeather" } 
        }
      }
    }
  }
}

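My questions

Why are there two similar features? I'm sure I'm wrong to think they are the same, but what is the difference? (Please ignore the result format; that's not what I'm asking ;p)
If I want to filter out the irrelevant/unmatched documents and then run aggregations over a large number of documents, which one is better?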

Answer 1

Kev*_*zel 5

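When you use it in "query", you create a context over all the documents in your index. In that case it acts like a normal filter: SELECT * FROM index WHERE (my_filter_condition1 AND my_filter_condition2 OR my_filter_condition3...).

When you use it in "aggs", you create a context over all the documents that may (or may not) already have been filtered. Say you have a structure like this: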

#OPTION A
{
    "aggs":{
        "t_shirts" : {
            "filter" : { "term": { "type": "t-shirt" } }
        }
    }
}

Without a "query", this is exactly the same as having:

#OPTION B
{
    "query":{
        "bool":{
            "filter" : { "term": { "type": "t-shirt" } }
        }
    }
}

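But the results are returned in different fields.

In option A, the results are returned in the aggregations field.

In option B, the results are returned in the hits field.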
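To make the difference in response shape concrete, here is a minimal pure-Python model of the two options. It does not talk to Elasticsearch, and the response shape is simplified. The toy documents, the `color` field, and the `colors` sub-aggregation are assumptions added for illustration; they are not part of the options above.

```python
from collections import Counter

# Toy documents standing in for an index (made-up data, not from a real cluster).
DOCS = [
    {"type": "t-shirt", "color": "red"},
    {"type": "t-shirt", "color": "blue"},
    {"type": "hat", "color": "red"},
    {"type": "t-shirt", "color": "red"},
]


def matches(doc, field, value):
    """Stand-in for a `term` filter: exact match on one field."""
    return doc.get(field) == value


def terms(docs, field):
    """Stand-in for a `terms` aggregation: count documents per field value."""
    return dict(Counter(doc[field] for doc in docs))


def option_a(docs):
    """Filter aggregation with no query: every document counts as a hit."""
    in_bucket = [d for d in docs if matches(d, "type", "t-shirt")]
    return {
        "hits": {"total": len(docs)},
        "aggregations": {
            "t_shirts": {
                "doc_count": len(in_bucket),
                "colors": terms(in_bucket, "color"),  # hypothetical sub-aggregation
            }
        },
    }


def option_b(docs):
    """Filter in the query: only matching documents are hits and get aggregated."""
    hits = [d for d in docs if matches(d, "type", "t-shirt")]
    return {
        "hits": {"total": len(hits)},
        "aggregations": {"colors": terms(hits, "color")},  # hypothetical aggregation
    }


a, b = option_a(DOCS), option_b(DOCS)
print(a["hits"]["total"], a["aggregations"]["t_shirts"]["doc_count"])  # 4 3
print(b["hits"]["total"], b["aggregations"]["colors"])  # 3 {'red': 2, 'blue': 1}
```

In this model both options produce the same per-color counts. Only the hit total and the place where the filtered count appears differ.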

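I recommend always applying your filters in the query part, so that the aggregations that follow run only on the already filtered documents. It also matters because aggregations cost more than queries in terms of performance.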
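As a sketch of that recommendation, the request body can be built as a plain Python dict and serialized to JSON. The helper name `filter_then_aggregate` is hypothetical; the field names are taken from the question, and the `bool`/`filter` wrapper is one common way to express a non-scoring filter in the query. Sending the body to a cluster is left out.

```python
import json


def filter_then_aggregate(filter_field, filter_value, agg_field, agg_name):
    """Build a search body that filters in `query` and aggregates the matches.

    Hypothetical helper for illustration; it only assembles the JSON body.
    """
    return {
        "size": 0,  # no hits returned; only the aggregation result is wanted
        "query": {
            "bool": {
                # filter context: matching documents are selected without scoring
                "filter": [{"term": {filter_field: filter_value}}]
            }
        },
        # the aggregation only sees documents that passed the query
        "aggs": {agg_name: {"terms": {"field": agg_field}}},
    }


body = filter_then_aggregate("DestCountry", "CA", "DestWeather", "ca_weathers")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after `POST kibana_sample_data_flights/_search` in the Kibana console, assuming the sample flights data set is loaded.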

Hope this is helpful! :D

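In option A, the aggregation runs on all documents. In option B, the documents are filtered first and the aggregation then runs only on the selected ones. Say you have 10M documents and the filter selects only 100: it is pretty clear that option B will always be faster. (3 upvotes)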

Answer 2

Hea*_*ren 0

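The answer is in @Val's comment, which I quote here for reference:

In option A, the aggregation runs on all documents. In option B, the documents are filtered first and the aggregation then runs only on the selected ones. Say you have 10M documents and the filter selects only 100: it is pretty clear that option B will always be faster.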

Asked:	6 years, 8 months ago
Viewed:	1343 times
Active:	6 years, 6 months ago