Elasticsearch aggregation on hierarchical categories and subcategories: limiting the level

Mai*_*din 3 elasticsearch elastic-stack

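I have products with a categories field. Using an aggregation, I can get the full categories together with all their subcategories. I want to limit the levels of the facets.

For example, I have facets like this: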

auto, tools & travel    (115)
auto, tools & travel > luggage tags (90)
auto, tools & travel > luggage tags > luggage spotters  (40)
auto, tools & travel > luggage tags > something else    (50)
auto, tools & travel > car organizers   (25)

Using an aggregation like this:

"aggs": {
    "cat_groups": {
      "terms": {
        "field": "categories.keyword",
        "size": 10,
        "include": "auto, tools & travel > .*"
      }
    }
}

I get buckets like this:

"buckets": [
        {
          "key": "auto, tools & travel > luggage tags",
          "doc_count": 90
        },
        {
          "key": "auto, tools & travel > luggage tags > luggage spotters",
          "doc_count": 40
        },
        {
          "key": "auto, tools & travel > luggage tags > something else",
          "doc_count": 50
        },
        {
          "key": "auto, tools & travel > car organizers",
          "doc_count": 25
        }
]

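But I want to limit the levels. For example, I only want to get the results for auto, tools & travel > luggage tags. How can I limit the levels? By the way, "exclude": ".* > .* > .*" does not work for me.

I need to get buckets for different levels depending on the search: sometimes the first level, sometimes the second or third. When I want the first level, I don't want the second level to appear in the buckets, and likewise for the other levels.

Elasticsearch version: 6.4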

Kam*_*mal 6

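I was finally able to work out the technique below.

I implemented a custom analyzer that uses the Path Hierarchy Tokenizer and defined categories as a multi-field, so that you can use categories.facet for aggregations/facets and run normal text searches against categories.

The custom analyzer applies only to categories.facet.

Note the "fielddata": "true" property on the categories.facet field.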
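To see what the Path Hierarchy Tokenizer contributes, here is a minimal Python sketch that imitates its output for a single category string. It is an approximation written for illustration, not Elasticsearch code, and the function name `path_hierarchy_tokens` is hypothetical. Every prefix of the path becomes its own token, which is why a terms aggregation on the analyzed field returns a bucket for each ancestor level.

```python
def path_hierarchy_tokens(text, delimiter=">"):
    """Approximate the tokens a path_hierarchy tokenizer emits for `text`."""
    parts = text.split(delimiter)
    # Token i is the first i parts joined back together with the delimiter.
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


tokens = path_hierarchy_tokens(
    "auto, tools & travel > luggage tags > luggage spotters"
)
for token in tokens:
    print(repr(token))
```

Whitespace around the delimiter is preserved in this sketch, so the ancestor tokens end with a trailing space.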

Mapping

PUT myindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "path_hierarchy",
          "delimiter": ">"
        }
      }
    }
  },
  "mappings": {
    "mydocs": {
      "properties": {
        "categories": {
          "type": "text",
          "fields": {
            "facet": { 
              "type":  "text",
              "analyzer": "my_analyzer",
              "fielddata": "true"
            }
          }
        }
      }
    }
  }
}

Sample documents

POST myindex/mydocs/1
{
    "categories" : "auto, tools & travel > luggage tags > luggage spotters"
}

POST myindex/mydocs/2
{
    "categories" : "auto, tools & travel > luggage tags > luggage spotters"
}

POST myindex/mydocs/3
{
    "categories" : "auto, tools & travel > luggage tags > luggage spotters"
}

POST myindex/mydocs/4
{
    "categories" : "auto, tools & travel > luggage tags > something else"
}
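
Query

You can try the query below, which should be what you are looking for. I have again used a filter aggregation, since you only need documents matching particular words, together with a terms aggregation.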

{
  "size": 0,
  "aggs":{
    "facets": {
      "filter": {
        "bool": {
          "must": [
            { "match": { "categories": "luggage" } }
          ]
        }
      },
      "aggs": {
        "categories": {
          "terms": {
            "field": "categories.facet"
          }
        }
      }
    }
  }
}
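
If the search term changes from request to request, the same body can be assembled in code. The following is a minimal Python sketch that only builds the request body as a dictionary and prints it as JSON. It does not call Elasticsearch, and the function name `build_facet_query` and its parameters are hypothetical.

```python
import json


def build_facet_query(search_term, size=10):
    """Return a search body: a filter aggregation wrapping a terms aggregation."""
    return {
        "size": 0,  # no hits needed, only aggregations
        "aggs": {
            "facets": {
                # Restrict the documents that feed the facet counts.
                "filter": {
                    "bool": {
                        "must": [{"match": {"categories": search_term}}]
                    }
                },
                "aggs": {
                    "categories": {
                        # Bucket on the path_hierarchy-analyzed sub-field.
                        "terms": {"field": "categories.facet", "size": size}
                    }
                },
            }
        },
    }


body = build_facet_query("luggage")
print(json.dumps(body, indent=2))
```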

Response

{
    "took": 43,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 11,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "facets": {
            "doc_count": 4,
            "categories": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {
                        "key": "auto, tools & travel ",
                        "doc_count": 4
                    },
                    {
                        "key": "auto, tools & travel > luggage tags ",
                        "doc_count": 4
                    },
                    {
                        "key": "auto, tools & travel > luggage tags > luggage spotters",
                        "doc_count": 3
                    },
                    {
                        "key": "auto, tools & travel > luggage tags > something else",
                        "doc_count": 1
                    }
                ]
            }
        }
    }
}

Final answer after discussion in chat

POST myindex/_search
{
  "size": 0,
  "aggs":{
    "facets": {
      "filter": {
        "bool": {
          "must": [
            { "match": { "categories": "luggage" } }
          ]
        }
      },
      "aggs": {
        "categories": {
          "terms": {
            "field": "categories.facet",
            "exclude": ".*>{1}.*>{1}.*"
          }
        }
      }
    }
  }
}

Note that I have added exclude with a regular expression, so that the aggregation ignores any facet in which > occurs more than once.

Let me know if this helps.
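
The effect of the exclude pattern can be checked offline. The sketch below uses Python's `re.fullmatch` as a stand-in for a regular expression that must match the whole term; the Python and Lucene regex dialects differ in other respects, so treat it as an approximation. The helper `exclude_deeper_than` is hypothetical: it generalizes the pattern to any maximum level, which is one possible way to ask for buckets down to the first, second or third level. It assumes the delimiter has no special meaning in the regex syntax.

```python
import re


def exclude_deeper_than(max_level, delimiter=">"):
    """Build an exclude pattern matching keys with more than `max_level` levels.

    A key with N levels contains N - 1 delimiters, so any key holding at
    least `max_level` delimiters is deeper than wanted.
    """
    return ".*" + (delimiter + ".*") * max_level


keys = [
    "auto, tools & travel ",
    "auto, tools & travel > luggage tags ",
    "auto, tools & travel > luggage tags > luggage spotters",
    "auto, tools & travel > luggage tags > something else",
]


def kept_keys(max_level):
    """Return the keys that survive the exclude pattern for `max_level`."""
    pattern = exclude_deeper_than(max_level)
    # fullmatch mimics a regex that must match the entire term.
    return [key for key in keys if not re.fullmatch(pattern, key)]


print(exclude_deeper_than(2))  # .*>.*>.*
print(kept_keys(1))  # first level only
print(kept_keys(2))  # first and second levels
```

Note that this keeps every level up to the chosen maximum; it does not isolate a single level on its own.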

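  • @MiDaa Yes, that is correct. It only works on "text" fields, because "keyword" does not use an "analyzer". Unless you have a huge amount of data in that field, you don't need to worry about performance. The usual advice is to use "keyword" for sorting and aggregation queries rather than "text" with "fielddata: true", but that does not apply in this case; here it is the only option we have. Hope that clarifies. (2 upvotes)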