Why is my Elasticsearch prefix query case-sensitive despite using lowercase filters on both index and search?

Question

ifo*_*o20 5 elasticsearch

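The Problem

I am working on an autocompleter using ElasticSearch 6.2.3. I would like my query results (a list of pages with a Name field) to be ordered using the following priority:

1) Prefix match at the start of "Name" (prefix query)
2) Any other exact (whole word) match in "Name" (term query)
3) Fuzzy match (this is currently done on a different field from Name using an ngram tokenizer, so I assume it is unrelated to my problem, but I would like to apply it to the Name field as well)

My Attempted Solution

I will be using a Bool/Should query consisting of three queries (corresponding to the three priorities above), using boost to define relative importance.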
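
A minimal sketch of what such a Bool/Should request body could look like, built as a plain Python dict. The boost values, the `minimum_should_match` setting, the helper name `build_autocomplete_query`, and the use of `Tokens` as the ngram-analyzed field are assumptions made for illustration; none of them are confirmed by the mapping shown in this question.

```python
import json

def build_autocomplete_query(text, prefix_boost=3.0, term_boost=2.0, fuzzy_boost=1.0):
    """Combine the three priorities into one bool/should request body.

    Boost values are placeholders; only their relative order matters here.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Priority 1: prefix match on the keyword-tokenized sub-field.
                    {"prefix": {"Name.raw": {"value": text, "boost": prefix_boost}}},
                    # Priority 2: whole-word match on the analyzed Name field.
                    {"term": {"Name": {"value": text, "boost": term_boost}}},
                    # Priority 3: looser match on an ngram field (field name assumed).
                    {"match": {"Tokens": {"query": text, "boost": fuzzy_boost}}},
                ],
                # Require at least one clause to match (an assumption).
                "minimum_should_match": 1,
            }
        }
    }

body = build_autocomplete_query("harry")
print(json.dumps(body, indent=2))
```

The resulting dict can be serialized and sent to the `_search` endpoint with any HTTP client.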

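The problem I am having is with the prefix query: it does not seem to lowercase the search query, even though my search analyzer has the lowercase filter. For example, the query below returns "Harry Potter" for "harry" but returns zero results for "Harry":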

{ "query": { "prefix": { "Name.raw" : "Harry" } } }


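I have verified with the _analyze API that both of my analyzers do indeed lowercase the text "Harry" to "harry". Where am I going wrong?

From the ES documentation, I understand that I need to analyze the Name field in two different ways to enable both prefix and term queries:

using the "keyword" tokenizer to enable the prefix query (I have applied this to the .raw field)
using a standard analyzer to enable the term query (I have applied this to the Name field)

I have checked duplicate questions such as this one, but the answers have not helped.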

My mapping and settings are below.

ES Index Mapping

{
    "myIndex": {
        "mappings": {
            "pages": {
                "properties": {
                    "Id": {},
                    "Name": {
                        "type": "text",
                        "fields": {
                            "raw": {
                                "type": "text",
                                "analyzer": "keywordAnalyzer",
                                "search_analyzer": "pageSearchAnalyzer"
                            }
                        },
                        "analyzer": "pageSearchAnalyzer"
                    },
                    "Tokens": {}, // Other fields not important for this question
                }
            }
        }
    }
}


ES Index Settings

{
    "myIndex": {
        "settings": {
            "index": {
                "analysis": {
                    "filter": {
                        "ngram": {
                            "type": "edgeNGram",
                            "min_gram": "2",
                            "max_gram": "15"
                        }
                    },
                    "analyzer": {
                        "keywordAnalyzer": {
                            "filter": [
                                "trim",
                                "lowercase",
                                "asciifolding"
                            ],
                            "type": "custom",
                            "tokenizer": "keyword"
                        },
                        "pageSearchAnalyzer": {
                            "filter": [
                                "trim",
                                "lowercase",
                                "asciifolding"
                            ],
                            "type": "custom",
                            "tokenizer": "standard"
                        },
                        "pageIndexAnalyzer": {
                            "filter": [
                                "trim",
                                "lowercase",
                                "asciifolding",
                                "ngram"
                                ],
                            "type": "custom",
                            "tokenizer": "standard"
                        }
                    }
                },
                "number_of_replicas": "1",
                "uuid": "l2AXoENGRqafm42OSWWTAg",
                "version": {}
            }
        }
    }
}


Answer 1

rus*_*der 3

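The prefix query does not analyze the search term, so the text you pass to it bypasses whatever would be used as the search analyzer (in your case, the configured search_analyzer: pageSearchAnalyzer). Harry is evaluated as-is, directly against the keyword-tokenized, custom-filtered term harry potter that keywordAnalyzer produced at index time.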
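
The mismatch can be reproduced outside Elasticsearch with a rough simulation. This is only a sketch: `keyword_analyzer` and `prefix_matches` are hypothetical helpers that approximate keywordAnalyzer (without asciifolding) and the unanalyzed prefix comparison. They are not Elasticsearch APIs.

```python
def keyword_analyzer(text):
    """Rough stand-in for keywordAnalyzer: keyword tokenizer + trim + lowercase.

    asciifolding is omitted; this sketch only covers plain ASCII input.
    """
    return text.strip().lower()

def prefix_matches(indexed_term, query_text):
    """A prefix query compares the query text, unanalyzed, to the indexed term."""
    return indexed_term.startswith(query_text)

indexed = keyword_analyzer("Harry Potter")  # the single stored term: "harry potter"

print(prefix_matches(indexed, "harry"))  # True
print(prefix_matches(indexed, "Harry"))  # False: the capital H is never lowercased
```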

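In your case, you will need to do one of a few different things:

Since you are using a lowercase filter on the field, you could simply always use lowercase terms in your prefix query (lowercasing on the application side if necessary)
Run a match query against an edge_ngram-analyzed field instead of a prefix query, as described in the ES search_analyzer docs

Here is an example of the latter:

1) Create the index with an ngram analyzer and (recommended) standard search analyzer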

PUT my_index
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "ngram": {
            "type": "edgeNGram",
            "min_gram": "2",
            "max_gram": "15"
          }
        },
        "analyzer": {
          "pageIndexAnalyzer": {
            "filter": [
              "trim",
              "lowercase",
              "asciifolding",
              "ngram"
            ],
            "type": "custom",
            "tokenizer": "keyword"
          }
        }
      }
    }
  },
  "mappings": {
    "pages": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "ngram": {
              "type": "text",
              "analyzer": "pageIndexAnalyzer",
              "search_analyzer": "standard"
            }
          }
        }
      }
    }
  }
}
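
To see why a plain match query can act like a prefix search against this field, it may help to look at the terms the analyzer chain would roughly produce. The sketch below is a simplified approximation: `edge_ngrams`, `index_terms`, and `search_terms` are hypothetical helpers, asciifolding is left out, and the standard analyzer is reduced to lowercasing plus whitespace splitting, which only holds for simple ASCII input.

```python
def edge_ngrams(term, min_gram=2, max_gram=15):
    """Leading substrings of `term` with lengths from min_gram up to max_gram."""
    upper = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, upper + 1)]

def index_terms(name):
    """Approximate pageIndexAnalyzer: keyword tokenizer, trim, lowercase, edge n-grams."""
    return edge_ngrams(name.strip().lower())

def search_terms(text):
    """Approximate the standard search analyzer for simple ASCII input."""
    return text.lower().split()

terms = index_terms("Harry Potter")
print(terms[:4])  # ['ha', 'har', 'harr', 'harry']

# The search text "Har" becomes "har", which is one of the indexed terms.
print(all(t in terms for t in search_terms("Har")))  # True
```

Because the search side is analyzed, "Har" and "har" both end up as the same term, which is what removes the case sensitivity.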


2) Index some sample docs

POST my_index/pages/_bulk
{"index":{}}
{"name":"Harry Potter"}
{"index":{}}
{"name":"Hermione Granger"}


3) Run a match query against the ngram field

POST my_index/pages/_search
{
  "query": {
    "match": {
      "name.ngram": {
        "query": "Har",
        "operator": "and"
      }
    }
  }
}
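
For the first option (lowercasing on the application side), a minimal sketch might look like the following. The helpers `normalize_for_prefix` and `build_prefix_query` are hypothetical, and stripping combining marks only approximates the asciifolding filter for accented Latin letters. It is not a full reimplementation of that filter.

```python
import unicodedata

def normalize_for_prefix(text):
    """Mimic trim + lowercase + a rough asciifolding before building the query."""
    decomposed = unicodedata.normalize("NFKD", text.strip().lower())
    # Dropping combining marks folds accented Latin letters to their base letter.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def build_prefix_query(text, field="Name.raw"):
    """Build a prefix query body whose term already matches the indexed form."""
    return {"query": {"prefix": {field: normalize_for_prefix(text)}}}

print(build_prefix_query("  Harry"))
# {'query': {'prefix': {'Name.raw': 'harry'}}}
```

The point of the design is that the application applies the same normalization the index-time analyzer applies, since the prefix query will not do it.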


Archived:	7 years, 9 months ago
Viewed:	5231 times
Last active:	5 years, 6 months ago