Elasticsearch query using match_phrase_prefix and fuzziness at the same time?

Question

Hen*_*olm 5 fuzzy-search autocomplete elasticsearch match-phrase

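I'm new to Elasticsearch, so I'm struggling a bit to find the best query for our data.

Imagine I want to match the following name: "Handelsstandens Boldklub".

Currently, I'm using the following query: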

{
    query: {
      bool: {
        should: [
          {
            match: {
              name: {
                query: query, slop: 5, type: "phrase_prefix"
              }
            }
          },
          {
            match: {
              name: {
                query: query,
                fuzziness: "AUTO",
                operator: "and"
              }
            }
          }
        ]
      }
    }
  }

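Currently, if I search for "Hand", the name is listed, but if I search for "Handle", it is no longer listed because I made a typo. However, once I get all the way to "Handlesstandens", it is listed again, because fuzziness catches the typo, but only once I have typed the whole word.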
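The behaviour described above follows from how fuzziness works: it compares whole terms by edit distance, and `fuzziness: "AUTO"` allows at most two edits for terms longer than five characters. The sketch below is only an illustration in plain Ruby, not what Elasticsearch runs internally; `levenshtein` is a hypothetical helper, and Elasticsearch by default uses the Damerau-Levenshtein variant, which would count the swapped "le"/"el" as a single edit rather than two.

```ruby
# Plain Levenshtein distance (insertions, deletions, substitutions).
# Hypothetical helper for illustration only.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

indexed_term = "handelsstandens"

# Whole word with a typo: within the two edits allowed by AUTO.
full_typo = levenshtein("handlesstandens", indexed_term)

# Partial word with a typo: far more than two edits away from the full term.
partial_typo = levenshtein("handle", indexed_term)

puts "handlesstandens -> #{full_typo} edits"
puts "handle          -> #{partial_typo} edits"
```

A half-typed word is many edits away from the complete indexed term, so the fuzzy clause cannot match it, while the phrase prefix clause fails because of the typo.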

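Is it possible to do phrase prefix and fuzziness at the same time? So that, in the case above, the name would still be listed if I make a typo along the way?

In this case, if I search for "Handle", it would still match the name "Handelsstandens Boldklub".

Or what other workarounds are there to achieve the experience described above? I like the phrase_prefix matching because it also supports sloppy matching (so I can search for "Boldklub han" and it will list the result).

Or can the above be achieved using the completion suggester?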

Answer 1

Hen*_*olm 5

Well, after looking into Elasticsearch further, I came to the conclusion that I should use ngrams.

Here is a good explanation of what they do and how they work: https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch
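To make the idea concrete, here is a rough plain-Ruby approximation of what an ngram filter with `min_gram` 2 and `max_gram` 20 produces. The `ngrams` and `analyze` helpers are hypothetical, and splitting on non-word characters only loosely imitates the standard tokenizer. It assumes the same analyzer is applied to the search text as to the indexed text, which is the default when a mapping sets no separate search analyzer.

```ruby
# Every substring of a token with length between min_gram and max_gram.
# Hypothetical helper; only approximates the ngram token filter.
def ngrams(token, min_gram: 2, max_gram: 20)
  grams = []
  (0...token.length).each do |start|
    (min_gram..max_gram).each do |len|
      break if start + len > token.length
      grams << token[start, len]
    end
  end
  grams
end

# Lowercase, split into word tokens, then expand each token into ngrams.
def analyze(text, **opts)
  text.downcase.split(/\W+/).reject(&:empty?).flat_map { |t| ngrams(t, **opts) }.uniq
end

indexed = analyze("Handelsstandens Boldklub")
typed   = analyze("Handle")

# Grams the mistyped query has in common with the indexed name.
shared = typed & indexed
puts shared.inspect
```

Because a `match` query combines its terms with OR by default, a single shared gram is enough for the document to match. That is why the mistyped "Handle" can still find "Handelsstandens Boldklub", with more shared grams generally producing a higher score.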

Here are the settings and mapping I used (this is elasticsearch-rails syntax):

settings analysis: {
  filter: {
    ngram_filter: {
      type: "ngram",
      min_gram: "2",
      max_gram: "20"
    }
  },
  analyzer: {
    ngram_analyzer: {
      type: "custom",
      tokenizer: "standard",
      filter: ["lowercase", "ngram_filter"]
    }
  }
} do
  mappings do
    indexes :name, type: "string", analyzer: "ngram_analyzer"
    indexes :country_id, type: "integer"
  end
end

And the query (this one actually searches two different indices at the same time):

{
    query: {
      bool: {
        should: [
          {
            bool: {
              must: [
                { match: { "club.country_id": country.id } },
                { match: { name: query } }
              ]
            }
          },
          {
            bool: {
              must: [
                { match: { country_id: country.id } },
                { match: { name: query } }
              ]
            }
          }
        ],
        minimum_should_match: 1
      }
    }
  }

But basically you should just do a match or multi_match query, depending on the number of fields you want to search.
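As a minimal sketch of that choice, the helper below builds either query body as a Ruby hash. The name `name_search` and the `short_name` field are made up for illustration; only `name` appears in the mapping above.

```ruby
# Build a match query for one field, or a multi_match query for several.
# Hypothetical helper; field names other than "name" are assumptions.
def name_search(query, fields)
  if fields.length == 1
    { query: { match: { fields.first => query } } }
  else
    { query: { multi_match: { query: query, fields: fields } } }
  end
end

single = name_search("Handle", ["name"])
multi  = name_search("Handle", ["name", "short_name"])

puts single.inspect
puts multi.inspect
```

Either hash can be passed as the search body, for example to the `search` method that elasticsearch-rails adds to a model.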

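I hope someone finds this helpful. Personally, I was thinking far too much in terms of fuzziness rather than ngrams (which I did not know about before), and that led me in the wrong direction.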

Archived: 9 years, 4 months ago
Views: 1,576
Last activity: 8 years, 3 months ago