Searching for special characters with Elasticsearch

Question

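I'm having a problem with Elasticsearch: I have a business requirement to search on strings that contain special characters. For example, a query string may contain a space, @, &, ^, (), or !. Some representative cases are below.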

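foo&bar123 (exact match)
foo & bar123 (spaces around the special character)
foobar123 (no special characters)
foobar 123 (no special characters, with a space)
foo bar 123 (no special characters, spaces between the words)
FOO&BAR123 (uppercase)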

All of these should match the same result. Can anyone help? Note that searching for strings without special characters already works perfectly.

{
    "settings": {
        "number_of_shards": 1,
        "analysis": {
            "analyzer": {
                "autocomplete": {
                    "tokenizer": "custom_tokenizer"
                }
            },
            "tokenizer": {
                "custom_tokenizer": {
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 30,
                    "token_chars": [
                        "letter",
                        "digit"
                    ]
                }
            }
        }
    },
    "mappings": {
        "index": {
            "properties": {
                "some_field": {
                    "type": "text",
                    "analyzer": "autocomplete"
                },
                "some_field_2": {
                    "type": "text",
                    "analyzer": "autocomplete"
                }
            }
        }
    }
}

Answer 1

ifo*_*o20 6

Edit:

There are two things to check here:

(1) Are the special characters being analyzed when we index the document?

The _analyze API tells us they are not:

POST localhost:9200/index-name/_analyze
{
    "analyzer": "autocomplete",
    "text": "foo&bar"
}

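// returns
fo, foo, oo, ba, bar, ar   // the & never appears in any token

This is because of "token_chars": ["letter", "digit"] in the tokenizer settings. Neither group includes punctuation such as "&", and the ngram tokenizer splits the text on any character outside the listed groups. So when you index "foo&bar", the & acts as a separator between "foo" and "bar" and is not itself indexed.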

To include & in the index, add "punctuation" to the "token_chars" list. For some other characters you may also need the "symbol" group:

"tokenizer": {
    "custom_tokenizer": {
        "type": "ngram",
        "min_gram": 2,
        "max_gram": 30,
        "token_chars": [
            "letter",
            "digit",
            "symbol",
            "punctuation"
        ]
    }
}

Now the terms are analyzed as intended:

POST localhost:9200/index-name/_analyze
{
    "analyzer": "autocomplete",
    "text": "foo&bar"
}

// returns
fo, foo, foo&, foo&b, foo&ba, foo&bar, oo, oo&, // ...etc
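
To make the effect of "token_chars" easier to see without a running cluster, here is a minimal pure-Python sketch of the splitting behavior described above. It is not Elasticsearch code: the helper names `char_class` and `ngram_tokens` are hypothetical, and mapping the character groups to Unicode general categories is an approximation that may differ from Elasticsearch for some characters.

```python
import unicodedata


def char_class(ch):
    """Approximate the token_chars group of a character (assumption:
    groups are derived from the Unicode general category)."""
    cat = unicodedata.category(ch)
    if cat.startswith("L"):
        return "letter"
    if cat == "Nd":
        return "digit"
    if cat.startswith("P"):
        return "punctuation"
    if cat.startswith("S"):
        return "symbol"
    if ch.isspace():
        return "whitespace"
    return "other"


def ngram_tokens(text, token_chars, min_gram=2, max_gram=30):
    """Split text on characters outside token_chars, then emit every
    n-gram of length min_gram..max_gram from each remaining run."""
    runs, current = [], ""
    for ch in text:
        if char_class(ch) in token_chars:
            current += ch
        else:
            if current:
                runs.append(current)
            current = ""
    if current:
        runs.append(current)

    grams = []
    for run in runs:
        for start in range(len(run)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(run):
                    break
                grams.append(run[start:start + size])
    return grams


# Original settings: "&" is outside the listed groups, so it only splits.
before = ngram_tokens("foo&bar", {"letter", "digit"})
# Updated settings: "&" counts as punctuation and stays inside the grams.
after = ngram_tokens("foo&bar", {"letter", "digit", "symbol", "punctuation"})

print(before)
print(after[:8])
```

With only "letter" and "digit" no gram contains "&"; once "punctuation" is added, grams such as "foo&" and "foo&bar" appear, which is what lets a query for "foo&bar" match on the special character.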

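(2) Does my search query do what I expect?

Now that we know the 'foo&bar' document is being indexed (analyzed) correctly, we need to check that a search returns it. The following query works: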

POST localhost:9200/index-name/_doc/_search
{
    "query": {
        "match": { "some_field": "foo&bar" }
    }
}

So does the GET query http://localhost:9200/index-name/_search?q=foo%26bar
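
The %26 in that URL is the percent-encoded "&"; left unencoded, the "&" would be read as a separator between URL parameters. A small sketch of building such a URL with Python's standard library, assuming the same host and the placeholder index name index-name used above:

```python
from urllib.parse import quote

query = "foo&bar"
# safe="" makes quote() encode every reserved character, including "&".
url = "http://localhost:9200/index-name/_search?q=" + quote(query, safe="")
print(url)
```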

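Other queries may give unexpected results. According to the documentation, you may want to declare a search_analyzer that differs from your index analyzer (for example, an ngram index analyzer and a standard search analyzer), but that is up to you.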
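
As a rough illustration of that suggestion, the sketch below builds the "properties" part of a mapping in which each field keeps the ngram-based "autocomplete" analyzer for indexing and uses the built-in "standard" analyzer at search time. The field names come from the question; the choice of "standard" is only the example mentioned above, not a recommendation, and whether it suits the special-character cases would need to be checked with the _analyze API.

```python
import json

# Index with the ngram analyzer, search with the standard analyzer.
properties = {
    field: {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "standard",
    }
    for field in ("some_field", "some_field_2")
}

print(json.dumps({"properties": properties}, indent=4))
```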

After adding "punctuation" to my "token_chars", everything works!! Thanks!! (3 upvotes)

Archived:	7 years, 6 months ago
Views:	10793
Last activity:	7 years, 6 months ago