Mapping token chars to an NGram filter in Elasticsearch NEST

ASN*_*ASN 3 elasticsearch nest elasticsearch-net

I am trying to replicate the mapping below using NEST, and I am running into a problem mapping the token chars to the tokenizer.

{
   "settings": {
      "analysis": {
         "filter": {
            "nGram_filter": {
               "type": "nGram",
               "min_gram": 2,
               "max_gram": 20,
               "token_chars": [
                  "letter",
                  "digit",
                  "punctuation",
                  "symbol"
               ]
            }
         },
         "analyzer": {
            "nGram_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding",
                  "nGram_filter"
               ]
            }
         }
      }
   }
}

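I was able to replicate everything except the token chars part. Can someone help with this? Below is my code replicating the mapping above (except for the token chars part).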

var nGramFilters1 = new List<string> { "lowercase", "asciifolding", "nGram_filter" };
// Declared but not yet used anywhere: this is the part that still needs mapping.
var tChars = new List<string> { "letter", "digit", "punctuation", "symbol" };

var createIndexResponse = client.CreateIndex(defaultIndex, c => c
    .Settings(st => st
        .Analysis(an => an
            .Analyzers(anz => anz
                .Custom("nGram_analyzer", cc => cc
                    .Tokenizer("whitespace")
                    .Filters(nGramFilters1)))
            .TokenFilters(tf => tf
                .NGram("nGram_filter", ng => ng
                    .MinGram(2)
                    .MaxGram(20))))));


Rus*_*Cam 6

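The NGram tokenizer supports token characters (token_chars). It uses them to decide which characters are kept in a token, and it splits on any character whose class is not in the list.

The NGram token filter, on the other hand, operates on the tokens produced by a tokenizer, so its only options are the minimum and maximum gram sizes to generate.

Given your current analysis chain, you likely want something like the following: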

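// Assumption: nGramFilters is not defined in the original snippet. It is taken here
// to be the question's filter list without "nGram_filter", since the tokenizer
// now generates the n-grams.
var nGramFilters = new List<string> { "lowercase", "asciifolding" };

var createIndexResponse = client.CreateIndex(defaultIndex, c => c
    .Settings(st => st
        .Analysis(an => an
            .Analyzers(anz => anz
                .Custom("ngram_analyzer", cc => cc
                    .Tokenizer("ngram_tokenizer")
                    .Filters(nGramFilters)
                )
            )
            .Tokenizers(tz => tz
                .NGram("ngram_tokenizer", td => td
                    .MinGram(2)
                    .MaxGram(20)
                    // Character classes kept in tokens; any other character splits the text.
                    .TokenChars(
                        TokenChar.Letter,
                        TokenChar.Digit,
                        TokenChar.Punctuation,
                        TokenChar.Symbol
                    )
                )
            )
        )
    )
);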
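
For comparison with the JSON in the question, the index settings that this NEST call is intended to produce would look roughly like the sketch below. The names "ngram_tokenizer" and "ngram_analyzer" come from the code above. The "filter" list is an assumption: it is the question's filter chain with "nGram_filter" left out, because the tokenizer now does the n-gram generation.

```json
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 20,
          "token_chars": ["letter", "digit", "punctuation", "symbol"]
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "ngram_tokenizer",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  }
}
```

Note that token_chars now sits under "tokenizer" rather than "filter". With these four character classes, whitespace is the main class left out, so the text is split at whitespace before the n-grams are generated, much as the whitespace tokenizer did in the original chain. On Elasticsearch 7 and later, a gap this wide between min_gram and max_gram also requires raising the index.max_ngram_diff index setting.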