Elasticsearch: testing a custom analyzer without adding it to an index mapping

Question

Can I test a custom Elasticsearch analyzer/tokenizer without first adding it to an index? Something like:

GET _analyze
{
  "tokenizer": {
    "my_custom_tokenizer": {
      "type": "edge_ngram",
      "min_gram": 2,
      "max_gram": 10,
      "token_chars": ["letter", "digit", "symbol"]
    }
  },
  "text": "this is a test"
}

I can test it by first adding a new analyzer to an index:

curl -X PUT "localhost:9200/my_index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "tokenizer": "my_custom_tokenizer"
        }
      },
      "tokenizer": {
        "my_custom_tokenizer": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 30,
          "token_chars": ["letter", "digit", "symbol", "punctuation", "whitespace"]
        }
      }
    }
  }
}
'

and then running this:

curl -X POST "localhost:9200/my_index/_analyze" -H 'Content-Type: application/json' -d'
{
  "analyzer": "my_custom_analyzer",
  "text": "testing"
}
'

Can I avoid these two steps?

Answer 1

Lew*_*ffa 5

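As far as I know, older versions of Elasticsearch (e.g. 2.x) do not support complex array/object definitions like this in an _analyze request, but newer versions (e.g. 5.x and later) definitely do.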

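Your existing JSON request is almost there. Just remove the "my_custom_tokenizer" wrapper object while keeping its configuration, so that "tokenizer" holds the settings directly: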

{
  "tokenizer" : {
    "type": "edge_ngram",
    "min_gram": 2,
    "max_gram": 10,
    "token_chars": ["letter", "digit", "symbol"]
  },
  "text" : "this is a test"
}
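
For anyone scripting this rather than using the console, here is a minimal Python sketch of the same one-step request. It assumes a node at http://localhost:9200 (as in the question's curl calls) with no authentication; the helper names `build_analyze_body` and `analyze` are made up for illustration and are not part of any Elasticsearch client.

```python
import json
import urllib.request

# Assumption: a local, unauthenticated node, as in the question's curl commands.
ES_URL = "http://localhost:9200"


def build_analyze_body(text):
    """Return an _analyze body whose tokenizer is defined inline (no named wrapper)."""
    return {
        "tokenizer": {
            "type": "edge_ngram",
            "min_gram": 2,
            "max_gram": 10,
            "token_chars": ["letter", "digit", "symbol"],
        },
        "text": text,
    }


def analyze(text, base_url=ES_URL):
    """POST the body to the index-less _analyze endpoint and return the token strings."""
    request = urllib.request.Request(
        f"{base_url}/_analyze",
        data=json.dumps(build_analyze_body(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Each entry in "tokens" carries the token text under the "token" key.
    return [entry["token"] for entry in payload["tokens"]]
```

Calling `analyze("this is a test")` against a running 5.x or later node should return the edge n-grams directly, with no index or settings update needed beforehand.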
