Elasticsearch query_string does not match on part of a word

Question

I am sending this request:

curl -XGET 'host/process_test_3/14/_search' -d '{
  "query" : {
    "query_string" : {
      "query" : "\"*cor interface*\"",
      "fields" : ["title", "obj_id"]
    }
  }
}'

and I get the correct result:

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 5.421598,
    "hits": [
      {
        "_index": "process_test_3",
        "_type": "14",
        "_id": "141_dashboard_14",
        "_score": 5.421598,
        "_source": {
          "obj_type": "dashboard",
          "obj_id": "141",
          "title": "Cor Interface Monitoring"
        }
      }
    ]
  }
}

But when I try to search by only part of a word, for example:

curl -XGET 'host/process_test_3/14/_search' -d '
{
  "query" : {
    "query_string" : {
      "query" : "\"*cor inter*\"",
      "fields" : ["title", "obj_id"]
    }
  }
}'

I get no results:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : []
  }
}

What am I doing wrong?

Answer 1

Val*_*Val 5

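This is because your title field has probably been analyzed with the standard analyzer (the default setting), so the title Cor Interface Monitoring has been split into three tokens: cor, interface and monitoring. The partial word inter is not one of those tokens, so it matches nothing.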
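To make the mismatch concrete, here is a minimal sketch in plain Python (not Elasticsearch itself). The helper `standard_like_tokens` is hypothetical and only approximates the standard analyzer by splitting on non-word characters and lowercasing; the real analyzer follows Unicode text segmentation rules.

```python
import re


def standard_like_tokens(text):
    """Rough stand-in for the standard analyzer: split on non-word characters, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


tokens = standard_like_tokens("Cor Interface Monitoring")
print(tokens)                 # ['cor', 'interface', 'monitoring']
print("interface" in tokens)  # True: the whole word was indexed
print("inter" in tokens)      # False: the partial word was never indexed
```

Under this approximation, the index only contains whole lowercase words, which is consistent with the first query matching and the second one returning nothing.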

To search for any substring of a word, you need to create a custom analyzer that uses an ngram token filter, so that all substrings of each token are indexed.

You can create your index like this:

curl -XPUT localhost:9200/process_test_3 -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "substring_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "substring"]
        }
      },
      "filter": {
        "substring": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 15
        }
      }
    }
  },
  "mappings": {
    "14": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "substring_analyzer"
        }
      }
    }
  }
}'

Then you can reindex your data. With this analyzer, the title Cor Interface Monitoring will now be tokenized as:

co, cor, or
in, int, inte, inter, interf, etc.
mo, mon, moni, etc.

Your second search query will now return the document you expect, because the tokens cor and inter now match.
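The effect of the ngram filter can be sketched the same way. This is a plain-Python emulation, not the Elasticsearch implementation: `ngrams` and `substring_analyzer_tokens` are hypothetical helpers, the tokenizer is approximated with a regex, and the defaults mirror the min_gram of 2 and max_gram of 15 used in the settings above.

```python
import re


def ngrams(token, min_gram=2, max_gram=15):
    """Return every substring of token whose length is between min_gram and max_gram."""
    return [
        token[start:start + size]
        for start in range(len(token))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(token)
    ]


def substring_analyzer_tokens(text, min_gram=2, max_gram=15):
    """Approximate the custom analyzer: tokenize, lowercase, then expand each token into ngrams."""
    indexed = set()
    for token in re.findall(r"\w+", text):
        indexed.update(ngrams(token.lower(), min_gram, max_gram))
    return indexed


print(ngrams("cor"))  # ['co', 'cor', 'or']

indexed = substring_analyzer_tokens("Cor Interface Monitoring")
print("cor" in indexed, "inter" in indexed)  # True True
```

Note the trade-off this implies: every token expands into many substrings, so the index grows noticeably as max_gram increases.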
