Elasticsearch partial and exact matching on multiple fields

axi*_*ago 5 elasticsearch

Our Account model has a first_name, a last_name and an ssn (Social Security number).

I want partial matching on first_name and last_name, but exact matching on ssn. So far I have this:

settings analysis: {
  filter: {
    substring: {
      type: "nGram",
      min_gram: 3,
      max_gram: 50
    },
    ssn_string: {
      type: "nGram",
      min_gram: 9,
      max_gram: 9
    }
  },
  analyzer: {
    index_ngram_analyzer: {
      type: "custom",
      tokenizer: "standard",
      filter: ["lowercase", "substring"]
    },
    search_ngram_analyzer: {
      type: "custom",
      tokenizer: "standard",
      filter: ["lowercase", "substring"]
    },
    ssn_ngram_analyzer: {
      type: "custom",
      tokenizer: "standard",
      filter: ["ssn_string"]
    }
  }
}

mapping do
  [:first_name, :last_name].each do |attribute|
    indexes attribute, type: 'string',
                       index_analyzer: 'index_ngram_analyzer',
                       search_analyzer: 'search_ngram_analyzer'
  end

  indexes :ssn, type: 'string', index: 'not_analyzed'
end
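To see which terms the two ngram analyzers above produce, here is a rough Ruby approximation. It is a sketch, not Elasticsearch code: it assumes the standard tokenizer can be treated as a split on non-word characters, and the method name ngram_analyze is hypothetical. Because search_ngram_analyzer uses the same filter chain as the index analyzer, a query such as "erik smi" is also broken into grams, and with operator "and" every one of those grams has to match.

```ruby
# Rough approximation (an assumption, not Elasticsearch itself) of the
# question's analyzer chain: standard tokenizer -> lowercase -> nGram(3..50).
# The standard tokenizer is approximated by splitting on non-word characters.
# Token order may differ from Elasticsearch; only the set of terms matters here.
def ngram_analyze(text, min_gram: 3, max_gram: 50)
  text.downcase.split(/\W+/).reject(&:empty?).flat_map do |word|
    # Words shorter than min_gram produce no grams at all.
    (min_gram..[max_gram, word.length].min).flat_map do |n|
      (0..word.length - n).map { |i| word[i, n] }
    end
  end
end

p ngram_analyze("Erik")      # => ["eri", "rik", "erik"]
p ngram_analyze("erik smi")  # => ["eri", "rik", "erik", "smi"]
```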
Run Code Online (Sandbox Code Playgroud)

My search looks like this:

query: {
  multi_match: {
    fields: ["first_name", "last_name", "ssn"],
    query: query,
    type: "cross_fields",
    operator: "and"
  }
}

So this works:

 Account.search("erik").records.to_a

And even (for Erik Smith):

 Account.search("erik smi").records.to_a

And by ssn:

 Account.search("111112222").records.to_a

But this does not:

 Account.search("erik 111112222").records.to_a
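One plausible reason, sketched below as a simplified in-memory Ruby model rather than real Elasticsearch code: cross_fields groups fields by search analyzer, so first_name and last_name form one group and the not_analyzed ssn forms another. The sketch assumes that operator "and" applies inside each group and that the groups are combined with OR. The helper names (ngrams, cross_fields_and_match?, ACCOUNT) are hypothetical.

```ruby
require "set"

# Simplified model (an assumption, not Elasticsearch's actual query code) of
# multi_match "cross_fields" with operator "and" against the question's mapping:
#   * fields are grouped by search analyzer;
#   * inside a group, every query token must occur in at least one field;
#   * a document matches when at least one group matches.

# Approximates standard tokenizer + lowercase + nGram(3..50).
def ngrams(text, min_gram = 3, max_gram = 50)
  text.downcase.split(/\W+/).reject(&:empty?).flat_map do |word|
    (min_gram..[max_gram, word.length].min).flat_map do |n|
      (0..word.length - n).map { |i| word[i, n] }
    end
  end
end

ACCOUNT = { first_name: "Erik", last_name: "Smith", ssn: "111112222" }.freeze

def cross_fields_and_match?(doc, query)
  # Group 1: first_name and last_name share search_ngram_analyzer.
  name_terms = Set.new(ngrams(doc[:first_name]) + ngrams(doc[:last_name]))
  tokens = ngrams(query)
  names_ok = !tokens.empty? && tokens.all? { |t| name_terms.include?(t) }

  # Group 2: ssn is not_analyzed, so the whole query string is a single term.
  ssn_ok = query == doc[:ssn]

  names_ok || ssn_ok
end

["erik", "erik smi", "111112222", "erik 111112222"].each do |q|
  puts "#{q.inspect} -> #{cross_fields_and_match?(ACCOUNT, q)}"
end
```

Under this model only the last query prints false. The name group cannot find digit grams such as "111" in the names, and the ssn group compares the whole string "erik 111112222" with "111112222", so neither group can satisfy every token on its own.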

Any idea whether I am indexing or querying incorrectly?

Thanks for any help!

Slo*_*ens 2

Does it have to be done with a single query string? If not, I would do something like this:

PUT /test_index
{
   "settings": {
      "number_of_shards": 1,
      "analysis": {
         "filter": {
            "ngram_filter": {
               "type": "ngram",
               "min_gram": 2,
               "max_gram": 20
            }
         },
         "analyzer": {
            "ngram_analyzer": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "lowercase",
                  "ngram_filter"
               ]
            }
         }
      }
   },
   "mappings": {
      "doc": {
         "_all": {
            "enabled": true,
            "index_analyzer": "ngram_analyzer",
            "search_analyzer": "standard"
         },
         "properties": {
            "first_name": {
               "type": "string",
               "include_in_all": true
            },
            "last_name": {
               "type": "string",
               "include_in_all": true
            },
            "ssn": {
               "type": "string",
               "index": "not_analyzed",
               "include_in_all": false
            }
         }
      }
   }
}

Note the use of the _all field. I included first_name and last_name in _all but left ssn out, and ssn is not analyzed at all, because I want exact matches on it.
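The effect of this mapping can be approximated in Ruby. Names are n-grammed (2..20) into _all at index time, while the standard search analyzer keeps each query word whole, so a partial word like "eri" only has to equal one stored gram. This is a sketch: index_all and standard_analyze are hypothetical helpers, and tokenization is approximated by splitting on non-word characters.

```ruby
require "set"

# Simplified model (an assumption) of the answer's _all field:
# first_name and last_name are copied into _all (include_in_all: true),
# ssn is not (include_in_all: false).
# Index time: ngram_analyzer = standard tokenizer -> lowercase -> ngram(2..20).
def index_all(doc)
  text = [doc[:first_name], doc[:last_name]].join(" ")
  grams = text.downcase.split(/\W+/).reject(&:empty?).flat_map do |word|
    (2..[20, word.length].min).flat_map do |n|
      (0..word.length - n).map { |i| word[i, n] }
    end
  end
  Set.new(grams)
end

# Search time: "standard" analyzer, modelled as lowercase + split into words.
def standard_analyze(query)
  query.downcase.split(/\W+/).reject(&:empty?)
end

DOC = { first_name: "Erik", last_name: "Smith", ssn: "111112222" }.freeze
ALL_TERMS = index_all(DOC)

p standard_analyze("eri smi").all? { |t| ALL_TERMS.include?(t) }  # => true
p ALL_TERMS.include?("111112222")                                 # => false
```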

I indexed a couple of documents for illustration:

POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"first_name":"Erik","last_name":"Smith","ssn":"111112222"}
{"index":{"_id":2}}
{"first_name":"Bob","last_name":"Jones","ssn":"123456789"}

Then I can query on partial names and filter by the exact ssn:

POST /test_index/doc/_search
{
   "query": {
      "filtered": {
         "query": {
            "match": {
               "_all": {
                   "query": "eri smi",
                   "operator": "and"
               }
            }
         },
         "filter": {
            "term": {
               "ssn": "111112222"
            }
         }
      }
   }
}
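The same filtered query can be modelled in memory to show why only document 1 comes back. The match clause requires every whole-word query token to be one of the _all grams, and the term filter compares ssn exactly. This is a sketch under the same tokenization assumption, not Elasticsearch code; filtered_search and all_grams are hypothetical names, and the two documents mirror the bulk request.

```ruby
require "set"

# In-memory model (an assumption) of the answer's "filtered" query:
#   query:  match on _all with operator "and" (every search token must be indexed)
#   filter: term on ssn (exact, because ssn is not_analyzed)
DOCS = {
  "1" => { first_name: "Erik", last_name: "Smith", ssn: "111112222" },
  "2" => { first_name: "Bob",  last_name: "Jones", ssn: "123456789" }
}.freeze

# _all grams: names only, ngram(2..20), ssn excluded.
def all_grams(doc)
  words = [doc[:first_name], doc[:last_name]].join(" ").downcase.split(/\W+/)
  grams = words.reject(&:empty?).flat_map do |word|
    (2..[20, word.length].min).flat_map do |n|
      (0..word.length - n).map { |i| word[i, n] }
    end
  end
  Set.new(grams)
end

def filtered_search(docs, query, ssn)
  tokens = query.downcase.split(/\W+/).reject(&:empty?)  # "standard" search analyzer
  docs.select do |_id, doc|
    grams = all_grams(doc)
    doc[:ssn] == ssn && tokens.all? { |t| grams.include?(t) }
  end.keys
end

p filtered_search(DOCS, "eri smi", "111112222")  # => ["1"]
p filtered_search(DOCS, "eri smi", "123456789")  # => []
```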

And I get what I expected:

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.8838835,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": 0.8838835,
            "_source": {
               "first_name": "Erik",
               "last_name": "Smith",
               "ssn": "111112222"
            }
         }
      ]
   }
}

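If you need to search with a single query string (no filter), you could also include ssn in the _all field, but with this setup it would then also match partial strings (e.g. 111112), which may not be what you want.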
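To illustrate that trade-off, here is a sketch of the hypothetical variant in which ssn is also copied into _all with the same ngram(2..20) analyzer. It is an in-memory approximation, not Elasticsearch code, and ngram_set is a hypothetical helper. The question's combined query "erik 111112222" would then match, but so would a partial SSN.

```ruby
require "set"

# Hypothetical variant (an assumption): ssn is also copied into _all and indexed
# with ngram(2..20), so every 2..20-character substring of it becomes a term.
def ngram_set(values, min_gram = 2, max_gram = 20)
  words = values.join(" ").downcase.split(/\W+/).reject(&:empty?)
  grams = words.flat_map do |word|
    (min_gram..[max_gram, word.length].min).flat_map do |n|
      (0..word.length - n).map { |i| word[i, n] }
    end
  end
  Set.new(grams)
end

ALL_TERMS = ngram_set(["Erik", "Smith", "111112222"])

# Search side: "standard" analyzer keeps each word whole, operator "and".
p %w[erik 111112222].all? { |t| ALL_TERMS.include?(t) }  # => true (mixed query matches)
p ALL_TERMS.include?("111112")                           # => true (partial SSN matches too)
```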

If you only want to match prefixes (i.e. search terms that start at the beginning of a word), you should use edge ngrams.
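A rough Ruby comparison of the two filters on a single lowercased token may make the difference concrete. The helper names are hypothetical and this only approximates the token filters: an ngram filter stores infixes such as "ith", while an edge ngram filter stores only grams anchored at the start of the word.

```ruby
# Sketch (an assumption, not Elasticsearch code) contrasting ngram and
# edge_ngram output for one already-lowercased token.
def ngrams(word, min_gram, max_gram)
  (min_gram..[max_gram, word.length].min).flat_map do |n|
    (0..word.length - n).map { |i| word[i, n] }
  end
end

# edge_ngram keeps only the grams that start at position 0.
def edge_ngrams(word, min_gram, max_gram)
  (min_gram..[max_gram, word.length].min).map { |n| word[0, n] }
end

p ngrams("smith", 2, 20).include?("ith")       # => true  (infix is indexed)
p edge_ngrams("smith", 2, 20)                  # => ["sm", "smi", "smit", "smith"]
p edge_ngrams("smith", 2, 20).include?("ith")  # => false (prefixes only)
```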

I wrote a blog post about using ngrams that may help: http://blog.qbox.io/an-introduction-to-ngrams-in-elasticsearch

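Here is the code I used for this answer. I tried a few different things, including the setup posted here and another one that includes ssn in _all using edge ngrams. Hope this helps: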

http://sense.qbox.io/gist/b6a31c929945ef96779c72c468303ea3bc87320f