rwe*_*ser 5 n-gram elasticsearch
I'm new to ElasticSearch and am working on partial matching across multiple fields. For example, suppose I have indexed the following three documents:
{
  "document-id": "Patient1",
  "document-type": "patients",
  "firstName": "Benjamin",
  "lastName": "Carlton",
  "medicalRecordNumber": "111-222-3333"
}
{
  "document-id": "Patient2",
  "document-type": "patients",
  "firstName": "Carly",
  "lastName": "Benson",
  "medicalRecordNumber": "111-222-3334"
}
{
  "document-id": "Patient3",
  "document-type": "patients",
  "firstName": "Jason",
  "lastName": "Benson",
  "medicalRecordNumber": "111-222-3335"
}
I would like to design an analyzer and a search query such that a search for:
I feel like I'm close, using the following analyzer:
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "partialMatchTokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10
        }
      },
      "analyzer": {
        "partialMatchAnalyzer": {
          "type": "custom",
          "tokenizer": "partialMatchTokenizer",
          "char_filter": [],
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "lastName": {
          "type": "text",
          "analyzer": "partialMatchAnalyzer"
        },
        "firstName": {
          "type": "text",
          "analyzer": "partialMatchAnalyzer"
        }
      }
    }
  }
}
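To make the index side concrete, here is a minimal Python sketch of the tokens this analyzer would likely store for each name. It is an approximation, not Elasticsearch itself: the helper `edge_ngrams` is hypothetical, and it assumes the tokenizer's default `token_chars` (keep all characters, so the input is not split on whitespace) followed by the `lowercase` filter.

```python
def edge_ngrams(text, min_gram=2, max_gram=10):
    """Approximate edge_ngram + lowercase: prefixes of the whole input string."""
    lowered = text.lower()
    longest = min(max_gram, len(lowered))
    # One token per prefix length from min_gram up to max_gram (or the text length).
    return [lowered[:n] for n in range(min_gram, longest + 1)]


for name in ["Benjamin", "Carlton", "Carly", "Benson", "Jason"]:
    print(name, "->", edge_ngrams(name))
```

Under these assumptions "Carlton" and "Carly" share the tokens `ca`, `car`, and `carl`, and "Benjamin" and "Benson" share `be` and `ben`.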
and the following query:
{
  "query": {
    "multi_match": {
      "query": "carlt ben",
      "type": "cross_fields",
      "fields": [
        "firstName",
        "lastName",
        "medicalRecordNumber"
      ],
      "operator": "or"
    }
  }
}
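But this isn't quite there. "or" seems too permissive; "and" seems too strict. Sometimes the n-gram matching gives unexpected results. For example, the query above ("carlt ben") matches #1 and #2 (i.e., "carlt" matches "Carly", presumably because the "carl" n-gram matches). Also, strangely, "carlt ben" and "ben carlt" return two different result sets (#1 and #2 versus #1, #2, and #3).
Any ideas on how I need to change the analyzer and/or the query to get the results described above?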
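One possible explanation for both surprises can be checked with a small simulation. The sketch below is a rough model, not Elasticsearch: the helpers `edge_ngrams` and `matching_patients` are hypothetical, scoring and `cross_fields` term blending are ignored, and `medicalRecordNumber` is left out because the mapping does not apply this analyzer to it. It assumes that the query string is analyzed with the same analyzer as the fields (no `search_analyzer` is set) and that the tokenizer's default `token_chars` keeps whitespace, so only prefixes of the whole query string become tokens.

```python
def edge_ngrams(text, min_gram=2, max_gram=10):
    """Approximate edge_ngram + lowercase: prefixes of the whole input string."""
    lowered = text.lower()
    longest = min(max_gram, len(lowered))
    return {lowered[:n] for n in range(min_gram, longest + 1)}


# Name fields from the three sample documents in the question.
PATIENTS = {
    "Patient1": {"firstName": "Benjamin", "lastName": "Carlton"},
    "Patient2": {"firstName": "Carly", "lastName": "Benson"},
    "Patient3": {"firstName": "Jason", "lastName": "Benson"},
}


def matching_patients(query):
    """Model operator "or": a document hits if any query token was indexed."""
    query_tokens = edge_ngrams(query)
    hits = []
    for doc_id, fields in PATIENTS.items():
        indexed = set()
        for value in fields.values():
            indexed |= edge_ngrams(value)
        if query_tokens & indexed:
            hits.append(doc_id)
    return hits


print(matching_patients("carlt ben"))  # ['Patient1', 'Patient2']
print(matching_patients("ben carlt"))  # ['Patient1', 'Patient2', 'Patient3']
```

In this model the second word of the query never produces tokens of its own: "carlt ben" only contributes `ca`, `car`, `carl`, `carlt`, and longer prefixes containing the space, while "ben carlt" only contributes `be`, `ben`, and so on. That reproduces the two result sets reported above. Running the real analyzer through the `_analyze` API on both query strings would confirm or refute the assumption.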