whi*_*s11 5 elasticsearch completion
I'm running into a strange problem with Elasticsearch 6.0.

I have an index with the following mapping:

{
  "cities": {
    "mappings": {
      "cities": {
        "properties": {
          "city": {
            "properties": {
              "id": { "type": "long" },
              "name": {
                "properties": {
                  "en": {
                    "type": "text",
                    "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
                  },
                  "it": {
                    "type": "text",
                    "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
                  }
                }
              },
              "slug": {
                "properties": {
                  "en": {
                    "type": "text",
                    "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
                  },
                  "it": {
                    "type": "text",
                    "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
                  }
                }
              }
            }
          },
          "doctype": {
            "type": "text",
            "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
          },
          "suggest": {
            "type": "completion",
            "analyzer": "accents",
            "search_analyzer": "simple",
            "preserve_separators": true,
            "preserve_position_increments": false,
            "max_input_length": 50
          },
          "weight": { "type": "long" }
        }
      }
    }
  }
}

My index contains these documents:

{
  "_index": "cities",
  "_type": "cities",
  "_id": "991-city",
  "_version": 128,
  "found": true,
  "_source": {
    "doctype": "city",
    "suggest": {
      "input": [
        "nazaré",
        "nazare",
        "나자레",
        "najare",
        "najale",
        "ナザレ",
        "Ναζαρέ"
      ],
      "weight": 1807
    },
    "weight": 3012,
    "city": {
      "id": 991,
      "name": { "en": "Nazaré", "it": "Nazaré" },
      "slug": { "en": "nazare", "it": "nazare" }
    }
  }
}

{
  "_index": "cities",
  "_type": "cities",
  "_id": "1085-city",
  "_version": 128,
  "found": true,
  "_source": {
    "doctype": "city",
    "suggest": {
      "input": [
        "nazareth",
        "nazaret",
        "拿撒勒",
        "na sa le",
        "sa le",
        "le",
        "na-sa-lei",
        "나사렛",
        "nasares",
        "nasales",
        "ナザレス",
        "nazaresu",
        "नज़ारेथ",
        "nj'aareth",
        "aareth",
        "najaratha",
        "Назарет",
        "Ναζαρέτ",
        "názáret",
        "nazaretas"
      ],
      "weight": 1809
    },
    "weight": 3015,
    "city": {
      "id": 1085,
      "name": { "en": "Nazareth", "it": "Nazareth" },
      "slug": { "en": "nazareth", "it": "nazareth" }
    }
  }
}

Now, when I search with the suggester using the following query:

POST /cities/_search
{
  "suggest": {
    "suggest": {
      "prefix": "nazare",
      "completion": {
        "field": "suggest"
      }
    }
  }
}

I expect both documents in my results, but I only get the second one (Nazareth):

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0.0,
    "hits": []
  },
  "suggest": {
    "suggest": [
      {
        "text": "nazare",
        "offset": 0,
        "length": 6,
        "options": [
          {
            "text": "nazaresu",
            "_index": "cities",
            "_type": "cities",
            "_id": "1085-city",
            "_score": 1809.0,
            "_source": {
              "doctype": "city",
              "suggest": {
                "input": [
                  "nazareth",
                  "nazaret",
                  "拿撒勒",
                  "na sa le",
                  "sa le",
                  "le",
                  "na-sa-lei",
                  "나사렛",
                  "nasares",
                  "nasales",
                  "ナザレス",
                  "nazaresu",
                  "नज़ारेथ",
                  "nj'aareth",
                  "aareth",
                  "najaratha",
                  "Назарет",
                  "Ναζαρέτ",
                  "názáret",
                  "nazaretas"
                ],
                "weight": 1809
              },
              "weight": 3015,
              "city": {
                "id": 1085,
                "name": { "en": "Nazareth", "it": "Nazareth" },
                "slug": { "en": "nazareth", "it": "nazareth" }
              }
            }
          }
        ]
      }
    ]
  }
}

This is unexpected, because the term I searched for, "nazare", appears exactly as I typed it in the first document's suggest inputs.

Another interesting fact: if I search for "najare" instead of "nazare", I get the correct results.

Any hints would be greatly appreciated!

For a quick fix, set the size parameter inside the completion object of the query (here it is raised to 100):

GET /cities/_search
{
  "suggest": {
    "suggest": {
      "prefix": "nazare",
      "completion": {
        "field": "suggest",
        "size": 100
      }
    }
  }
}

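The size parameter defaults to 5, so as soon as Elasticsearch has found 5 terms (not documents) with the matching prefix, it stops looking for more terms (and therefore for more documents).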
The limit is counted per term, not per document. So if a single document contains 5 matching terms and you use the default of 5, other documents may not be returned.
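I strongly believe this is what happens in your case. The returned document has at least 5 suggest terms with the prefix nazare, so it is the only one returned.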
As for the interesting fact: when you search for najare, only one term has the matching prefix, so you get the correct result.
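The tricky part is that the result depends on the order in which Elasticsearch retrieves the documents. If the first document is retrieved first, it does not reach the size threshold (the prefix occurs only 2 or 3 times in it), so the next document is retrieved as well and you get the correct result.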
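
The explanation above can be illustrated with a small toy model. This is only a sketch of the behaviour described here, not Elasticsearch's actual implementation. It assumes that the custom accents analyzer strips diacritics (the hypothetical fold helper stands in for it), that candidate terms are ranked by the suggest weight, and that the size cutoff is applied to terms before duplicates from the same document are collapsed.

```python
import unicodedata

def fold(text):
    """Lowercase and strip diacritics (assumed stand-in for the 'accents' analyzer)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# doc id -> (suggest weight, suggest inputs), copied from the question
DOCS = {
    "991-city": (1807, ["nazaré", "nazare", "나자레", "najare", "najale",
                        "ナザレ", "Ναζαρέ"]),
    "1085-city": (1809, ["nazareth", "nazaret", "拿撒勒", "na sa le", "sa le",
                         "le", "na-sa-lei", "나사렛", "nasares", "nasales",
                         "ナザレス", "nazaresu", "नज़ारेथ", "nj'aareth", "aareth",
                         "najaratha", "Назарет", "Ναζαρέτ", "názáret",
                         "nazaretas"]),
}

def suggest(prefix, size=5):
    """Return the ids of documents that survive a per-term cutoff of `size`."""
    matches = [
        (weight, doc_id)
        for doc_id, (weight, inputs) in DOCS.items()
        for term in inputs
        if fold(term).startswith(prefix)
    ]
    # Highest weight first; the sort is stable, so ties keep insertion order.
    matches.sort(key=lambda m: m[0], reverse=True)
    doc_ids = []
    for _, doc_id in matches[:size]:  # the cutoff counts terms, not documents
        if doc_id not in doc_ids:
            doc_ids.append(doc_id)
    return doc_ids

print(suggest("nazare"))            # default size of 5
print(suggest("nazare", size=100))  # larger size
print(suggest("najare"))
```

In this model the five matching terms of 1085-city (nazareth, nazaret, nazaresu, názáret, nazaretas) fill the default quota of 5 before any term of 991-city is considered, while najare matches a single term and is unaffected.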
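Also, avoid very high values for the size parameter (e.g. > 1000) unless you really need them. They can hurt performance, especially for short or common prefixes.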
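
One way to keep size explicit but bounded is to build the request body in code. The sketch below is only an illustration: build_completion_request and MAX_SIZE are hypothetical names, and the ceiling of 1000 simply reuses the rule of thumb above rather than any limit enforced by Elasticsearch. It only builds the JSON body; sending it to /cities/_search is left to whatever HTTP client you use.

```python
import json

MAX_SIZE = 1000  # hypothetical ceiling, taken from the "> 1000" rule of thumb

def build_completion_request(prefix, field="suggest", size=100, name="suggest"):
    """Build a completion-suggest request body with an explicit, capped size."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return {
        "suggest": {
            name: {
                "prefix": prefix,
                "completion": {"field": field, "size": min(size, MAX_SIZE)},
            }
        }
    }

# A request for 5000 suggestions is clamped to MAX_SIZE.
body = build_completion_request("nazare", size=5000)
print(json.dumps(body, indent=2))
```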