Elasticsearch Completion Suggester does not return documents that match the input

whi*_*s11 5 elasticsearch completion

I ran into a strange problem with Elasticsearch 6.0.

I have an index with the following mapping:

{
  "cities": {
    "mappings": {
      "cities": {
        "properties": {
          "city": {
            "properties": {
              "id": {
                "type": "long"
              },
              "name": {
                "properties": {
                  "en": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                      }
                    }
                  },
                  "it": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                      }
                    }
                  }
                }
              },
              "slug": {
                "properties": {
                  "en": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                      }
                    }
                  },
                  "it": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                      }
                    }
                  }
                }
              }
            }
          },
          "doctype": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "suggest": {
            "type": "completion",
            "analyzer": "accents",
            "search_analyzer": "simple",
            "preserve_separators": true,
            "preserve_position_increments": false,
            "max_input_length": 50
          },
          "weight": {
            "type": "long"
          }
        }
      }
    }
  }
}

My index contains these documents:

{
  "_index": "cities",
  "_type": "cities",
  "_id": "991-city",
  "_version": 128,
  "found": true,
  "_source": {
    "doctype": "city",
    "suggest": {
      "input": [
        "nazaré",
        "nazare",
        "나자레",
        "najare",
        "najale",
        "ナザレ",
        "Ναζαρέ"
      ],
      "weight": 1807
    },
    "weight": 3012,
    "city": {
      "id": 991,
      "name": {
        "en": "Nazaré",
        "it": "Nazaré"
      },
      "slug": {
        "en": "nazare",
        "it": "nazare"
      }
    }
  }
}

{
  "_index": "cities",
  "_type": "cities",
  "_id": "1085-city",
  "_version": 128,
  "found": true,
  "_source": {
    "doctype": "city",
    "suggest": {
      "input": [
        "nazareth",
        "nazaret",
        "拿撒勒",
        "na sa le",
        "sa le",
        "le",
        "na-sa-lei",
        "나사렛",
        "nasares",
        "nasales",
        "ナザレス",
        "nazaresu",
        "नज़ारेथ",
        "nj'aareth",
        "aareth",
        "najaratha",
        "Назарет",
        "Ναζαρέτ",
        "názáret",
        "nazaretas"
      ],
      "weight": 1809
    },
    "weight": 3015,
    "city": {
      "id": 1085,
      "name": {
        "en": "Nazareth",
        "it": "Nazareth"
      },
      "slug": {
        "en": "nazareth",
        "it": "nazareth"
      }
    }
  }
}

Now, when I search with the suggester using the following query:

POST /cities/_search
{
  "suggest":{
    "suggest":{
      "prefix":"nazare",
      "completion":{
        "field":"suggest"
      }
    }
  }
}

I expect both documents in my results, but I only get the second one (Nazareth):

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0.0,
    "hits": []
  },
  "suggest": {
    "suggest": [
      {
        "text": "nazare",
        "offset": 0,
        "length": 6,
        "options": [
          {
            "text": "nazaresu",
            "_index": "cities",
            "_type": "cities",
            "_id": "1085-city",
            "_score": 1809.0,
            "_source": {
              "doctype": "city",
              "suggest": {
                "input": [
                  "nazareth",
                  "nazaret",
                  "拿撒勒",
                  "na sa le",
                  "sa le",
                  "le",
                  "na-sa-lei",
                  "나사렛",
                  "nasares",
                  "nasales",
                  "ナザレス",
                  "nazaresu",
                  "नज़ारेथ",
                  "nj'aareth",
                  "aareth",
                  "najaratha",
                  "Назарет",
                  "Ναζαρέτ",
                  "názáret",
                  "nazaretas"
                ],
                "weight": 1809
              },
              "weight": 3015,
              "city": {
                "id": 1085,
                "name": {
                  "en": "Nazareth",
                  "it": "Nazareth"
                },
                "slug": {
                  "en": "nazareth",
                  "it": "nazareth"
                }
              }
            }
          }
        ]
      }
    ]
  }
}

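This is unexpected, because the first document's suggest inputs contain the term "nazare" exactly as I typed it.

Another interesting fact: if I search for "najare" instead of "nazare", I get the correct results.

Any hints would be much appreciated!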

Pie*_*gel 3

For a quick fix, set the size parameter in the completion object of the query.

GET /cities/_search
{
  "suggest":{
    "suggest":{
      "prefix":"nazare",
      "completion":{
        "field":"suggest",
        "size": 100
      }
    }
  }
}

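The size parameter defaults to 5, so once Elasticsearch has found 5 terms (not documents) with the right prefix, it stops looking for more terms (and therefore for more documents).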

The limit is counted per term, not per document. So if a single document contains 5 matching terms and you use the default of 5, other documents may not be returned.

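I strongly believe this is what is happening in your case. The returned document has at least 5 suggest terms with the prefix nazare, so it is the only one returned.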

Interestingly, when you search for najare, only one term has the right prefix, so you get the correct result.

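The tricky part is that the result depends on the order in which Elasticsearch retrieves the documents. If the first document is retrieved first, it does not reach the size threshold (the prefix occurs only 2 or 3 times), so the next document is retrieved as well and you get the correct result.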
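The per-term counting described above can be sketched with a toy model in Python. This is not how Elasticsearch is implemented; it only mimics the behaviour this answer describes. The functions fold and suggest are hypothetical, the accent stripping is an assumed stand-in for the "accents" analyzer (whose definition is not shown in the question), the traversal order by descending weight is also an assumption, and the inputs are a subset of the Latin-script terms from the question.

```python
import unicodedata


def fold(text):
    """Lowercase and strip accents (assumed stand-in for the 'accents' analyzer)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def suggest(docs, prefix, size=5):
    """Toy model: stop after `size` matching terms, return distinct doc ids."""
    # One entry per input term, highest document weight first (assumed order).
    entries = sorted(
        ((doc["weight"], doc["id"], fold(term))
         for doc in docs for term in doc["input"]),
        key=lambda entry: -entry[0],
    )
    folded_prefix = fold(prefix)
    matched_ids = []
    matched_terms = 0
    for _weight, doc_id, term in entries:
        if matched_terms >= size:
            break  # the limit counts terms, not documents
        if term.startswith(folded_prefix):
            matched_terms += 1
            if doc_id not in matched_ids:
                matched_ids.append(doc_id)
    return matched_ids


# Subset of the suggest inputs from the two documents in the question.
DOCS = [
    {"id": "991-city", "weight": 1807,
     "input": ["nazaré", "nazare", "najare", "najale"]},
    {"id": "1085-city", "weight": 1809,
     "input": ["nazareth", "nazaret", "nasares", "nazaresu",
               "najaratha", "názáret", "nazaretas"]},
]

print(suggest(DOCS, "nazare"))            # ['1085-city']
print(suggest(DOCS, "nazare", size=100))  # ['1085-city', '991-city']
print(suggest(DOCS, "najare"))            # ['991-city']
```

In this model, document 1085 alone supplies five terms that start with nazare after folding, which uses up the default size before document 991 is reached. If the two weights were swapped so that 991 came first, its two matches would leave room for three terms from 1085, and both documents would be returned.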

Also, unless you really need to, avoid very high values for size (e.g. > 1000). They can hurt performance, especially for short or common prefixes.
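One way to keep size in check is to build the request body in code and clamp the value. The sketch below uses only the Python standard library and produces the same body as the query above. The helper build_suggest_body and its max_size cap of 1000 are hypothetical, chosen to match the rule of thumb in this answer rather than any limit documented by Elasticsearch.

```python
import json


def build_suggest_body(prefix, field="suggest", size=5, max_size=1000):
    """Build a completion-suggester request body with a bounded `size`.

    `max_size` is an illustrative cap, not an Elasticsearch setting.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return {
        "suggest": {
            "suggest": {
                "prefix": prefix,
                "completion": {
                    "field": field,
                    "size": min(size, max_size),  # clamp very large requests
                },
            }
        }
    }


# Body for the query shown earlier, ready to send to /cities/_search.
print(json.dumps(build_suggest_body("nazare", size=100), indent=2))
```

The resulting dictionary can be serialized and sent with whatever HTTP client you already use.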