Django ElasticSearch DSL partial matching using an nGram analyzer

Mic*_*pán 3 django elasticsearch elasticsearch-dsl

I'm fairly new to ElasticSearch, and I'm trying to implement a simple e-commerce search in my Django application using ElasticSearch and the django-elasticsearch-dsl library (GitHub repo).


What I'm trying to achieve (extremely simplified) is this. Given these Django model instances:

Red T-shirts
Blue T-Shirts
Nice T-Shirts

for the search term T-Sh I would get all three of these results:

Red T-shirts
Blue T-Shirts
Nice T-Shirts

So I have this model in shop/models.py (again, heavily simplified):

from django.db import models

class Category(models.Model):
    title = models.CharField(max_length=150, blank=False)
    description = models.CharField(max_length=150, blank=False)
    # In reality here I have more fields

    def __str__(self):
        return self.title

shop/documents.py

from django_elasticsearch_dsl import Document, fields
from django_elasticsearch_dsl.registries import registry
from elasticsearch_dsl import analyzer, tokenizer

from .models import Category

autocomplete_analyzer = analyzer(
    'autocomplete_analyzer',
    tokenizer=tokenizer('trigram', 'nGram', min_gram=1, max_gram=20),
    filter=['lowercase'],
)

@registry.register_document
class CategoryDocument(Document):

    title: fields.TextField(analyzer=autocomplete_analyzer, search_analyzer='standard')  # Here I'm trying to use the analyzer specified above

    class Index:
        name = 'categories'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
            'max_ngram_diff': 20,  # This seems to be important because max_ngram_diff defaults to 1
        }

    class Django:
        model = Category
        fields = [
            'title',
            # In reality here I have more fields
        ]

And finally my shop/views.py:

from django.views.generic import ListView
from elasticsearch_dsl import Q

from .documents import CategoryDocument

class CategoryElasticSearch(ListView):
    def get(self, request, lang):
        search_term = request.GET.get('search_term', '')
        q = Q(
            "multi_match",
            query=search_term,
            fields=[
                'title',
                # In reality here I have more fields
            ],
            fuzziness='auto',
        )
        # Build the search from the document class, then apply the query
        search = CategoryDocument.search().query(q)
        # ... etc

But the results for T-Sh are empty. I only get something when I type a longer term, e.g. T-Shir. At that point I seem to get all three results.

Thank you very much.

Mic*_*pán 5

Wow, I got it working.

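For anyone dealing with this: the analyzer is defined on each field in the mapping. In other words, to attach the analyzer to the title field, our shop/documents.py has to look like this: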

from django_elasticsearch_dsl import Document, fields
from django_elasticsearch_dsl.registries import registry
from elasticsearch_dsl import analyzer, tokenizer

from .models import Category

autocomplete_analyzer = analyzer(
    'autocomplete_analyzer',
    tokenizer=tokenizer('trigram', 'nGram', min_gram=1, max_gram=20),
    filter=['lowercase'],
)

@registry.register_document
class CategoryDocument(Document):

    # title: fields.TextField(analyzer=autocomplete_analyzer, search_analyzer='standard')  # <-- This was extremely incorrect because of the colon in the definition. I don't know how I missed it, but I did...
    title = fields.TextField(required=True, analyzer=autocomplete_analyzer)  # This is it...

    class Index:
        name = 'categories'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
            'max_ngram_diff': 20,  # This seems to be important because max_ngram_diff defaults to 1
        }

    class Django:
        model = Category
        fields = [
            # 'title' <-- Notice that I removed this field; keeping it would cause a redeclaration error
            # In reality here I have more fields
        ]
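For anyone wondering why the colon version fails without any error: in plain Python, `name: expression` inside a class body is only an annotation, so no class attribute gets created. Here is a minimal sketch of that behaviour. `FakeField` is a hypothetical stand-in I made up for illustration, not part of django-elasticsearch-dsl:

```python
class FakeField:
    """Hypothetical stand-in for fields.TextField, for illustration only."""

    def __init__(self, analyzer=None):
        self.analyzer = analyzer


class WithColon:
    # Annotation only: nothing is assigned to `title`.
    title: FakeField(analyzer="autocomplete_analyzer")


class WithEquals:
    # Real assignment: the class gets a `title` attribute.
    title = FakeField(analyzer="autocomplete_analyzer")


print(hasattr(WithColon, "title"))            # False
print(hasattr(WithEquals, "title"))           # True
print("title" in WithColon.__annotations__)   # True
```

My guess is that, with no `title` attribute on the document class, the `'title'` entry in `Django.fields` produced an auto-generated field with the default analyzer, so the n-gram analyzer was never applied.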

And it works flawlessly...
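To see why a partial term like T-Sh can match once the analyzer is actually attached, here is a rough pure-Python sketch of the n-gram idea with the same `min_gram=1`, `max_gram=20` and a lowercase step. The `ngrams` helper is my own illustration and only mimics the concept. Elasticsearch's real tokenizer and scoring are more involved:

```python
def ngrams(text, min_gram=1, max_gram=20):
    """Return every lowercase substring of length min_gram..max_gram."""
    text = text.lower()
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


titles = ["Red T-shirts", "Blue T-Shirts", "Nice T-Shirts"]
term = "T-Sh".lower()

# A title matches if the lowercased term is one of its indexed n-grams.
matches = [title for title in titles if term in ngrams(title)]
print(matches)  # ['Red T-shirts', 'Blue T-Shirts', 'Nice T-Shirts']
```

Note that `max_gram - min_gram` is 19 here, which is why the index needs `max_ngram_diff` raised above its default of 1.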