Bia*_*ano 3 python django postgresql full-text-search full-text-indexing
After solving the problem I raised in this question, I tried to use an index to optimize FTS performance. I issued this command on my database:
CREATE INDEX my_table_idx ON my_table USING gin(to_tsvector('italian', very_important_field), to_tsvector('italian', also_important_field), to_tsvector('italian', not_so_important_field), to_tsvector('italian', not_important_field), to_tsvector('italian', tags));
Then I edited the model's Meta class as follows:
class MyEntry(models.Model):
    very_important_field = models.TextField(blank=True, null=True)
    also_important_field = models.TextField(blank=True, null=True)
    not_so_important_field = models.TextField(blank=True, null=True)
    not_important_field = models.TextField(blank=True, null=True)
    tags = models.TextField(blank=True, null=True)
    class Meta:
        managed = False
        db_table = 'my_table'
        indexes = [
            GinIndex(
                fields=['very_important_field', 'also_important_field', 'not_so_important_field', 'not_important_field', 'tags'],
                name='my_table_idx'
            )
        ]
But nothing seems to have changed. The lookup takes exactly the same time as before.
This is the lookup script:
from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
# other unrelated stuff here
vector = SearchVector("very_important_field", weight="A") + \
             SearchVector("tags", weight="A") + \
             SearchVector("also_important_field", weight="B") + \
             SearchVector("not_so_important_field", weight="C") + \
             SearchVector("not_important_field", weight="D")
query = SearchQuery(search_string, config="italian")
rank = SearchRank(vector, query, weights=[0.4, 0.6, 0.8, 1.0])  # D, C, B, A
full_text_search_qs = MyEntry.objects.annotate(rank=rank).filter(rank__gte=0.4).order_by("-rank")
What am I doing wrong?
The lookup above is wrapped in a function that I time with a decorator. The function actually returns a list, like this:
@timeit
def search(search_string):
    # the above code here
    qs = list(full_text_search_qs)
    return qs
Maybe that is the problem?
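You need to add a SearchVectorField to your MyEntry, update it from the actual text fields, and then perform the search on this field. However, the update can only be performed after the record has been saved to the database.
Essentially: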
from django.contrib.postgres.indexes import GinIndex
from django.contrib.postgres.search import SearchVector, SearchVectorField
from django.db import models
class MyEntry(models.Model):
    # The fields that contain the raw data.
    very_important_field = models.TextField(blank=True, null=True)
    also_important_field = models.TextField(blank=True, null=True)
    not_so_important_field = models.TextField(blank=True, null=True)
    not_important_field = models.TextField(blank=True, null=True)
    tags = models.TextField(blank=True, null=True)
    # The field we are actually going to search.
    # Must be null=True because we cannot set it immediately during create().
    search_vector = SearchVectorField(editable=False, null=True)
    class Meta:
        # The search index pointing to our actual search field.
        indexes = [GinIndex(fields=["search_vector"])]
Then you can create ordinary instances as usual, for example:
# Does not set MyEntry.search_vector yet.
my_entry = MyEntry.objects.create(
    very_important_field="something very important",  # Fake Italian text ;-)
    also_important_field="something different but equally important",
    not_so_important_field="this one matters less",
    not_important_field="we don't care about that one at all",
    tags="things, stuff, whatever",
)
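Now that the entry exists in the database, you can update the search_vector field using various options, for example weight to specify importance and config to use one of the default language configurations. You can also completely omit fields you do not want to search: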
# Update search vector on existing database record.
# The config must name an existing PostgreSQL text search configuration.
my_entry.search_vector = (
    SearchVector("very_important_field", weight="A", config="italian")
    + SearchVector("also_important_field", weight="A", config="italian")
    + SearchVector("not_so_important_field", weight="C", config="italian")
    + SearchVector("tags", weight="B", config="italian")
)
my_entry.save()
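Manually updating the search_vector field every time one of the text fields changes can be error-prone, so you might consider adding an SQL trigger through a Django migration to do it. For an example of how to do this, see the blog post on full-text search with Django and PostgreSQL.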
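As a rough illustration of the trigger idea, the sketch below builds the SQL for a trigger that recomputes the vector on every insert or update. The helper name `build_trigger_sql`, the function and trigger names it generates, and the table name `my_table` are assumptions made for this example; the weights mirror the ones used in the manual update above. It is a minimal sketch, not a tested migration.

```python
# Hypothetical helper (not part of Django): builds the SQL for a trigger that
# keeps a tsvector column in sync with the text columns it is derived from.
# Names are interpolated directly, so only pass trusted identifiers.

WEIGHTED_COLUMNS = [
    ("very_important_field", "A"),
    ("also_important_field", "A"),
    ("tags", "B"),
    ("not_so_important_field", "C"),
]


def build_trigger_sql(table, vector_column, weighted_columns, config="italian"):
    # coalesce() guards against NULL text columns, which would otherwise
    # turn the whole concatenated vector into NULL.
    parts = [
        f"setweight(to_tsvector('{config}', coalesce(NEW.{col}, '')), '{weight}')"
        for col, weight in weighted_columns
    ]
    expression = " || ".join(parts)
    # EXECUTE FUNCTION needs PostgreSQL 11 or newer; older versions
    # spell it EXECUTE PROCEDURE.
    return f"""
CREATE FUNCTION {table}_search_vector_update() RETURNS trigger AS $$
BEGIN
    NEW.{vector_column} := {expression};
    RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER {table}_search_vector_trigger
BEFORE INSERT OR UPDATE ON {table}
FOR EACH ROW EXECUTE FUNCTION {table}_search_vector_update();
"""


if __name__ == "__main__":
    print(build_trigger_sql("my_table", "search_vector", WEIGHTED_COLUMNS))
```

In a migration, the resulting string could be passed to `migrations.RunSQL(sql, reverse_sql=...)`, with the reverse statement dropping the trigger and the function. With such a trigger in place, the manual `my_entry.save()` step for the vector should no longer be needed.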
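To actually search MyEntry using the index, you need to filter and rank by the search_vector field. The config for the SearchQuery should match the one used for the SearchVector above (so that both use the same stop words, stemming, etc.).
For example: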
from django.contrib.postgres.search import SearchQuery, SearchRank
from django.db.models import F
search_query = SearchQuery("important", search_type="websearch", config="italian")
search_rank = SearchRank(F("search_vector"), search_query)
my_entries_found = (
    MyEntry.objects.annotate(rank=search_rank)
    .filter(search_vector=search_query)  # Perform full text search on index.
    .order_by("-rank")  # Yield most relevant entries first.
)
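To check whether a query like this one actually hits the GIN index, one option is to look at the execution plan. The sketch below assumes the plan text comes from Django's `QuerySet.explain()`; the helper `plan_uses_index` and the index name passed to it are hypothetical (Django generates a name when `GinIndex` is declared without one, so look it up in the database first). GIN indexes are read through bitmap scans, which is what the check looks for.

```python
# Hypothetical helper: inspects the text of a PostgreSQL execution plan and
# reports whether it contains a (bitmap) index scan on the given index.

def plan_uses_index(plan_text: str, index_name: str) -> bool:
    # A GIN index shows up as "Bitmap Index Scan on <index_name>";
    # a plan without it typically shows "Seq Scan on <table>" instead.
    return f"Index Scan on {index_name}" in plan_text


if __name__ == "__main__":
    # Shortened sample plan text, written by hand for illustration only.
    sample_plan = (
        "Bitmap Heap Scan on my_table\n"
        "  ->  Bitmap Index Scan on my_table_search_idx\n"
    )
    print(plan_uses_index(sample_plan, "my_table_search_idx"))
```

With a real queryset, the plan text could be obtained with `my_entries_found.explain(analyze=True)` and passed to the helper. On very small tables PostgreSQL may choose a sequential scan even when the index is usable, so a negative result there is not conclusive.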