Django full-text SearchVectorField obsolete in PostgreSQL

Question


ZG1*_*101 10 django postgresql full-text-search

I'm using Django's built-in full-text search with PostgreSQL.

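The Django documentation says that performance can be improved by using a SearchVectorField. This field keeps a pre-generated tsvector column containing all the relevant lexemes alongside the model, instead of generating the vector on the fly during every search.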

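However, with this approach the tsvector column has to be updated whenever the model is updated. To keep it in sync, the Django documentation suggests using triggers and refers to the PostgreSQL documentation for more details.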

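However, the PostgreSQL documentation itself says that the trigger method is now obsolete: instead of updating the tsvector column manually, you should let PostgreSQL keep it up to date automatically by using a stored generated column.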
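To make the difference concrete: a stored generated column is defined by an expression in the table definition itself, and PostgreSQL computes it whenever the row is inserted or updated, so no trigger is needed. The sketch below builds such a DDL statement from a list of (field, weight) pairs. The helper and parameter names are hypothetical, and the text search configuration is interpolated as a literal, so only pass trusted values. It uses the two-argument `to_tsvector(config, text)` form because a generated column's expression must be immutable, and the one-argument form depends on the `default_text_search_config` setting.

```python
# Hypothetical helper (not part of Django): build the DDL for a stored
# generated tsvector column, the approach PostgreSQL 12+ recommends over triggers.
from typing import List, Tuple

VALID_WEIGHTS = {"A", "B", "C", "D"}


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def generated_tsvector_ddl(
    table: str,
    column: str,
    weighted_fields: List[Tuple[str, str]],
    config: str = "english",
) -> str:
    """Return ALTER TABLE ... GENERATED ALWAYS AS (...) STORED for the given fields.

    `config` is interpolated as a literal, so pass only trusted values.
    """
    if not weighted_fields:
        raise ValueError("at least one (field, weight) pair is required")
    parts = []
    for field, weight in weighted_fields:
        if weight not in VALID_WEIGHTS:
            raise ValueError(f"weight must be one of A-D, got {weight!r}")
        # coalesce() keeps a NULL field from turning the whole vector into NULL
        parts.append(
            f"setweight(to_tsvector('{config}', coalesce({quote_ident(field)}, '')), '{weight}')"
        )
    expression = " ||\n        ".join(parts)
    return (
        f"ALTER TABLE {quote_ident(table)} ADD COLUMN {quote_ident(column)} tsvector\n"
        f"    GENERATED ALWAYS AS (\n        {expression}\n    ) STORED;"
    )


if __name__ == "__main__":
    print(generated_tsvector_ddl("blog_content", "textsearch", [("body", "A"), ("title", "B")]))
```

Building the statement from a field list keeps the weights in one place, which helps when the searched fields change and the column has to be redefined.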

How can I use the approach PostgreSQL recommends in Django?

Answer 1

ZG1*_*101 7

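I figured out how to do this with a custom migration. The main caveat is that you have to update these migrations manually whenever the base model (the one you are searching) changes.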

Note that the following requires PostgreSQL 12 or later, because stored generated columns were introduced in that release.
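If you want to fail early on an older server, a check on the numeric server version is enough. This is a minimal sketch: the helper names are hypothetical, and it assumes you obtain the integer yourself, for example from `SHOW server_version_num` or from Django's PostgreSQL backend via `connection.pg_version`. Since PostgreSQL 10 the number encodes major * 10000 + minor, so 12.5 is 120005.

```python
# Hypothetical helpers: decide whether a PostgreSQL server supports
# stored generated columns (first available in PostgreSQL 12).
MIN_VERSION_NUM = 120000  # server_version_num of PostgreSQL 12.0


def supports_generated_columns(server_version_num: int) -> bool:
    """True if the server is PostgreSQL 12 or newer."""
    return server_version_num >= MIN_VERSION_NUM


def version_label(server_version_num: int) -> str:
    """Turn e.g. 120005 into '12.5' (the encoding used since PostgreSQL 10)."""
    major, minor = divmod(server_version_num, 10000)
    return f"{major}.{minor}"


if __name__ == "__main__":
    for num in (110012, 120005):
        print(version_label(num), supports_generated_columns(num))
```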

First, create a migration that will add a database column to store the tsvector:

$ python manage.py makemigrations my_app --empty

Migrations for 'my_app':
  my_app/migrations/005_auto_20200625_1933.py


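Open the new migration file for editing. We need to create a column to store the tsvector without any corresponding field in the model definition, so that Django does not try to write to the auto-generated column itself (PostgreSQL rejects explicit values for generated columns in INSERT and UPDATE).

The main drawback of this approach is that, because the column is not synced with the Django model, you have to write a new migration by hand whenever the searched fields change.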

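# my_app/migrations/0010_add_tsvector.py

"""
Use setweight() to label each field's lexemes with a weight ('A' is
highest) that ranking functions such as ts_rank can use later.
Use the PostgreSQL tsvector concatenation operator || to combine multiple
fields from the table. Use `coalesce` to ensure that NULL is not
returned if a field is empty.

In this case, `blog_content` is the database table name and
`textsearch` is the new column, but you can choose any names here.
Always pass the text search configuration ('english') to to_tsvector:
a generated column's expression must be immutable.
"""
from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        # Hypothetical name: keep whatever `makemigrations --empty` generated.
        ("my_app", "0009_previous_migration"),
    ]

    operations = [
        migrations.RunSQL(
            sql="""
                ALTER TABLE "blog_content" ADD COLUMN "textsearch" tsvector
                GENERATED ALWAYS AS (
                    setweight(to_tsvector('english', coalesce(body, '')), 'A') ||
                    setweight(to_tsvector('english', coalesce(title, '')), 'B')
                ) STORED;
            """,
            reverse_sql="""
                ALTER TABLE "blog_content" DROP COLUMN "textsearch";
            """,
        )
    ]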
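To see what the generated expression stores, here is a toy Python model of its building blocks. It is only an illustration under simplifying assumptions: the real `to_tsvector` with the 'english' configuration also stems words and drops stop words, which this sketch does not do, and all function names are hypothetical. It does mirror two documented behaviours: `setweight` relabels every position with the given weight, and `||` shifts the right operand's positions past the left operand's last position.

```python
# Toy model of tsvector semantics; NOT PostgreSQL's parser
# (no stemming, no stop words). All names are hypothetical.
import re
from typing import Dict, List, Optional, Tuple

TsVector = Dict[str, List[Tuple[int, str]]]  # lexeme -> [(position, weight)]


def toy_to_tsvector(text: str) -> TsVector:
    """Lowercase words with 1-based positions and the default weight D."""
    vec: TsVector = {}
    for pos, word in enumerate(re.findall(r"\w+", text.lower()), start=1):
        vec.setdefault(word, []).append((pos, "D"))
    return vec


def toy_setweight(vec: TsVector, weight: str) -> TsVector:
    """Relabel every position with `weight`, like setweight()."""
    return {lex: [(p, weight) for p, _ in entries] for lex, entries in vec.items()}


def toy_concat(left: TsVector, right: TsVector) -> TsVector:
    """Like tsvector || tsvector: right positions shift past the left's maximum."""
    offset = max((p for entries in left.values() for p, _ in entries), default=0)
    out = {lex: list(entries) for lex, entries in left.items()}
    for lex, entries in right.items():
        out.setdefault(lex, []).extend((p + offset, w) for p, w in entries)
    return out


def coalesce(value: Optional[str], default: str = "") -> str:
    """Replace None (SQL NULL) with a default, like SQL coalesce()."""
    return default if value is None else value


if __name__ == "__main__":
    body, title = "Fast search", "Search"
    print(toy_concat(toy_setweight(toy_to_tsvector(coalesce(body)), "A"),
                     toy_setweight(toy_to_tsvector(coalesce(title)), "B")))
```

Without `coalesce`, a NULL body would make the whole SQL expression NULL, because both `to_tsvector(NULL)` and `NULL || x` yield NULL.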


To create the new column in the database, run:

$ python manage.py migrate my_app
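Because the column is invisible to Django's migration autodetector, changing the searched fields later means writing another `RunSQL` migration by hand. One simple option is to drop the column and add it back with the new expression. Dropping a column also drops any index defined on it, so recreate such an index in the same migration. A minimal sketch with a hypothetical helper that returns `sql` and `reverse_sql` as lists of statements, which `migrations.RunSQL` accepts:

```python
# Hypothetical helper: SQL for a hand-written migration that changes
# the expression of a generated tsvector column by dropping and re-adding it.
from typing import List, Tuple


def rebuild_generated_column(
    table: str, column: str, new_expr: str, old_expr: str
) -> Tuple[List[str], List[str]]:
    """Return (sql, reverse_sql) that swap a generated column's expression.

    new_expr/old_expr are raw SQL expressions (e.g. a setweight(...) || ...
    chain). They are inserted verbatim, so they must come from your own
    code, never from user input.
    """
    drop = f'ALTER TABLE "{table}" DROP COLUMN "{column}";'

    def add(expr: str) -> str:
        return (
            f'ALTER TABLE "{table}" ADD COLUMN "{column}" tsvector '
            f"GENERATED ALWAYS AS ({expr}) STORED;"
        )

    # Forward installs the new definition; reverse restores the old one.
    return [drop, add(new_expr)], [drop, add(old_expr)]
```

In the new migration you would then write `migrations.RunSQL(sql=sql, reverse_sql=reverse_sql)` with the two returned lists.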


Then, to use the column in a text search:

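# my_app/views.py

from django.contrib.postgres.search import SearchQuery, SearchVectorField
from django.db.models.expressions import RawSQL
from django.views.generic.list import ListView

from .models import Blog  # assumed: the model stored in the "blog_content" table


class TextSearchView(ListView):
    def get_queryset(self):
        '''Return the rows matching the search term.

        Since there is no model field, we must manually retrieve the
        column, using `annotate`.
        '''
        query = self.request.GET.get('search_term')
        if not query:
            # No search term given: return an empty result instead of erroring
            return Blog.objects.none()

        return Blog.objects.annotate(
            ts=RawSQL(
                'textsearch',
                params=[],
                output_field=SearchVectorField(),
            )
        ).filter(
            # Use the same configuration as the generated column
            ts=SearchQuery(query, config='english')
        )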
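When `ts` is compared with a `SearchQuery` (or with a plain string, which Django wraps in one), the filter becomes a `@@` match against `plainto_tsquery`, the default "plain" search type. That function keeps only the words of the input and requires all of them to be present. A toy model of this AND behaviour, again ignoring stemming and stop words; the function names are hypothetical:

```python
# Toy model of matching against plainto_tsquery (AND of all query words).
# No stemming or stop words; names are hypothetical.
import re
from typing import Set


def toy_plain_query(text: str) -> Set[str]:
    """Split a search string into lowercase words, dropping punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def toy_matches(lexemes: Set[str], search_term: str) -> bool:
    """True only if every query word appears in the document's lexemes."""
    terms = toy_plain_query(search_term)
    # Toy choice: a query without any words matches nothing.
    return bool(terms) and terms <= lexemes
```

So a search for "fast search" only returns rows whose vector contains both words. For OR or phrase semantics, `SearchQuery` accepts other `search_type` values.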


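Note that the weights are stored in the column and recomputed automatically whenever a row changes, but filtering alone does not sort the results by relevance. To get the best matches first, annotate a rank computed from those weights and order by it.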
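The difference between storing weights and ordering by them can be shown with a toy score. The per-weight values below are ts_rank's documented defaults (D=0.1, C=0.2, B=0.4, A=1.0), but the scoring formula is a simplification, not ts_rank's algorithm, and the rows are made up for illustration. In Django the real counterpart would be annotating with `SearchRank` from `django.contrib.postgres.search` and calling `order_by('-rank')`.

```python
# Toy relevance ranking: weights only affect order once you compute a rank.
from typing import Dict, List, Set, Tuple

# ts_rank's default weight values; the score formula below is simplified.
DEFAULT_WEIGHTS = {"D": 0.1, "C": 0.2, "B": 0.4, "A": 1.0}

TsVector = Dict[str, List[Tuple[int, str]]]  # lexeme -> [(position, weight)]


def toy_rank(vector: TsVector, terms: Set[str]) -> float:
    """Sum the weight value of every occurrence of a query term."""
    return sum(DEFAULT_WEIGHTS[w] for t in terms for _, w in vector.get(t, []))


# Hypothetical rows, in the order an unranked query might return them.
docs: Dict[str, TsVector] = {
    "post-1": {"django": [(3, "B")]},  # match only in the B-weighted field (title)
    "post-2": {"django": [(1, "A")]},  # match in the A-weighted field (body)
}

ranked = sorted(docs, key=lambda k: toy_rank(docs[k], {"django"}), reverse=True)
print(ranked)  # ['post-2', 'post-1']
```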

Archived:	5 years, 8 months ago
Views:	1061
Last activity:	4 years, 8 months ago