ts_rank drastically slows down my query. How can I improve performance?

xen*_*ide 6 postgresql full-text-search

Here is the table definition:

Table "public.kb_article_contents"
   Column   |   Type   | Modifiers 
------------+----------+-----------
article_id | smallint | not null
contents   | text     | not null
keywords   | text     | not null
Indexes:
   "contents_idx" gin (to_tsvector('english'::regconfig, contents))
Foreign-key constraints:
   "kb_article_contents_article_id_fkey" FOREIGN KEY (article_id) REFERENCES kb_articles(id)

And here is the query:

support=> EXPLAIN ANALYSE
SELECT
    id,
-- if we remove the next line runtimes speeds up (and of course the order by)
    ts_rank(to_tsvector( 'english', contents ), plainto_tsquery('string')) AS rank
FROM
    kb_article_contents
    INNER JOIN kb_articles
            ON ( kb_article_contents.article_id = kb_articles.id )

WHERE
    published = 'true'
    AND
    to_tsvector( 'english', contents ) @@ plainto_tsquery('string')
ORDER BY rank DESC
LIMIT 25
;

Here is the query plan for the slow query:

                                                                    QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=57.90..57.92 rows=5 width=830) (actual time=452.003..452.028 rows=25 loops=1)
   ->  Sort  (cost=57.90..57.92 rows=5 width=830) (actual time=452.001..452.010 rows=25 loops=1)
         Sort Key: (ts_rank(to_tsvector('english'::regconfig, kb_article_contents.contents), plainto_tsquery('string'::text)))
         Sort Method:  top-N heapsort  Memory: 17kB
         ->  Hash Join  (cost=21.36..57.85 rows=5 width=830) (actual time=17.688..451.334 rows=299 loops=1)
               Hash Cond: (kb_articles.id = kb_article_contents.article_id)
               ->  Seq Scan on kb_articles  (cost=0.00..32.06 rows=1156 width=4) (actual time=0.008..1.059 rows=1156 loops=1)
                 Filter: published
               ->  Hash  (cost=21.30..21.30 rows=5 width=828) (actual time=1.175..1.175 rows=302 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 279kB
                     ->  Bitmap Heap Scan on kb_article_contents  (cost=4.30..21.30 rows=5 width=828) (actual time=0.318..0.700 rows=302 loops=1)
                           Recheck Cond: (to_tsvector('english'::regconfig, contents) @@ plainto_tsquery('string'::text))
                           ->  Bitmap Index Scan on contents_idx  (cost=0.00..4.30 rows=5 width=0) (actual time=0.284..0.284 rows=302 loops=1)
                                 Index Cond: (to_tsvector('english'::regconfig, contents) @@ plainto_tsquery('string'::text))
Total runtime: 452.109 ms
(15 rows)

Here is the query plan without ts_rank:

                                                            QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=21.36..57.81 rows=5 width=4) (actual time=0.812..0.938 rows=25 loops=1)
  ->  Hash Join  (cost=21.36..57.81 rows=5 width=4) (actual time=0.810..0.920 rows=25 loops=1)
        Hash Cond: (kb_articles.id = kb_article_contents.article_id)
        ->  Seq Scan on kb_articles  (cost=0.00..32.06 rows=1156 width=4) (actual time=0.009..0.064 rows=89 loops=1)
           Filter: published
        ->  Hash  (cost=21.30..21.30 rows=5 width=2) (actual time=0.782..0.782 rows=302 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 7kB
              ->  Bitmap Heap Scan on kb_article_contents  (cost=4.30..21.30 rows=5 width=2) (actual time=0.203..0.589 rows=302 loops=1)
                    Recheck Cond: (to_tsvector('english'::regconfig, contents) @@ plainto_tsquery('string'::text))
                    ->  Bitmap Index Scan on contents_idx  (cost=0.00..4.30 rows=5 width=0) (actual time=0.171..0.171 rows=302 loops=1)
                          Index Cond: (to_tsvector('english'::regconfig, contents) @@ plainto_tsquery('string'::text))
Total runtime: 1.002 ms
(12 rows)

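I'm not sure why adding ts_rank to the query makes the runtime balloon like this (about 452 ms versus about 1 ms). What can I do to optimize the query? Note that removing the ORDER BY does not speed the query up, so the sort does not appear to be the cause.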

Bil*_*hor 2

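You have three function calls, and they can be fairly CPU-intensive. to_tsvector('english', contents) has to run for every row and is probably where the time is going. plainto_tsquery('string') should run only once per query, so it should not cost much. ts_rank also has to run for every row, no matter how you handle the data.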
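One way to check where the time goes is to time the per-row to_tsvector call by itself, without ts_rank. The following is only a diagnostic sketch that reuses the table and predicate from the question. If its runtime is close to that of the slow query, re-parsing contents for each matched row is the likely bottleneck.

```sql
-- Diagnostic sketch: evaluate to_tsvector() for each matching row, with no ranking.
-- EXPLAIN ANALYZE executes the query, so the reported time includes
-- the cost of computing the select-list expression for every returned row.
EXPLAIN ANALYZE
SELECT
    to_tsvector('english', contents)
FROM
    kb_article_contents
WHERE
    to_tsvector('english', contents) @@ plainto_tsquery('string');
```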

You should create a text search index instead of generating the tsvector for every query. See the PostgreSQL documentation for details.
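The table in the question already has an expression GIN index, but a GIN index does not store the tsvector itself, so ts_rank still has to recompute to_tsvector for each matching row. A common way to avoid that is to keep a precomputed tsvector in its own column and index that column. The sketch below assumes this approach. The column name contents_tsv, the index name contents_tsv_idx, and the trigger name are hypothetical and not part of the original schema.

```sql
-- Assumption: contents_tsv is a hypothetical new column holding the precomputed tsvector.
ALTER TABLE kb_article_contents ADD COLUMN contents_tsv tsvector;

-- Backfill the column for existing rows.
UPDATE kb_article_contents
SET contents_tsv = to_tsvector('english', contents);

-- Index the stored column instead of the expression.
CREATE INDEX contents_tsv_idx
    ON kb_article_contents USING gin (contents_tsv);

-- Keep the column in sync on writes with the built-in trigger function.
-- The configuration name must be schema-qualified here.
CREATE TRIGGER kb_article_contents_tsv_update
    BEFORE INSERT OR UPDATE ON kb_article_contents
    FOR EACH ROW EXECUTE PROCEDURE
    tsvector_update_trigger(contents_tsv, 'pg_catalog.english', contents);

-- Rank and filter against the stored tsvector, so nothing is re-parsed per row.
SELECT
    id,
    ts_rank(contents_tsv, plainto_tsquery('english', 'string')) AS rank
FROM
    kb_article_contents
    INNER JOIN kb_articles
            ON ( kb_article_contents.article_id = kb_articles.id )
WHERE
    published = 'true'
    AND
    contents_tsv @@ plainto_tsquery('english', 'string')
ORDER BY rank DESC
LIMIT 25;
```

The trade-off is extra storage and a little more work on each insert or update, in exchange for not parsing the article text at query time. On PostgreSQL 12 or later, a stored generated column could replace the trigger. The trigger form is used here because the plans in the question appear to come from an older release.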