xen*_*ide 6 postgresql full-text-search
Here is the table definition:
Table "public.kb_article_contents"
Column | Type | Modifiers
------------+----------+-----------
article_id | smallint | not null
contents | text | not null
keywords | text | not null
Indexes:
"contents_idx" gin (to_tsvector('english'::regconfig, contents))
Foreign-key constraints:
"kb_article_contents_article_id_fkey" FOREIGN KEY (article_id) REFERENCES kb_articles(id)
And here is the query:
support=> EXPLAIN ANALYSE
SELECT
id,
-- removing the next line (and, of course, the ORDER BY) speeds up the query
ts_rank(to_tsvector( 'english', contents ), plainto_tsquery('string')) AS rank
FROM
kb_article_contents
INNER JOIN kb_articles
ON ( kb_article_contents.article_id = kb_articles.id )
WHERE
published = 'true'
AND
to_tsvector( 'english', contents ) @@ plainto_tsquery('string')
ORDER BY rank DESC
LIMIT 25
;
Here is the query plan for the slow query:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=57.90..57.92 rows=5 width=830) (actual time=452.003..452.028 rows=25 loops=1)
-> Sort (cost=57.90..57.92 rows=5 width=830) (actual time=452.001..452.010 rows=25 loops=1)
Sort Key: (ts_rank(to_tsvector('english'::regconfig, kb_article_contents.contents), plainto_tsquery('string'::text)))
Sort Method: top-N heapsort Memory: 17kB
-> Hash Join (cost=21.36..57.85 rows=5 width=830) (actual time=17.688..451.334 rows=299 loops=1)
Hash Cond: (kb_articles.id = kb_article_contents.article_id)
-> Seq Scan on kb_articles (cost=0.00..32.06 rows=1156 width=4) (actual time=0.008..1.059 rows=1156 loops=1)
Filter: published
-> Hash (cost=21.30..21.30 rows=5 width=828) (actual time=1.175..1.175 rows=302 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 279kB
-> Bitmap Heap Scan on kb_article_contents (cost=4.30..21.30 rows=5 width=828) (actual time=0.318..0.700 rows=302 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, contents) @@ plainto_tsquery('string'::text))
-> Bitmap Index Scan on contents_idx (cost=0.00..4.30 rows=5 width=0) (actual time=0.284..0.284 rows=302 loops=1)
Index Cond: (to_tsvector('english'::regconfig, contents) @@ plainto_tsquery('string'::text))
Total runtime: 452.109 ms
(15 rows)
Here is the query plan without ts_rank:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=21.36..57.81 rows=5 width=4) (actual time=0.812..0.938 rows=25 loops=1)
-> Hash Join (cost=21.36..57.81 rows=5 width=4) (actual time=0.810..0.920 rows=25 loops=1)
Hash Cond: (kb_articles.id = kb_article_contents.article_id)
-> Seq Scan on kb_articles (cost=0.00..32.06 rows=1156 width=4) (actual time=0.009..0.064 rows=89 loops=1)
Filter: published
-> Hash (cost=21.30..21.30 rows=5 width=2) (actual time=0.782..0.782 rows=302 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 7kB
-> Bitmap Heap Scan on kb_article_contents (cost=4.30..21.30 rows=5 width=2) (actual time=0.203..0.589 rows=302 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, contents) @@ plainto_tsquery('string'::text))
-> Bitmap Index Scan on contents_idx (cost=0.00..4.30 rows=5 width=0) (actual time=0.171..0.171 rows=302 loops=1)
Index Cond: (to_tsvector('english'::regconfig, contents) @@ plainto_tsquery('string'::text))
Total runtime: 1.002 ms
(12 rows)
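I'm not sure why adding ts_rank to the query makes the runtime balloon like this. What can I do to optimize the query? Note that removing the ORDER BY alone does not speed the query up, so the sort does not appear to be the cause.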
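You have three function calls that can be fairly CPU-intensive. to_tsvector('english', contents) has to run for every row and is probably where your time is going. plainto_tsquery('string') should only run once per query, so it should not cost much. ts_rank also has to run for every row, however you handle the data.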
Rather than generating the tsvector for every query, you should create a text search index. See the PostgreSQL documentation for details.
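The table in the question already has an expression index on to_tsvector('english', contents), so one reading of this advice is to store the tsvector in a column of its own, so that ts_rank reads a precomputed value instead of re-parsing contents for each matching row. The following is a minimal sketch under that assumption, not a tested fix for this schema. The column name contents_tsv, the index name contents_tsv_idx and the trigger name are hypothetical.

```sql
-- Hypothetical column holding the precomputed tsvector.
ALTER TABLE kb_article_contents ADD COLUMN contents_tsv tsvector;

-- Backfill existing rows once.
UPDATE kb_article_contents
SET contents_tsv = to_tsvector('english', contents);

-- Index the stored column directly instead of the expression.
CREATE INDEX contents_tsv_idx
    ON kb_article_contents USING gin (contents_tsv);

-- Keep the column current on writes with the built-in trigger function.
CREATE TRIGGER kb_article_contents_tsv_update
BEFORE INSERT OR UPDATE ON kb_article_contents
FOR EACH ROW EXECUTE PROCEDURE
    tsvector_update_trigger(contents_tsv, 'pg_catalog.english', contents);

-- Same query as in the question, but matching and ranking on the stored column.
SELECT
    kb_articles.id,
    ts_rank(kb_article_contents.contents_tsv, plainto_tsquery('string')) AS rank
FROM
    kb_article_contents
    INNER JOIN kb_articles
        ON ( kb_article_contents.article_id = kb_articles.id )
WHERE
    published = 'true'
    AND kb_article_contents.contents_tsv @@ plainto_tsquery('string')
ORDER BY rank DESC
LIMIT 25;
```

The trade-off is extra storage and a little work on each write in exchange for not calling to_tsvector at query time. Whether this removes most of the 450 ms would need to be confirmed by running EXPLAIN ANALYSE again on the real data.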