Poor performance of a recursive query in PostgreSQL

Bol*_*ger 5 postgresql performance postgresql-performance

My recursive query in PostgreSQL performs very poorly.

psql --version
psql (PostgreSQL) 10.7 (Ubuntu 10.7-1.pgdg16.04+1)

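The idea is to have a tree-like comment structure: when parent_id is NULL the row is a top-level (parent) comment, and when it is an integer the row is a reply to the comment with that id.

The structure of my comment table (\d+ comment) is as follows: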

     Column      |           Type           | Collation | Nullable |               Default               | Storage  | Stats target | Description 
-----------------+--------------------------+-----------+----------+-------------------------------------+----------+--------------+-------------
 id              | bigint                   |           | not null | nextval('comment_id_seq'::regclass) | plain    |              | 
 website_page_id | bigint                   |           | not null |                                     | plain    |              | 
 author_id       | bigint                   |           | not null |                                     | plain    |              | 
 parent_id       | bigint                   |           |          |                                     | plain    |              | 
 content         | text                     |           |          |                                     | extended |              | 
 deleted_date    | timestamp with time zone |           |          |                                     | plain    |              | 
 updated_date    | timestamp with time zone |           | not null |                                     | plain    |              | 
 created_date    | timestamp with time zone |           | not null |                                     | plain    |              | 
Indexes:
    "comment_pkey" PRIMARY KEY, btree (id)
    "index_comment_id_parent_id" UNIQUE, btree (id, parent_id)
    "index_comment_website_page_id_deleted_date" btree (website_page_id, deleted_date)
    "index_comment_website_page_id_parent_id_deleted_and_created_dat" btree (website_page_id, parent_id, deleted_date, created_date DESC)
Foreign-key constraints:
    "fk_comment_author_id_id_author" FOREIGN KEY (author_id) REFERENCES author(id) ON DELETE CASCADE
    "fk_comment_parent_id_comment_id" FOREIGN KEY (parent_id) REFERENCES comment(id) ON DELETE CASCADE
    "fk_comment_website_page_id_website_page" FOREIGN KEY (website_page_id) REFERENCES website_page(id) ON DELETE CASCADE
Referenced by:
    TABLE "comment" CONSTRAINT "fk_comment_parent_id_comment_id" FOREIGN KEY (parent_id) REFERENCES comment(id) ON DELETE CASCADE

I ran the query under EXPLAIN ANALYZE. This is the resulting query plan:

Limit  (cost=21024979.19..21024979.22 rows=10 width=76) (actual time=2951.338..2951.341 rows=10 loops=1)
   CTE ct
     ->  Recursive Union  (cost=0.00..20290183.86 rows=17659257 width=76) (actual time=0.031..2547.332 rows=1000010 loops=1)
           ->  Seq Scan on comment c1  (cost=0.00..25834.12 rows=999977 width=76) (actual time=0.027..175.619 rows=1000004 loops=1)
                 Filter: ((parent_id IS NULL) AND (deleted_date IS NULL) AND (website_page_id = 1))
                 Rows Removed by Filter: 6
           ->  Merge Join  (cost=1909463.84..1991116.46 rows=1665928 width=76) (actual time=499.348..499.352 rows=2 loops=4)
                 Merge Cond: (ct_1.id = comment.parent_id)
                 ->  Sort  (cost=1704437.83..1729437.26 rows=9999770 width=12) (actual time=126.066..126.067 rows=3 loops=4)
                       Sort Key: ct_1.id
                       Sort Method: quicksort  Memory: 25kB
                       ->  WorkTable Scan on ct ct_1  (cost=0.00..199995.40 rows=9999770 width=12) (actual time=0.005..32.293 rows=250002 loops=4)
                 ->  Materialize  (cost=205026.01..210026.06 rows=1000010 width=72) (actual time=370.859..370.860 rows=6 loops=4)
                       ->  Sort  (cost=205026.01..207526.04 rows=1000010 width=72) (actual time=370.856..370.857 rows=6 loops=4)
                             Sort Key: comment.parent_id
                             Sort Method: external sort  Disk: 80240kB
                             ->  Seq Scan on comment  (cost=0.00..23334.10 rows=1000010 width=72) (actual time=0.012..139.241 rows=1000010 loops=4)
   ->  Sort  (cost=734795.33..778943.48 rows=17659257 width=76) (actual time=2951.336..2951.337 rows=10 loops=1)
         Sort Key: ct.created_date DESC
         Sort Method: top-N heapsort  Memory: 26kB
         ->  CTE Scan on ct  (cost=0.00..353185.14 rows=17659257 width=76) (actual time=0.036..2854.035 rows=1000010 loops=1)
 Planning time: 1.094 ms
 Execution time: 2968.693 ms
(23 rows)

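First of all, it does not seem to use any index, and according to explain.depesz.com it performs poorly.

I suspect that either the table design is not the best one, so there is no way to make this perform well, or the query itself is not optimal.

Any suggestions would be appreciated.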

Eva*_*oll 3

You have to sort because you have no index on parent_id.

"comment_pkey" PRIMARY KEY, btree (id)
"index_comment_id_parent_id" UNIQUE, btree (id, parent_id)

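The second index is also redundant. You don't need a UNIQUE index on (id, parent_id) when id is already UNIQUE through the primary key.

Solution: drop the index on (id, parent_id) and create an index on parent_id.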
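A minimal sketch of that change, assuming the table and index names shown in the question's \d+ output. The new index name index_comment_parent_id is made up for illustration.

```sql
-- The existing index is listed as "UNIQUE, btree" (a plain index, not a
-- constraint), so DROP INDEX applies to it.
DROP INDEX index_comment_id_parent_id;

-- Hypothetical name; lets the recursive step look up replies by parent_id.
CREATE INDEX index_comment_parent_id ON comment (parent_id);
```

On a busy table, CREATE INDEX CONCURRENTLY may be preferable because it avoids blocking writes, at the cost of a slower build.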

In addition, you have to sort on ct.created_date DESC. That is because the only index you have that includes created_date is

(website_page_id, parent_id, deleted_date, created_date DESC)

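That is a massive index. It is also of no use at all here.

Solution: drop this over-compounded index and create one on created_date DESC.

Don't forget to run VACUUM ANALYZE afterwards.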
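Put together, these steps might look like the sketch below. The long index name is copied from the question's \d+ output; index_comment_created_date is a made-up name used only for illustration.

```sql
-- Drop the four-column composite index named in the question.
DROP INDEX index_comment_website_page_id_parent_id_deleted_and_created_dat;

-- Hypothetical name; a single-column index in the requested sort order.
CREATE INDEX index_comment_created_date ON comment (created_date DESC);

-- Reclaim dead rows and refresh planner statistics for the table.
VACUUM ANALYZE comment;
```

Re-running the query under EXPLAIN ANALYZE afterwards shows whether the planner actually picks up the new indexes.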

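Note that this query will never be fast. You are actually processing 1000010 rows even though you only need 10. Consider not asking for the hierarchy of the entire database just to get 10 rows.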
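One way to act on this is to apply the limit before recursing, so that only the replies of 10 top-level comments are ever visited. The query below is a hypothetical rewrite, not the asker's query (which is not shown here): the filters are taken from the Filter line of the plan, and the selected columns are an assumption.

```sql
WITH RECURSIVE ct AS (
    -- Anchor: only the 10 newest live top-level comments of the page.
    -- The parentheses are needed so ORDER BY / LIMIT apply to this branch only.
    (SELECT c1.id, c1.parent_id, c1.author_id, c1.content, c1.created_date
     FROM comment c1
     WHERE c1.website_page_id = 1
       AND c1.parent_id IS NULL
       AND c1.deleted_date IS NULL
     ORDER BY c1.created_date DESC
     LIMIT 10)
    UNION ALL
    -- Recursive step: replies to rows already collected in ct.
    SELECT c.id, c.parent_id, c.author_id, c.content, c.created_date
    FROM comment c
    JOIN ct ON c.parent_id = ct.id
    WHERE c.deleted_date IS NULL
)
SELECT * FROM ct;
```

This returns the 10 parents plus all of their replies, which may differ from what the original query returned if its LIMIT was applied to parents and replies together.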