Improve sort performance in PostgreSQL?

Question

gro*_*wse 7 postgresql performance index

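I have a simple blog database in postgres-8.4 with two tables, articles and comments. I have a query (generated by Django) that fetches the latest article of type 'NEWS' and also finds the number of comments on that article. It does so with the following query: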

SELECT "articles"."id", "articles"."datestamp", "articles"."title", "articles"."shorttitle", "articles"."description", "articles"."markdown", "articles"."body", "articles"."idxfti", "articles"."published", "articles"."type", COUNT("comments"."id") AS "comment__count"
FROM "articles"
LEFT OUTER JOIN "comments" ON ("articles"."id" = "comments"."article_id")
WHERE ("articles"."type"='NEWS')
GROUP BY "articles"."id", "articles"."datestamp", "articles"."title", "articles"."shorttitle", "articles"."description", "articles"."markdown", "articles"."body", "articles"."idxfti", "articles"."published", "articles"."type"
ORDER BY "articles"."datestamp" DESC
LIMIT 1;

Neither table is particularly large, yet the query takes 46 ms. The execution plan is:

Limit  (cost=119.54..119.58 rows=1 width=1150) (actual time=46.479..46.481 rows=1 loops=1)
   ->  GroupAggregate  (cost=119.54..138.88 rows=455 width=1150) (actual time=46.475..46.475 rows=1 loops=1)
     ->  Sort  (cost=119.54..120.68 rows=455 width=1150) (actual time=46.426..46.428 rows=2 loops=1)
           Sort Key: articles.datestamp, articles.id, articles.title, articles.shorttitle, articles.description, articles.markdown, articles.body, articles.idxfti, articles.published, articles.type
           Sort Method:  quicksort  Memory: 876kB
           ->  Hash Left Join  (cost=11.34..99.45 rows=455 width=1150) (actual time=0.513..2.527 rows=566 loops=1)
                 Hash Cond: (articles.id = comments.article_id)
                 ->  Seq Scan on articles  (cost=0.00..78.84 rows=455 width=1146) (actual time=0.017..0.881 rows=455 loops=1)
                       Filter: ((type)::text = 'NEWS'::text)
                 ->  Hash  (cost=8.93..8.93 rows=193 width=8) (actual time=0.486..0.486 rows=193 loops=1)
                       ->  Seq Scan on comments  (cost=0.00..8.93 rows=193 width=8) (actual time=0.004..0.252 rows=193 loops=1)
 Total runtime: 46.574 ms

The articles table has the following index defined (among others):

"idx_articles_datestamp" btree (datestamp DESC) CLUSTER

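Before I clustered the table on it, the query's execution time was more in line with the estimate, at roughly 119 ms.

To my untrained eye, the sort looks like the step that takes the most time here. It also appears to be sorting on every GROUP BY field, and the problem is that this includes three relatively large fields: body, markdown and idx_fti.

My question is: is this an unreasonable time for this query, or am I missing something obvious that I could use to speed it up? All the other queries this blog site issues take about 1-5 ms to execute, so this one stands out as slow. I appreciate that there's an OUTER JOIN and a sort in there, which doesn't really help. However, I'm no expert, so if anyone has any suggestions, that would be very useful.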

Answer 1

Erw*_*ter 9

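Why is it slow?

I would suggest using the query provided by @ypercube in combination with the index mentioned above. But why is your query so slow in comparison?

You didn't provide your table definition, but from the column names and from what you wrote I assume that you have several (big) character-type (text or varchar) columns in the table articles: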

title, shorttitle, description, markdown, body, idx_fti

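I further assume that you are using a locale other than C. Sorting large text columns according to a locale is rather expensive. What matters here is the collation. Check your (current) setting for LC_COLLATE: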

SHOW LC_COLLATE;

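With Postgres 9.1 or later you can pick a collation for an individual expression. With PostgreSQL 8.4, however, the collation is fixed when the database is created and cannot be changed afterwards.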
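
As a rough sketch of what per-expression collation looks like on 9.1 or later (so not an option on the asker's 8.4), a `COLLATE` clause can be attached to a sort expression. Sorting by `title` here is only an illustration and is not part of the original query. With `"C"` the comparison is bytewise, so the resulting order can differ from the locale-aware order:

```sql
-- PostgreSQL 9.1+ only: compare the text column bytewise,
-- bypassing the locale-aware comparison.
-- Sorting by title is an assumption made for illustration.
SELECT id, title
FROM   articles
WHERE  type = 'NEWS'
ORDER  BY title COLLATE "C"
LIMIT  10;
```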

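We recently had a related question here on SO where, after a lot of discussion and testing, we found that sorting according to the locale was the main slowdown:

Slow query ordering by a column in a joined table

I would expect @ypercube's query to fix the problem at the root: with no GROUP BY on long text columns, the expensive sort goes away entirely. Problem solved.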

Answer 2

ype*_*eᵀᴹ 8

Another way to rewrite the query, using an inline subquery:

SELECT id,
       datestamp,
       title,
       shorttitle,
       description,
       markdown,
       body,
       idxfti,
       published,
       type,
       ( SELECT COUNT(*) 
         FROM comments 
         WHERE articles.id = comments.article_id
       ) AS comment__count
FROM articles 
WHERE type = 'NEWS'
ORDER BY datestamp DESC 
LIMIT 1

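A sketch of indexes that might support this form of the query, assuming they do not already exist. The index names are made up for illustration, and the thread does not show the real table definitions, so treat this as something to test rather than a given. On tables this small the planner may still prefer a sequential scan:

```sql
-- Hypothetical index names; adjust to the real schema.
-- Can let the planner pick the newest 'NEWS' row from the index
-- instead of sorting every matching row.
CREATE INDEX idx_articles_type_datestamp ON articles (type, datestamp DESC);

-- Can let the correlated subquery count one article's comments via an index.
CREATE INDEX idx_comments_article_id ON comments (article_id);

-- Inspect the plan to see whether a Sort node is still present.
EXPLAIN ANALYZE
SELECT id,
       datestamp,
       ( SELECT COUNT(*)
         FROM comments
         WHERE articles.id = comments.article_id
       ) AS comment__count
FROM articles
WHERE type = 'NEWS'
ORDER BY datestamp DESC
LIMIT 1;
```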

Answer 3

小智 2

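You might want to try dropping the GROUP BY and using a window function for the count instead. That removes the need to group/sort on all the columns: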

SELECT articles.id,
       articles.datestamp,
       articles.title,
       articles.shorttitle,
       articles.description,
       articles.markdown,
       articles.body,
       articles.idxfti,
       articles.published,
       articles.type,
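       -- PARTITION BY keeps the count per article; a bare OVER () would count across all joined rows
       COUNT(comments.id) OVER (PARTITION BY articles.id) AS comment__count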
FROM articles 
  LEFT OUTER JOIN comments ON (articles.id = comments.article_id)
WHERE (articles.type = 'NEWS')
ORDER BY articles.datestamp DESC 
LIMIT 1

@growse: Window functions are standard SQL and are supported by many modern DBMSs (Oracle, PostgreSQL, DB2, SQL Server, Teradata, Firebird 3.0) (3 upvotes)

Archived:	13 years, 6 months ago
Views:	16613
Last activity:	7 years, 10 months ago