How to optimize min/max queries on a large table in PostgreSQL

Question

Cer*_*rin 4 postgresql performance optimization postgresql-performance

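How should I index a table in PostgreSQL so that min/max queries return as quickly as possible?

I have a large table with a few hundred million rows. Each row has a source_id and the date the record was last updated. I want to collect some statistics per source_id, specifically the minimum and maximum date for each source_id.

So I created this index on my table: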

 CREATE INDEX CONCURRENTLY mydata_source_last_updated_date ON mydata (source_id, last_updated_date ASC);

However, when I try to query the minimum date for each source with:

SELECT source_id, MIN(last_updated_date) FROM mydata GROUP BY source_id;

the query takes about an hour to complete.

Is this normal performance for a table this large, even with the index? How can I reduce the query time?

Answer 1

jja*_*nes 5

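With only a few dozen distinct source_id values, you can use a loose index scan (also known as a "skip scan") to do this quickly with the index you already built. Unfortunately, PostgreSQL does not plan these automatically, so you have to force it by writing a recursive query: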

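with recursive t as (
   -- anchor: the smallest source_id, found with a single index probe
   select min(source_id) as col from mydata
   union all
   -- step: the next source_id above the previous one (NULL after the last value)
   select (select min(source_id) from mydata where source_id > t.col)
      from t where t.col is not null)
select
  col as source_id,
  (select min(last_updated_date) from mydata where source_id = t.col) as min_date,
  (select max(last_updated_date) from mydata where source_id = t.col) as max_date
  from t
  where col is not null;  -- drop the terminating NULL row produced by the last step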
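
If the distinct source_id values are already stored somewhere, the recursive part is not needed. The sketch below assumes a hypothetical lookup table `sources(id)`, which is not mentioned in the question, and PostgreSQL 9.3 or later for `LATERAL`. Each subquery should be answerable by a single probe into the (source_id, last_updated_date) index, though the planner's actual choice should be confirmed with `EXPLAIN`.

```sql
-- Assumption: sources(id) is a hypothetical table holding one row per source_id.
SELECT s.id                 AS source_id,
       lo.last_updated_date AS min_date,
       hi.last_updated_date AS max_date
FROM sources AS s
LEFT JOIN LATERAL (
    -- earliest date for this source: first entry in index order
    SELECT m.last_updated_date
    FROM mydata AS m
    WHERE m.source_id = s.id
      AND m.last_updated_date IS NOT NULL
    ORDER BY m.last_updated_date ASC
    LIMIT 1
) AS lo ON true
LEFT JOIN LATERAL (
    -- latest date for this source: last entry in index order
    -- (NULLs are excluded because they would sort first under DESC)
    SELECT m.last_updated_date
    FROM mydata AS m
    WHERE m.source_id = s.id
      AND m.last_updated_date IS NOT NULL
    ORDER BY m.last_updated_date DESC
    LIMIT 1
) AS hi ON true;
```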

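Even if you don't take this approach, simply running the query as originally written shouldn't take anywhere near an hour. But without seeing an explain and an explain analyze of it, there isn't much more to say.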
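
For reference, the two plans mentioned above could be collected roughly as follows, using the query from the question. The `BUFFERS` option is an optional addition here, not something the answer asked for.

```sql
-- Plan only: does not run the query. Shows whether the planner chooses
-- an index scan, an index-only scan, or a sequential scan plus aggregate.
EXPLAIN
SELECT source_id, MIN(last_updated_date)
FROM mydata
GROUP BY source_id;

-- Executes the query and reports actual row counts, timings, and buffer usage.
-- This runs the full query, so expect it to take as long as the original.
EXPLAIN (ANALYZE, BUFFERS)
SELECT source_id, MIN(last_updated_date)
FROM mydata
GROUP BY source_id;
```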
