Tags: postgresql, performance, greatest-n-per-group, postgresql-performance
Yes, it's a greatest-n-per-group question.
Given a releases table with the following columns:
id | primary key |
volume | double precision |
chapter | double precision |
series | integer-foreign-key |
include | boolean | not null
I want to select the composite maximum of volume, then chapter, for a set of series.
Right now, if I query per distinct series, I can do this easily as follows:
SELECT
releases.volume AS releases_volume,
releases.chapter AS releases_chapter,
releases.include AS releases_include,
releases.series AS releases_series
FROM releases
WHERE releases.series = 741
AND releases.include = TRUE
ORDER BY releases.volume DESC NULLS LAST, releases.chapter DESC NULLS LAST LIMIT 1;
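However, if I have a large number of series (and I do), this quickly runs into efficiency problems: I end up issuing 100+ queries to generate a single page.
I'd like to roll the whole thing into one query, where I can simply say WHERE releases.series IN (1,2,3....), but I haven't figured out how to convince Postgres to let me do that.
The naive approach is: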
SELECT releases.volume AS releases_volume,
releases.chapter AS releases_chapter,
releases.series AS releases_series
FROM
releases
WHERE
releases.series IN (12, 17, 44, 79, 88, 110, 129, 133, 142, 160, 193, 231, 235, 295, 340, 484, 499,
556, 581, 664, 666, 701, 741, 780, 790, 796, 874, 930, 1066, 1091, 1135, 1137,
1172, 1331, 1374, 1418, 1435, 1447, 1471, 1505, 1521, 1540, 1616, 1702, 1768,
1825, 1828, 1847, 1881, 2007, 2020, 2051, 2085, 2158, 2183, 2190, 2235, 2255,
2264, 2275, 2325, 2333, 2334, 2337, 2341, 2343, 2348, 2370, 2372, 2376, 2606,
2634, 2636, 2695, 2696 )
AND releases.include = TRUE
GROUP BY
releases_series
ORDER BY releases.volume DESC NULLS LAST, releases.chapter DESC NULLS LAST;
This obviously doesn't work:
ERROR: column "releases.volume" must appear in the GROUP BY clause or be used in an aggregate function
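Without the GROUP BY it does fetch everything, and with some simple procedural filtering afterwards it would even work, but there must be a "proper" way to do this in SQL.
Following the error, I added aggregates: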
SELECT max(releases.volume) AS releases_volume,
max(releases.chapter) AS releases_chapter,
releases.series AS releases_series
FROM
releases
WHERE
releases.series IN (12, 17, 44, 79, 88, 110, 129, 133, 142, 160, 193, 231, 235, 295, 340, 484, 499,
556, 581, 664, 666, 701, 741, 780, 790, 796, 874, 930, 1066, 1091, 1135, 1137,
1172, 1331, 1374, 1418, 1435, 1447, 1471, 1505, 1521, 1540, 1616, 1702, 1768,
1825, 1828, 1847, 1881, 2007, 2020, 2051, 2085, 2158, 2183, 2190, 2235, 2255,
2264, 2275, 2325, 2333, 2334, 2337, 2341, 2343, 2348, 2370, 2372, 2376, 2606,
2634, 2636, 2695, 2696 )
AND releases.include = TRUE
GROUP BY
releases_series;
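This mostly works, but the problem is that the two maxima are not coherent. If I have two rows where volume:chapter is 1:5 and 4:1, I need to get back 4:1, but the independent maxima return 4:5.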
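To make the mismatch concrete, here is a small Python sketch (an illustration, not code from the question) of the two behaviours on the 1:5 / 4:1 example. Rows are modelled as hypothetical (volume, chapter) tuples with None standing in for SQL NULL; `composite_key` is an assumed helper that mimics `volume DESC NULLS LAST, chapter DESC NULLS LAST` when used with `max()`.

```python
# Two releases of one series, as (volume, chapter); None stands for SQL NULL.
rows = [(1.0, 5.0), (4.0, 1.0)]

# What max(volume), max(chapter) does: each column is maximised on its own.
# (Only valid here because this sample contains no NULLs.)
independent = (max(v for v, _ in rows), max(c for _, c in rows))


def composite_key(row):
    """Key matching volume DESC NULLS LAST, chapter DESC NULLS LAST
    when passed to max(): any non-NULL beats NULL, larger beats smaller."""
    volume, chapter = row
    return (
        volume is not None, volume if volume is not None else 0.0,
        chapter is not None, chapter if chapter is not None else 0.0,
    )


# What the question asks for: the single row that sorts first.
composite = max(rows, key=composite_key)

print(independent)  # (4.0, 5.0), a pair that exists in no row
print(composite)    # (4.0, 1.0)
```

Comparing whole tuples is what keeps volume and chapter from the same row; the SQL answers reach the same result by ordering rows rather than aggregating columns.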
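Frankly, this would be trivially simple to implement in my application code, so I must be missing something obvious here. How can I write a query that actually does what I want?
The simple solution in Postgres is DISTINCT ON: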
SELECT DISTINCT ON (r.series)
r.volume AS releases_volume
, r.chapter AS releases_chapter
, r.series AS releases_series
FROM releases r
WHERE r.series IN (
12, 17, 44, 79, 88, 110, 129, 133, 142, 160, 193, 231, 235, 295, 340, 484, 499
, 556, 581, 664, 666, 701, 741, 780, 790, 796, 874, 930, 1066, 1091, 1135, 1137
, 1172, 1331, 1374, 1418, 1435, 1447, 1471, 1505, 1521, 1540, 1616, 1702, 1768
, 1825, 1828, 1847, 1881, 2007, 2020, 2051, 2085, 2158, 2183, 2190, 2235, 2255
, 2264, 2275, 2325, 2333, 2334, 2337, 2341, 2343, 2348, 2370, 2372, 2376, 2606
, 2634, 2636, 2695, 2696)
AND r.include
ORDER BY r.series, r.volume DESC NULLS LAST, r.chapter DESC NULLS LAST;
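The semantics of that DISTINCT ON query can be modelled in a few lines of Python: filter, sort by series and then by the composite key, and keep the first row of each series group. This is a sketch of what the query computes, not of how Postgres executes it; the (series, volume, chapter, include) tuple layout and the helper names are hypothetical.

```python
from itertools import groupby


def desc_nulls_last(x):
    # Sorting ascending on this key gives "x DESC NULLS LAST" order.
    return (x is None, -x if x is not None else 0.0)


def distinct_on_series(rows, wanted):
    """Model of the DISTINCT ON (series) query.

    rows: (series, volume, chapter, include) tuples (hypothetical layout).
    wanted: the series ids from the IN (...) list.
    """
    # WHERE series IN (...) AND include
    candidates = [r for r in rows if r[0] in wanted and r[3]]
    # ORDER BY series, volume DESC NULLS LAST, chapter DESC NULLS LAST
    candidates.sort(key=lambda r: (r[0], desc_nulls_last(r[1]), desc_nulls_last(r[2])))
    # DISTINCT ON (series): keep only the first row of each series group
    return [next(group) for _, group in groupby(candidates, key=lambda r: r[0])]


rows = [
    (741, 1.0, 5.0, True),
    (741, 4.0, 1.0, True),
    (741, 9.0, 9.0, False),  # ignored: include is false
    (12, None, 3.0, True),
    (12, 2.0, None, True),
    (99, 5.0, 5.0, True),    # ignored: series not requested
]
print(distinct_on_series(rows, {12, 741}))
# [(12, 2.0, None, True), (741, 4.0, 1.0, True)]
```

Note that the leading ORDER BY expression must match the DISTINCT ON expression, which is why `r.series` comes first in the SQL and in the sort key here.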
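Depending on the data distribution, there may be faster techniques.
Also, for long lists there are faster alternatives to IN ().
Combine an unnested array with a LATERAL join: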
SELECT r.*
FROM unnest('{12, 17, 44, 79, 88, 110, 129}'::int[]) t(i) -- or many more items
, LATERAL (
SELECT volume AS releases_volume
, chapter AS releases_chapter
, series AS releases_series
FROM releases
WHERE series = t.i
AND include
ORDER BY series, volume DESC NULLS LAST, chapter DESC NULLS LAST
LIMIT 1
) r;
This tends to be faster. For best performance you need a matching multicolumn index, such as:
CREATE INDEX releases_series_volume_chapter_idx
ON releases(series, volume DESC NULLS LAST, chapter DESC NULLS LAST);
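Why the matching index pays off for the LATERAL query can be sketched with a toy model: a sorted list of keys stands in for the B-tree, and each requested series costs one seek (`bisect`) plus a short forward scan until a row with include = true is found. This is a simplified illustration under stated assumptions (hypothetical tuple layout and helper names; no planner, heap fetches, or visibility checks), not Postgres internals.

```python
import bisect


def desc_nulls_last(x):
    # Ascending order on this key equals "x DESC NULLS LAST" order.
    return (x is None, -x if x is not None else 0.0)


def build_index(rows):
    """Toy stand-in for the index on
    (series, volume DESC NULLS LAST, chapter DESC NULLS LAST):
    a sorted list of keys, each ending with the row's position."""
    return sorted(
        (r[0], desc_nulls_last(r[1]), desc_nulls_last(r[2]), pos)
        for pos, r in enumerate(rows)
    )


def top_per_series(rows, index, series_ids):
    """One 'ORDER BY ... LIMIT 1' lookup per id, as in the LATERAL query."""
    result = []
    for s in series_ids:
        i = bisect.bisect_left(index, (s,))  # seek to the first key of series s
        while i < len(index) and index[i][0] == s:
            row = rows[index[i][3]]
            if row[3]:  # AND include: skip entries for excluded rows
                result.append(row)
                break
            i += 1
    return result


# rows are (series, volume, chapter, include) tuples (hypothetical layout).
rows = [
    (741, 1.0, 5.0, True),
    (741, 4.0, 1.0, True),
    (741, 9.0, 9.0, False),
    (12, None, 3.0, True),
    (12, 2.0, None, True),
]
index = build_index(rows)
print(top_per_series(rows, index, [741, 12, 500]))
# [(741, 4.0, 1.0, True), (12, 2.0, None, True)]; series 500 has no rows
```

As with the comma (CROSS JOIN) LATERAL in SQL, a series with no qualifying rows simply produces no output row.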
If more than a few rows have include not true, and you are only interested in rows with include = true, consider a partial multicolumn index:
CREATE INDEX releases_series_volume_chapter_idx
ON releases(series, volume DESC NULLS LAST, chapter DESC NULLS LAST)
WHERE include;
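The benefit of the partial index can be shown with the same kind of toy model: when the highest-sorting rows of a series are excluded, a full index has to step past their entries, while a partial index built with WHERE include never contains them. The sketch counts index entries examined per lookup; the tuple layout, helper names, and the `partial` flag are assumptions for illustration only.

```python
import bisect


def desc_nulls_last(x):
    # Ascending order on this key equals "x DESC NULLS LAST" order.
    return (x is None, -x if x is not None else 0.0)


def build_index(rows, partial):
    """Toy index on (series, volume DESC NULLS LAST, chapter DESC NULLS LAST).
    partial=True models the "WHERE include" clause: excluded rows get no entry."""
    return sorted(
        (r[0], desc_nulls_last(r[1]), desc_nulls_last(r[2]), pos)
        for pos, r in enumerate(rows)
        if r[3] or not partial
    )


def seek_top(rows, index, series):
    """Return (best row or None, number of index entries examined)."""
    i = bisect.bisect_left(index, (series,))
    examined = 0
    while i < len(index) and index[i][0] == series:
        examined += 1
        row = rows[index[i][3]]
        if row[3]:  # include = true
            return row, examined
        i += 1
    return None, examined


# (series, volume, chapter, include): the three highest-sorting rows are excluded.
rows = [
    (7, 10.0, 1.0, False),
    (7, 9.0, 1.0, False),
    (7, 8.0, 1.0, False),
    (7, 2.0, 3.0, True),
]
print(seek_top(rows, build_index(rows, partial=False), 7))  # ((7, 2.0, 3.0, True), 4)
print(seek_top(rows, build_index(rows, partial=True), 7))   # ((7, 2.0, 3.0, True), 1)
```

The partial index is also smaller, since excluded rows take no space in it; the trade-off is that it can only serve queries whose WHERE clause implies include.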