Vla*_*nov 5 postgresql index execution-plan window-functions greatest-n-per-group
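We are using an Amazon RDS instance:
PostgreSQL 11.13 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit
I have a simple, classic top-1-per-group query. I need to get the latest item in the history for each creativeScheduleId.
Here are the table and index definitions: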
CREATE TABLE IF NOT EXISTS public.creative_schedule_status_histories (
id serial PRIMARY KEY,
"creativeScheduleId" text NOT NULL,
-- other columns
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_creativescheduleid_id
ON public.creative_schedule_status_histories ("creativeScheduleId" ASC, id ASC);
When the query orders by id ASC, the engine reads only the index and performs no additional sort:
EXPLAIN (ANALYZE)
SELECT history.id, history."creativeScheduleId"
FROM (
SELECT cssh.id, cssh."creativeScheduleId"
, ROW_NUMBER() OVER (PARTITION BY cssh."creativeScheduleId"
ORDER BY cssh.id ASC) AS rn -- !
FROM creative_schedule_status_histories as cssh
) AS history
WHERE history.rn = 1;
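When I order by id DESC, I expect to see exactly the same query plan. Instead, the plan contains an explicit sort that spills to disk, and everything gets noticeably slower.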
EXPLAIN (ANALYZE)
SELECT history.id, history."creativeScheduleId"
FROM (
SELECT cssh.id, cssh."creativeScheduleId"
, ROW_NUMBER() OVER (PARTITION BY cssh."creativeScheduleId"
ORDER BY cssh.id DESC) AS rn -- !
FROM creative_schedule_status_histories as cssh
) AS history
WHERE history.rn = 1;
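I expected the given index to be equally useful for both variants of the query.
Can't Postgres scan an index backward?
What am I missing here?
When I query for one specific creativeScheduleId, Postgres uses the index equally well for both the ASC and the DESC sort order. There is no explicit sort in either variant: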
EXPLAIN (ANALYZE)
SELECT id, "creativeScheduleId"
FROM creative_schedule_status_histories AS cssh
WHERE "creativeScheduleId" = '24238370-a64c-4b30-ac8e-27eb2b693aca'
ORDER BY id DESC -- or ASC, no sort
LIMIT 1
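For reference, this is the plan of the first full-table query (ordered by id ASC), which reads the index in order with no sort:
"Subquery Scan on history (cost=0.56..511808.63 rows=26377 width=41) (actual time=0.047..4539.058 rows=709030 loops=1)"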
" Filter: (history.rn = 1)"
" Rows Removed by Filter: 4579766"
" -> WindowAgg (cost=0.56..445866.24 rows=5275391 width=49) (actual time=0.046..4165.835 rows=5288796 loops=1)"
" -> Index Only Scan using idx_creativescheduleid_id on creative_schedule_status_histories cssh (cost=0.56..353546.90 rows=5275391 width=41) (actual time=0.037..1447.490 rows=5288796 loops=1)"
" Heap Fetches: 2372"
"Planning Time: 0.072 ms"
"Execution Time: 4568.235 ms"
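For the single-creativeScheduleId query, the plan actually shows an Index Only Scan Backward, so Postgres is capable of it. It just does not do it for the whole table.
Any ideas on how to encourage the engine to scan the whole index backward for the first query, the one that reads the whole table?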
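As a workaround, consider the rows sorted in descending index order:

| id | creativeScheduleId |
|---|---|
| **10** | **d** |
| **9** | **c** |
| 8 | c |
| **7** | **b** |
| 6 | b |
| 5 | b |
| **4** | **a** |
| 3 | a |
| 2 | a |
| 1 | a |

The rows you want (shown in bold) are the ones where the previous row does not have a matching "creativeScheduleId" value: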
EXPLAIN (ANALYZE)
SELECT
    q1.id,
    q1."creativeScheduleId"
FROM
(
    SELECT
        cssh.*,
        CASE
            WHEN cssh."creativeScheduleId" =
                LAST_VALUE(cssh."creativeScheduleId") OVER (
                    ORDER BY cssh."creativeScheduleId" DESC, cssh.id DESC
                    ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
            THEN 0
            ELSE 1
        END AS qualified
    FROM public.creative_schedule_status_histories AS cssh
) AS q1
WHERE
    q1.qualified = 1;

The plan for this query scans the index backward, with no sort:

Subquery Scan on q1 (cost=0.15..104.48 rows=6 width=36) (actual time=0.014..0.014 rows=0 loops=1)
  Filter: (q1.qualified = 1)
  -> WindowAgg (cost=0.15..88.60 rows=1270 width=40) (actual time=0.013..0.014 rows=0 loops=1)
        -> Index Only Scan Backward using idx_creativescheduleid_id on creative_schedule_status_histories cssh (cost=0.15..63.20 rows=1270 width=36) (actual time=0.011..0.011 rows=0 loops=1)
              Heap Fetches: 0
Planning Time: 0.415 ms
Execution Time: 0.076 ms

In the comments, you expressed an interest in how SQL Server handles this.
It can use a backward index scan, but it needs a little help:

SELECT
    Q1.id,
    Q1.creativeScheduleId
FROM
(
    SELECT
        CSSH.id,
        CSSH.creativeScheduleId,
        rn = ROW_NUMBER() OVER (
            PARTITION BY CSSH.creativeScheduleId
            ORDER BY CSSH.id DESC)
    FROM dbo.creative_schedule_status_histories AS CSSH
) AS Q1
WHERE
    Q1.rn = 1
-- Encourage optimizer
ORDER BY
    Q1.creativeScheduleId DESC,
    Q1.id DESC;
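
To sanity-check the "previous row differs" rule from the Postgres workaround without touching the real table, here is a minimal sketch that runs the same one-row window frame over the ten sample rows from the table above. The inline `sample` CTE and its alias `s` are made up for illustration; nothing else is assumed about the real data.

```sql
-- Hypothetical inline data mirroring the ten-row example table.
WITH sample (id, "creativeScheduleId") AS (
    VALUES (1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'),
           (5, 'b'), (6, 'b'), (7, 'b'),
           (8, 'c'), (9, 'c'),
           (10, 'd')
)
SELECT q1.id, q1."creativeScheduleId"
FROM (
    SELECT
        s.id,
        s."creativeScheduleId",
        CASE
            -- Same frame as the workaround: exactly the one preceding row
            -- in descending ("creativeScheduleId", id) order.
            WHEN s."creativeScheduleId" =
                 LAST_VALUE(s."creativeScheduleId") OVER (
                     ORDER BY s."creativeScheduleId" DESC, s.id DESC
                     ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
            THEN 0
            -- Either there is no previous row (comparison yields NULL)
            -- or the group changed: this row holds its group's highest id.
            ELSE 1
        END AS qualified
    FROM sample AS s
) AS q1
WHERE q1.qualified = 1
ORDER BY q1.id DESC;
```

On this sample it returns ids 10, 9, 7 and 4, the bold rows in the table. The very first row qualifies because its frame is empty, so `LAST_VALUE` yields NULL and the `CASE` falls through to `ELSE 1`.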