Iva*_*anD 6 postgresql index execution-plan
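I am trying to create an index that is used for both the WHERE and the ORDER BY clause of a query. Reading the Postgres 14 documentation (11.4. Indexes and ORDER BY - https://www.postgresql.org/docs/14/indexes-ordering.html) led me to believe this is possible:
In addition to simply finding the rows to be returned by a query, an index may be able to deliver them in a specific sorted order. This allows a query's ORDER BY specification to be honored without a separate sorting step.
Wow, that sounds great, so let's try it! I created a test table and an index containing the WHERE and ORDER BY columns, then filled the table with data: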
DROP TABLE IF EXISTS testdata;
CREATE TABLE testdata
(
    question_id TEXT NOT NULL UNIQUE PRIMARY KEY,
    answerer_id TEXT NOT NULL,
    question_date TIMESTAMPTZ NOT NULL,
    answer_date TIMESTAMPTZ NOT NULL
);

DROP INDEX IF EXISTS idx1;
CREATE INDEX idx1 ON testdata (answerer_id, answer_date, question_date);

TRUNCATE testdata;
INSERT INTO testdata(question_id, answerer_id, question_date, answer_date)
SELECT CONCAT('question_', LPAD(i::TEXT, 4, '0')),
       CONCAT('answerer_', LPAD(FLOOR(RANDOM() * (99 - 1 + 1) + 1)::TEXT, 2, '0')),
       TIMESTAMPTZ '2021-01-01' + RANDOM() * INTERVAL '365 days',
       TIMESTAMPTZ '2022-01-01' + RANDOM() * INTERVAL '365 days'
FROM GENERATE_SERIES(1, 9999) AS t(i);

VACUUM (FULL, ANALYZE) testdata;

EXPLAIN ANALYSE
SELECT *
FROM testdata
WHERE answerer_id = 'answerer_09'
ORDER BY answer_date,
         question_date;
Because answerer_id is a random number between 1 and 99, this query should return about 100 of the 10K rows (about 1% of all rows). The EXPLAIN ANALYSE output for the query is as follows:
Sort  (cost=108.49..108.75 rows=106 width=42) (actual time=2.194..3.555 rows=106 loops=1)
  Sort Key: answer_date, question_date
  Sort Method: quicksort  Memory: 33kB
  ->  Bitmap Heap Scan on testdata  (cost=5.11..104.92 rows=106 width=42) (actual time=0.057..1.188 rows=106 loops=1)
        Recheck Cond: (answerer_id = 'answerer_09'::text)
        Heap Blocks: exact=67
        ->  Bitmap Index Scan on idx1  (cost=0.00..5.08 rows=106 width=0) (actual time=0.032..0.040 rows=106 loops=1)
              Index Cond: (answerer_id = 'answerer_09'::text)
Planning Time: 0.154 ms
Execution Time: 4.856 ms
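So the database uses the index to find the rows that satisfy the WHERE clause and then... sorts them with quicksort? Why doesn't it return the rows exactly as they are already ordered in the index?
Am I missing something? Maybe I need to create the index in some other way for it to be used for both WHERE and ORDER BY?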
Update:
Changing the query to add a LIMIT:
EXPLAIN ANALYSE
SELECT *
FROM testdata
WHERE answerer_id = 'answerer_09'
ORDER BY answer_date,
         question_date
LIMIT 30;
changes the result completely:
Limit  (cost=0.29..83.88 rows=30 width=42) (actual time=0.064..1.599 rows=30 loops=1)
  ->  Index Scan using idx1 on testdata  (cost=0.29..253.87 rows=91 width=42) (actual time=0.044..0.676 rows=30 loops=1)
        Index Cond: (answerer_id = 'answerer_09'::text)
Planning Time: 0.125 ms
Execution Time: 1.967 ms
If I change the limit to 40 or more, it goes back to sorting (although with a different sort method, top-N heapsort):
Limit  (cost=105.95..106.05 rows=40 width=42) (actual time=1.853..3.205 rows=40 loops=1)
  ->  Sort  (cost=105.95..106.17 rows=91 width=42) (actual time=1.837..2.321 rows=40 loops=1)
        Sort Key: answer_date, question_date
        Sort Method: top-N heapsort  Memory: 30kB
        ->  Bitmap Heap Scan on testdata  (cost=4.99..103.07 rows=91 width=42) (actual time=0.054..1.037 rows=91 loops=1)
              Recheck Cond: (answerer_id = 'answerer_09'::text)
              Heap Blocks: exact=57
              ->  Bitmap Index Scan on idx1  (cost=0.00..4.97 rows=91 width=0) (actual time=0.034..0.042 rows=91 loops=1)
                    Index Cond: (answerer_id = 'answerer_09'::text)
Planning Time: 0.093 ms
Execution Time: 3.618 ms
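So the index is correct and the database knows it, but ignores it when the query has no limit or a fairly large one.
What is the reason for this? Is sorting without the index somehow faster?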
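For a query that fetches this many rows (about 100 of 10K, roughly 1% of the table), a plain index scan is often not the cheapest plan. (Many factors are in play here ...) What you see is a bitmap index scan.
A bitmap index scan cannot pass the index sort order on to the result, because the bitmap heap scan on top of it visits the matching rows in physical heap order. Hence the final sort step.
You can "disable" the alternative query plans to "force" an index scan (for testing purposes only!):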
SET enable_bitmapscan = off;
SET enable_seqscan = off;
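To keep such test-only settings from leaking into the rest of the session, one option is to scope them to a transaction with SET LOCAL. A minimal sketch, assuming the testdata table and idx1 index from the question (the BUFFERS option is an addition here, not part of the original test):

```sql
BEGIN;

-- SET LOCAL only lasts until the end of the current transaction.
SET LOCAL enable_bitmapscan = off;
SET LOCAL enable_seqscan = off;

-- EXPLAIN ANALYZE really executes the query; harmless for a SELECT.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM testdata
WHERE answerer_id = 'answerer_09'
ORDER BY answer_date,
         question_date;

ROLLBACK;  -- both settings revert to their previous values here
```

If the planner switches to a plain Index Scan on idx1, the Sort node should no longer appear in the plan.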
Or you can lower the expected cost of random access with:
SET random_page_cost = 1; -- or similar
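A short sketch of how that setting could be tried out and then undone within one session. The value 1.1 is only an illustrative assumption, not a recommendation from this answer; the table and query are the ones from the question:

```sql
SHOW random_page_cost;       -- the built-in default is 4

SET random_page_cost = 1.1;  -- affects the current session only

EXPLAIN ANALYSE
SELECT *
FROM testdata
WHERE answerer_id = 'answerer_09'
ORDER BY answer_date,
         question_date;

RESET random_page_cost;      -- back to the configured value
```

Comparing the plan before and after the SET shows whether the cheaper random-access estimate is enough to tip the planner toward an index scan.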
Or you can add a LIMIT that returns only a few result rows.
Any of these can persuade the query planner to switch to an index scan, which needs no extra sort step.
db<>fiddle here
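With a test case of only a few rows and a mildly selective predicate, it is hard to tell whether a sequential scan, a bitmap index scan, or an index scan will be faster. Tests with bigger tables are more revealing.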
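One way to run such a bigger test is sketched below. The names testdata_big and idx1_big and the row count of one million are assumptions made for illustration; the columns and value distribution follow the question's setup. The question ID is padded to 7 digits because a width of 4 would truncate larger numbers and produce duplicate keys.

```sql
DROP TABLE IF EXISTS testdata_big;   -- hypothetical table name
CREATE TABLE testdata_big
(
    question_id TEXT PRIMARY KEY,
    answerer_id TEXT NOT NULL,
    question_date TIMESTAMPTZ NOT NULL,
    answer_date TIMESTAMPTZ NOT NULL
);

-- 1,000,000 rows, about 1% of them per answerer
INSERT INTO testdata_big(question_id, answerer_id, question_date, answer_date)
SELECT CONCAT('question_', LPAD(i::TEXT, 7, '0')),
       CONCAT('answerer_', LPAD((FLOOR(RANDOM() * 99) + 1)::INT::TEXT, 2, '0')),
       TIMESTAMPTZ '2021-01-01' + RANDOM() * INTERVAL '365 days',
       TIMESTAMPTZ '2022-01-01' + RANDOM() * INTERVAL '365 days'
FROM GENERATE_SERIES(1, 1000000) AS t(i);

-- Build the index after loading, then refresh statistics.
CREATE INDEX idx1_big ON testdata_big (answerer_id, answer_date, question_date);
VACUUM ANALYZE testdata_big;

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM testdata_big
WHERE answerer_id = 'answerer_09'
ORDER BY answer_date,
         question_date;
```

Which plan wins on the larger table still depends on the server's cost settings and on how much of the table is cached, so the output may differ from machine to machine.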
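Either way, the query planner decides strictly on the basis of estimated cost. (SET enable_seqscan = off only makes sequential scans look extremely expensive.) The plan with the lowest estimated cost wins. Table and column statistics, server configuration, and cost settings should be as valid as possible to get valid estimates and good query plans.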
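The inputs to those estimates can be inspected directly. A minimal sketch, assuming the testdata table lives in the public schema; the selection of settings shown is a sample, not a complete list of what the planner uses:

```sql
-- Refresh the statistics the planner relies on.
ANALYZE testdata;

-- Per-column statistics: number of distinct values and how well the
-- physical row order correlates with the column order.
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public'   -- assumption: default schema
  AND tablename = 'testdata';

-- Cost settings that feed into the plan estimates.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('seq_page_cost', 'random_page_cost',
               'cpu_tuple_cost', 'cpu_index_tuple_cost',
               'effective_cache_size');
```

A correlation near 0 for answerer_id would mean the matching rows are scattered across many heap pages, which is one of the things that makes a plain index scan look expensive to the planner.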