Vit*_*nko 6 postgresql execution-plan union query-performance postgresql-performance
I am using Postgres 13 and have a table defined with the following DDL:
CREATE TABLE item_codes (
code bytea NOT NULL,
item_id bytea NOT NULL,
time TIMESTAMP WITH TIME ZONE NOT NULL,
PRIMARY KEY (item_id, code)
);
CREATE INDEX ON item_codes (code, time, item_id);
I use the following query:
SELECT DISTINCT time, item_id
FROM (
(SELECT time, item_id
FROM item_codes
WHERE code = '\x3965623166306238383033393437613338373162313934383034366139653239'
ORDER BY time, item_id
LIMIT 100)
UNION ALL
(SELECT time, item_id
FROM item_codes
WHERE code = '\x3836653432356638366638636338393364373935343938303233343363373561'
ORDER BY time, item_id
LIMIT 100)
) AS items
ORDER BY time, item_id
LIMIT 100;
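The query is generated dynamically, and the number of subqueries combined with UNION ALL depends on how many different code values are needed. It can get quite long.
Naively rewriting the query into what I believe is an equivalent form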
SELECT DISTINCT time, item_id
FROM item_codes
WHERE code IN ('\x3965623166306238383033393437613338373162313934383034366139653239',
'\x3836653432356638366638636338393364373935343938303233343363373561')
ORDER BY time, item_id
LIMIT 100
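makes it many times slower, to an unacceptable degree.
Two main questions:
1. Is it possible to rewrite the original query more concisely, without repeating the subquery for each code value, while still keeping the fast execution plan?
2. Why can't Postgres optimize the second query? Am I missing something, and is it not actually equivalent?
Query plan for the original query with UNION ALL: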
Limit (cost=1.12..7.33 rows=100 width=41)
-> Merge Append (cost=1.12..13.53 rows=200 width=41)
Sort Key: btc_tx_addresses.tx_time, btc_tx_addresses.tx_id
-> Limit (cost=0.56..4.76 rows=100 width=41)
-> Index Only Scan using btc_tx_addresses_address_tx_time_tx_id_idx on btc_tx_addresses (cost=0.56..59576.94 rows=1417576 width=41)
Index Cond: (address = '\x3965623166306238383033393437613338373162313934383034366139653239'::bytea)
-> Limit (cost=0.56..4.76 rows=100 width=41)
-> Index Only Scan using btc_tx_addresses_address_tx_time_tx_id_idx on btc_tx_addresses btc_tx_addresses_1 (cost=0.56..60389.61 rows=1436923 width=41)
Index Cond: (address = '\x3836653432356638366638636338393364373935343938303233343363373561'::bytea)
Query plan for the slow query:
Limit (cost=411977.37..411978.97 rows=100 width=41)
-> Unique (cost=411977.37..433386.12 rows=1338843 width=41)
-> Sort (cost=411977.37..419113.62 rows=2854500 width=41)
Sort Key: time, item_id
-> Index Only Scan using item_codes_code_time_item_id_idx on item_codes (cost=0.56..105906.37 rows=2854500 width=41)
Index Cond: (code = ANY ('{"\\x3965623166306238383033393437613338373162313934383034366139653239","\\x3836653432356638366638636338393364373935343938303233343363373561"}'::bytea[]))
JIT:
Functions: 4
Options: Inlining false, Optimization false, Expressions true, Deforming true
Regarding the first question: yes, it is possible. Provide the input values as a set, then attach a LATERAL subquery:
SELECT DISTINCT time, item_id
FROM unnest('{\\x3965623166306238383033393437613338373162313934383034366139653239
, \\x3836653432356638366638636338393364373935343938303233343363373561}'::bytea[]) c(code)
CROSS JOIN LATERAL (
SELECT time, item_id
FROM item_codes ic
WHERE ic.code = c.code
ORDER BY 1, 2
LIMIT 100
) ic
ORDER BY 1, 2
LIMIT 100;
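I unnest an array of inputs to provide the set. Aside: inside an array literal, a backslash must be escaped with another backslash (\\x... instead of \x...). Or use an ARRAY constructor instead: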
ARRAY['\x3965623166306238383033393437613338373162313934383034366139653239'
, '\x3836653432356638366638636338393364373935343938303233343363373561']::bytea[]
Or:
ARRAY['\x3965623166306238383033393437613338373162313934383034366139653239'::bytea
, '\x3836653432356638366638636338393364373935343938303233343363373561']
Alternatively, a VALUES expression also does the job:
SELECT DISTINCT time, item_id
FROM (
VALUES
('\x3965623166306238383033393437613338373162313934383034366139653239'::bytea)
, ('\x3836653432356638366638636338393364373935343938303233343363373561')
) c(code)
CROSS JOIN LATERAL ( ...
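Since the query in the question is generated dynamically, the array variant can also be parameterized so that the SQL text stays the same no matter how many code values are passed. The following is only a sketch under that assumption: the statement name top_items is hypothetical, and $1 stands for the array of code values.

```sql
-- Hypothetical prepared statement; $1 is the bytea[] of code values.
PREPARE top_items(bytea[]) AS
SELECT DISTINCT ic.time, ic.item_id
FROM   unnest($1) AS c(code)            -- one row per input code
CROSS  JOIN LATERAL (
   SELECT i.time, i.item_id
   FROM   item_codes i
   WHERE  i.code = c.code               -- can use the index on (code, time, item_id)
   ORDER  BY i.time, i.item_id
   LIMIT  100                           -- at most 100 rows per code
   ) ic
ORDER  BY ic.time, ic.item_id
LIMIT  100;                             -- overall top 100 after removing duplicates

-- Usage: pass any number of codes as a single array argument.
EXECUTE top_items(ARRAY['\x3965623166306238383033393437613338373162313934383034366139653239'::bytea
                      , '\x3836653432356638366638636338393364373935343938303233343363373561']);
```

The same pattern should work with a bind parameter from a client driver instead of PREPARE / EXECUTE, as long as the driver can send a bytea[] value.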
Regarding the second question: Postgres does not (yet) implement this kind of index skip scan as a query plan, so we have to hand-roll it.
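One way to check whether a hand-rolled form gets the intended plan is to run it under EXPLAIN. The following is a minimal sketch, assuming the item_codes table and index from the question. The actual plan depends on table statistics and server settings, so no output is shown.

```sql
-- Sketch: inspect the plan of the LATERAL rewrite.
-- ANALYZE really executes the query; BUFFERS adds buffer usage to the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ic.time, ic.item_id
FROM   unnest(ARRAY['\x3965623166306238383033393437613338373162313934383034366139653239'::bytea
                  , '\x3836653432356638366638636338393364373935343938303233343363373561']) AS c(code)
CROSS  JOIN LATERAL (
   SELECT i.time, i.item_id
   FROM   item_codes i
   WHERE  i.code = c.code
   ORDER  BY i.time, i.item_id
   LIMIT  100
   ) ic
ORDER  BY ic.time, ic.item_id
LIMIT  100;
```

If the rewrite behaves as intended, the plan should show a nested loop that runs one limited index scan per input code, rather than a single scan over all matching rows followed by a large sort as in the slow plan above.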