How to reduce the size of a query with many repeated UNION subqueries?

Vit*_*nko 6 postgresql execution-plan union query-performance postgresql-performance

I use Postgres 13 and have a table defined with the following DDL:

CREATE TABLE item_codes (
    code    bytea                    NOT NULL,
    item_id bytea                    NOT NULL,
    time    TIMESTAMP WITH TIME ZONE NOT NULL,
    PRIMARY KEY (item_id, code)
);

CREATE INDEX ON item_codes (code, time, item_id);

I run the following query:

SELECT DISTINCT time, item_id
FROM (
      (SELECT time, item_id
       FROM item_codes
       WHERE code = '\x3965623166306238383033393437613338373162313934383034366139653239'
       ORDER BY time, item_id
       LIMIT 100)
       UNION ALL
      (SELECT time, item_id
       FROM item_codes
       WHERE code = '\x3836653432356638366638636338393364373935343938303233343363373561'
       ORDER BY time, item_id
       LIMIT 100)
     ) AS items
ORDER BY time, item_id
LIMIT 100;

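The query is generated dynamically, and the number of subqueries combined with UNION ALL depends on how many distinct code values are needed. It can get very long.

Naively rewriting the query to what I believed was an equivalent form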

SELECT DISTINCT time, item_id
FROM item_codes
WHERE code IN ('\x3965623166306238383033393437613338373162313934383034366139653239',
                  '\x3836653432356638366638636338393364373935343938303233343363373561')
ORDER BY time, item_id
LIMIT 100

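makes it many times slower, to an unacceptable degree.

Two main questions:

  1. Is it possible to rewrite the original query more concisely, without repeating a subquery for each code value, while keeping the fast execution plan?

  2. Why can't Postgres optimize the second query? Am I missing something, and is it not actually equivalent?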

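Query plan for the original query with UNIONs (it appears to show the real table and column names: btc_tx_addresses for item_codes, address for code, tx_time for time, tx_id for item_id):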

Limit  (cost=1.12..7.33 rows=100 width=41)
  ->  Merge Append  (cost=1.12..13.53 rows=200 width=41)
        Sort Key: btc_tx_addresses.tx_time, btc_tx_addresses.tx_id
        ->  Limit  (cost=0.56..4.76 rows=100 width=41)
              ->  Index Only Scan using btc_tx_addresses_address_tx_time_tx_id_idx on btc_tx_addresses  (cost=0.56..59576.94 rows=1417576 width=41)
                    Index Cond: (address = '\x3965623166306238383033393437613338373162313934383034366139653239'::bytea)
        ->  Limit  (cost=0.56..4.76 rows=100 width=41)
              ->  Index Only Scan using btc_tx_addresses_address_tx_time_tx_id_idx on btc_tx_addresses btc_tx_addresses_1  (cost=0.56..60389.61 rows=1436923 width=41)
                    Index Cond: (address = '\x3836653432356638366638636338393364373935343938303233343363373561'::bytea)

Query plan for the slow query:

Limit  (cost=411977.37..411978.97 rows=100 width=41)
  ->  Unique  (cost=411977.37..433386.12 rows=1338843 width=41)
        ->  Sort  (cost=411977.37..419113.62 rows=2854500 width=41)
              Sort Key: time, item_id
              ->  Index Only Scan using item_codes_code_time_item_id_idx on item_codes  (cost=0.56..105906.37 rows=2854500 width=41)
                    Index Cond: (code = ANY ('{"\\x3965623166306238383033393437613338373162313934383034366139653239","\\x3836653432356638366638636338393364373935343938303233343363373561"}'::bytea[]))
JIT:
  Functions: 4
  Options: Inlining false, Optimization false, Expressions true, Deforming true

Erw*_*ter 9

Q1. Alternatives without repeating subqueries

Possible. Provide a set of input values, then attach a LATERAL subquery:

SELECT DISTINCT time, item_id
FROM   unnest('{\\x3965623166306238383033393437613338373162313934383034366139653239
              , \\x3836653432356638366638636338393364373935343938303233343363373561}'::bytea[]) c(code)
CROSS  JOIN LATERAL (
   SELECT time, item_id
   FROM   item_codes ic
   WHERE  ic.code = c.code
   ORDER  BY 1, 2
   LIMIT  100
   ) ic
ORDER  BY 1, 2
LIMIT  100;

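I unnest an input array to provide that set. Aside: inside an array literal, the backslash has to be escaped as \\. Or use an ARRAY constructor instead: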

ARRAY['\x3965623166306238383033393437613338373162313934383034366139653239'
    , '\x3836653432356638366638636338393364373935343938303233343363373561']::bytea[]

Or:

ARRAY['\x3965623166306238383033393437613338373162313934383034366139653239'::bytea
    , '\x3836653432356638366638636338393364373935343938303233343363373561']
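
The escaping difference can be checked directly. The following is a minimal sketch, assuming the default setting standard_conforming_strings = on. The two-byte value '\x3965' is a shortened, illustrative value, not one of the codes from the question.

```sql
-- Assumes standard_conforming_strings = on (the default in Postgres 13).
-- '\x3965' is a shortened illustrative value, not a real code from the question.

-- Doubled backslash inside the array literal: the element is parsed as hex bytea.
SELECT ('{\\x3965}'::bytea[])[1] = '\x3965'::bytea AS escaped_matches;

-- Single backslash: the array parser consumes the backslash, so the element
-- is read as the plain characters x3965 rather than as hex input.
SELECT ('{\x3965}'::bytea[])[1] = '\x3965'::bytea AS unescaped_matches;

-- The ARRAY constructor takes ordinary bytea literals, so no extra escaping.
SELECT (ARRAY['\x3965'::bytea])[1] = '\x3965'::bytea AS constructor_matches;
```

If the first and third comparisons return true and the second returns false, a single backslash inside the array literal would explain a lookup that silently matches no rows.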

Alternatively, a VALUES expression does the job as well:

SELECT DISTINCT time, item_id
FROM  (
   VALUES
     ('\x3965623166306238383033393437613338373162313934383034366139653239'::bytea)
   , ('\x3836653432356638366638636338393364373935343938303233343363373561')
   ) c(code)
CROSS  JOIN LATERAL ( ...
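
Since the list of codes is generated dynamically, the whole list could also be passed as a single bytea[] parameter, so that the query text stays the same no matter how many codes there are. A minimal sketch, assuming a prepared statement suits the client. The statement name items_for_codes and the table alias t are made up for illustration:

```sql
-- Hypothetical statement name; $1 carries all code values as one array.
PREPARE items_for_codes(bytea[]) AS
SELECT DISTINCT ic.time, ic.item_id
FROM   unnest($1) AS c(code)
CROSS  JOIN LATERAL (
   -- Runs once per input code; can use the index on (code, time, item_id).
   SELECT t.time, t.item_id
   FROM   item_codes t
   WHERE  t.code = c.code
   ORDER  BY t.time, t.item_id
   LIMIT  100
   ) ic
ORDER  BY ic.time, ic.item_id
LIMIT  100;

-- Usage: the ARRAY constructor avoids the array-literal escaping issue.
EXECUTE items_for_codes(
   ARRAY['\x3965623166306238383033393437613338373162313934383034366139653239'::bytea
       , '\x3836653432356638366638636338393364373935343938303233343363373561']);
```

Most client drivers can bind an array parameter in a similar way without an explicit PREPARE, though the details depend on the driver.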

Q2. Why?

Because Postgres has not yet implemented this kind of index skip scan as a query plan. So we have to write it out by hand.
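
Whether the hand-written form actually gets one index scan per code can be checked with EXPLAIN. This sketch uses the table and index from the question and the VALUES variant. The alias t is an illustrative addition, and the resulting plan depends on statistics and settings:

```sql
-- ANALYZE executes the query; BUFFERS adds I/O counters to the plan output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ic.time, ic.item_id
FROM  (
   VALUES
     ('\x3965623166306238383033393437613338373162313934383034366139653239'::bytea)
   , ('\x3836653432356638366638636338393364373935343938303233343363373561')
   ) c(code)
CROSS  JOIN LATERAL (
   SELECT t.time, t.item_id
   FROM   item_codes t
   WHERE  t.code = c.code
   ORDER  BY t.time, t.item_id
   LIMIT  100
   ) ic
ORDER  BY ic.time, ic.item_id
LIMIT  100;
```

The thing to look for is a Limit over an index-only scan repeated per code value, so that the final sort handles at most 100 rows per code, instead of the Sort over millions of rows seen in the slow plan.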

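  • Thanks a lot! This query is as fast as the one with UNION. For some reason, the version with unnest did not work and returned an empty result. It looks like the array elements were not treated as hex literals but as plain string values. The version with VALUES works well, though. (2 upvotes)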