gue*_*tli 5 performance subquery postgresql-9.3 except postgresql-performance
DROP TABLE IF EXISTS history;
CREATE TABLE history (
id integer NOT NULL,
ticket_id integer NOT NULL);
ALTER TABLE ONLY history ADD CONSTRAINT history_pkey PRIMARY KEY (id);
CREATE INDEX history_ticket_id ON history USING btree (ticket_id);
DROP TABLE IF EXISTS ticket;
CREATE TABLE ticket (
id integer NOT NULL
);
ALTER TABLE ONLY ticket ADD CONSTRAINT ticket_pkey PRIMARY KEY (id);
INSERT INTO history values (generate_series(1, 30000), generate_series(1, 30000));
ANALYZE history;
INSERT INTO ticket values (generate_series(1, 40000));
ANALYZE ticket;
explain analyze select distinct ticket_id from history
where ticket_id not in (select id from ticket);
HashAggregate (cost=15510545.50..15510695.50 rows=15000 width=4) (actual time=170892.668..170892.668 rows=0 loops=1)
  -> Seq Scan on history (cost=0.00..15510508.00 rows=15000 width=4) (actual time=170892.644..170892.644 rows=0 loops=1)
        Filter: (NOT (SubPlan 1))
        Rows Removed by Filter: 30000
        SubPlan 1
          -> Materialize (cost=0.00..934.00 rows=40000 width=4) (actual time=0.006..2.685 rows=15000 loops=30000)
                -> Seq Scan on ticket (cost=0.00..577.00 rows=40000 width=4) (actual time=0.038..21.347 rows=30000 loops=1)
Total runtime: 170892.965 ms
explain analyze select distinct ticket_id from history
except select id from ticket;
HashSetOp Except (cost=0.29..2449.29 rows=30000 width=4) (actual time=41.641..41.641 rows=0 loops=1)
  -> Append (cost=0.29..2274.29 rows=70000 width=4) (actual time=0.024..27.835 rows=70000 loops=1)
        -> Subquery Scan on "*SELECT* 1" (cost=0.29..1297.29 rows=30000 width=4) (actual time=0.024..14.527 rows=30000 loops=1)
              -> Unique (cost=0.29..997.29 rows=30000 width=4) (actual time=0.022..10.856 rows=30000 loops=1)
                    -> Index Only Scan using history_ticket_id on history (cost=0.29..922.29 rows=30000 width=4) (actual time=0.021..6.031 rows=30000 loops=1)
                          Heap Fetches: 30000
        -> Subquery Scan on "*SELECT* 2" (cost=0.00..977.00 rows=40000 width=4) (actual time=0.018..8.364 rows=40000 loops=1)
              -> Seq Scan on ticket (cost=0.00..577.00 rows=40000 width=4) (actual time=0.018..3.808 rows=40000 loops=1)
Total runtime: 41.702 ms
小智 1
IN is better suited to a list of constant values. Try using NOT EXISTS instead.
Query:
explain analyze select distinct ticket_id from history h
where not EXISTS (select id from ticket t where t.id = h.ticket_id);
And the execution plan:
Unique (cost=0.58..2294.04 rows=1 width=4) (actual time=23.140..23.140 rows=0 loops=1)
  -> Merge Anti Join (cost=0.58..2294.04 rows=1 width=4) (actual time=23.139..23.139 rows=0 loops=1)
        Merge Cond: (h.ticket_id = t.id)
        -> Index Only Scan using history_ticket_id on history h (cost=0.29..922.29 rows=30000 width=4) (actual time=0.037..6.848 rows=30000 loops=1)
              Heap Fetches: 30000
        -> Index Only Scan using ticket_pkey on ticket t (cost=0.29..1228.29 rows=40000 width=4) (actual time=0.026..6.970 rows=30000 loops=1)
              Heap Fetches: 30000
Total runtime: 23.189 ms
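I think the reason is that, for NOT IN, Postgres has to build a list of the values from the ticket table and only then filter history. In the NOT IN plan from the question, that list shows up as a Materialize node that is rescanned once for each of the 30000 history rows (loops=30000).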
NOT EXISTS does not need to build such a list. It can simply check whether the value exists in the ticket primary key index.
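The two forms also differ in how they treat NULLs, which is commonly given as the reason Postgres does not rewrite NOT IN into an anti join. A minimal sketch, using scratch tables whose names (t_outer, t_inner) are hypothetical and not part of the question:

```sql
-- Hypothetical scratch tables, for illustration only.
CREATE TEMP TABLE t_outer (v integer);
CREATE TEMP TABLE t_inner (v integer);
INSERT INTO t_outer VALUES (1), (2);
INSERT INTO t_inner VALUES (1), (NULL);

-- NOT IN: "2 NOT IN (1, NULL)" evaluates to NULL rather than true,
-- so this query returns no rows at all.
SELECT v FROM t_outer
WHERE v NOT IN (SELECT v FROM t_inner);

-- NOT EXISTS: no t_inner row satisfies i.v = 2,
-- so the row with v = 2 is returned.
SELECT v FROM t_outer o
WHERE NOT EXISTS (SELECT 1 FROM t_inner i WHERE i.v = o.v);
```

In the question's schema both ticket.id and history.ticket_id are declared NOT NULL, so the two queries return the same rows there. The plan in the question suggests that the 9.3 planner does not use that fact to simplify the NOT IN form.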
In general, if you do not get an "Anti Join" in the plan for this kind of query, something in the query is badly written.
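Another common way to write the same check is a LEFT JOIN filtered on a missing match. This is only a sketch against the question's tables; as far as I know Postgres usually recognizes this pattern as an anti join as well, but confirm it with EXPLAIN on your own version and data:

```sql
-- history rows whose ticket_id has no matching row in ticket.
-- t.id is the primary key, so "t.id IS NULL" can only be true
-- for rows where the LEFT JOIN found no match.
EXPLAIN ANALYZE
SELECT DISTINCT h.ticket_id
FROM history h
LEFT JOIN ticket t ON t.id = h.ticket_id
WHERE t.id IS NULL;
```

If the plan shows an "Anti Join" node, the query is being planned the same way as the NOT EXISTS version.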