las*_*weq 3 postgresql performance postgresql-performance
I need to delete some rows from a large table. The rows to delete are those that have no match in another table, for example:
DELETE FROM LargeTable WHERE id NOT IN (SELECT DISTINCT foreign_id from EvenLargerTable)
But my server cannot handle this brute-force query, because LargeTable has almost a million rows and EvenLargerTable has several million.
How can I solve this?
For existence tests against large tables, NOT EXISTS usually runs faster than NOT IN. So try:
DELETE FROM LargeTable
WHERE NOT EXISTS (
SELECT *
FROM EvenLargerTable
WHERE EvenLargerTable.foreign_id = LargeTable.id);
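Before running the delete on the real tables, it may help to rehearse it on a small copy and look at the plan. The following is only a sketch under assumptions: the tables large_table_demo and even_larger_table_demo and the index idx_demo_foreign_id are hypothetical stand-ins, not objects from the question, and the row counts are arbitrary.

```sql
-- Hypothetical stand-in tables; temp tables vanish at the end of the session.
CREATE TEMP TABLE large_table_demo (id int PRIMARY KEY);
CREATE TEMP TABLE even_larger_table_demo (foreign_id int);

INSERT INTO large_table_demo
SELECT g.id FROM generate_series(1, 1000) AS g(id);

INSERT INTO even_larger_table_demo
SELECT (random() * 500)::int FROM generate_series(1, 5000);

-- Hypothetical index name; an index on the referencing column gives the
-- planner the option of probing it once per outer row.
CREATE INDEX idx_demo_foreign_id ON even_larger_table_demo (foreign_id);
ANALYZE large_table_demo;
ANALYZE even_larger_table_demo;

BEGIN;
-- EXPLAIN ANALYZE really executes the DELETE, so it is wrapped in a
-- transaction that is rolled back afterwards.
EXPLAIN ANALYZE
DELETE FROM large_table_demo
WHERE NOT EXISTS (
    SELECT 1
    FROM even_larger_table_demo
    WHERE even_larger_table_demo.foreign_id = large_table_demo.id);
ROLLBACK;
```

Because of the ROLLBACK, the demo table keeps all of its rows; on the real tables the same pattern lets you inspect the plan and timing before committing anything.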
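Without the actual data it is hard to say exactly why this happens. But a simple playground shows that the NOT IN case does not use the index to perform the operation:
Playground: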
-- drop table if exists a;
-- drop table if exists b;
create table a as select (random()*1000)::int as x from generate_series(1,10000);
create index idx_a on a(x);
create table b as select (random()*1000)::int*10 as x from generate_series(1,1000000);
create index idx_b on b(x);
analyse a;
analyse b;
Test:
nd@postgres=# explain (verbose) delete from a where a.x not in (select b.x from b);
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Delete on nd.a  (cost=0.00..129160170.00 rows=5000 width=6)
   ->  Seq Scan on nd.a  (cost=0.00..129160170.00 rows=5000 width=6)
         Output: a.ctid
         Filter: (NOT (SubPlan 1))
         SubPlan 1
           ->  Materialize  (cost=0.00..23332.00 rows=1000000 width=4)
                 Output: b.x
                 ->  Seq Scan on nd.b  (cost=0.00..14425.00 rows=1000000 width=4)
                       Output: b.x
nd@postgres=# explain (verbose) delete from a where not exists (select * from b where a.x=b.x);
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Delete on nd.a  (cost=0.42..5005.91 rows=1 width=12)
   ->  Nested Loop Anti Join  (cost=0.42..5005.91 rows=1 width=12)
         Output: a.ctid, b.ctid
         ->  Seq Scan on nd.a  (cost=0.00..145.00 rows=10000 width=10)
               Output: a.ctid, a.x
         ->  Index Scan using idx_b on nd.b  (cost=0.42..20.78 rows=999 width=10)
               Output: b.ctid, b.x
               Index Cond: (a.x = b.x)
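One commonly cited reason the two forms get such different plans is that they are not equivalent when the subquery can return NULL, so NOT IN cannot simply be rewritten as an anti join. The sketch below illustrates the difference on two tiny tables; t_small and t_big are hypothetical names used only for this example.

```sql
-- Hypothetical stand-in tables; note the NULL on the subquery side.
CREATE TEMP TABLE t_small (x int);
CREATE TEMP TABLE t_big (x int);
INSERT INTO t_small VALUES (1), (2), (3);
INSERT INTO t_big VALUES (1), (NULL);

-- NOT IN: comparing 2 or 3 against NULL yields "unknown", not true,
-- so no row passes the filter.
SELECT count(*) AS not_in_rows
FROM t_small s
WHERE s.x NOT IN (SELECT b.x FROM t_big b);

-- NOT EXISTS: the NULL row never satisfies b.x = s.x,
-- so the rows with x = 2 and x = 3 qualify.
SELECT count(*) AS not_exists_rows
FROM t_small s
WHERE NOT EXISTS (SELECT 1 FROM t_big b WHERE b.x = s.x);
```

If foreign_id in the real table can contain NULL, the same effect applies to the DELETE: the NOT IN version would remove nothing, while the NOT EXISTS version removes every row without a match, so it is worth checking which behavior is intended.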