pgsql - Delete rows from a large table that are not in another table

las*_*weq 3 postgresql performance postgresql-performance

I need to delete some rows from a large table. The rows to delete are the ones that do not appear in another table, for example:

DELETE FROM LargeTable WHERE id NOT IN (SELECT DISTINCT foreign_id FROM EvenLargerTable)

But my server cannot handle such a blunt query, because LargeTable has almost a million rows and EvenLargerTable has several million.

How can I solve this?

Abe*_*sto 5

For existence tests against large tables, NOT EXISTS usually works faster than NOT IN. So try:

DELETE FROM LargeTable
WHERE NOT EXISTS (
  SELECT *
  FROM EvenLargerTable
  WHERE EvenLargerTable.foreign_id = LargeTable.id);
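One caveat worth checking before swapping the two forms: they are only equivalent when the subquery column contains no NULLs. If `foreign_id` is nullable, `NOT IN` evaluates to NULL rather than true for every non-matching row, so it removes nothing, whereas `NOT EXISTS` simply ignores the NULLs. This difference is also the usual explanation for why the planner does not turn `NOT IN` into an anti join. A minimal sketch with two throwaway temp tables (`t_small` and `t_big` are made-up names, not from the question):

```sql
-- Hypothetical demo tables; the names are illustrative only.
CREATE TEMP TABLE t_small (id int);
CREATE TEMP TABLE t_big (foreign_id int);

INSERT INTO t_small VALUES (1), (2), (3);
INSERT INTO t_big VALUES (1), (NULL);   -- note the NULL

-- NOT IN: "2 NOT IN (1, NULL)" evaluates to NULL, so no row qualifies (count is 0).
SELECT count(*) AS not_in_matches
FROM t_small
WHERE id NOT IN (SELECT foreign_id FROM t_big);

-- NOT EXISTS: the NULL never satisfies the equality, so ids 2 and 3 qualify (count is 2).
SELECT count(*) AS not_exists_matches
FROM t_small s
WHERE NOT EXISTS (SELECT 1 FROM t_big b WHERE b.foreign_id = s.id);
```

If `foreign_id` is declared NOT NULL, both forms select the same rows and only the plan differs.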

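Without the actual data it is hard to explain exactly why this happens, but with a simple playground we can see that the NOT IN case does not use the index to perform the operation:

Playground: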

-- drop table if exists a;
-- drop table if exists b;
create table a as select (random()*1000)::int as x from generate_series(1,10000);
create index idx_a on a(x);

create table b as select (random()*1000)::int*10 as x from generate_series(1,1000000);
create index idx_b on b(x);

analyse a;
analyse b;

Test:

nd@postgres=# explain (verbose) delete from a where a.x not in (select b.x from b);
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Delete on nd.a  (cost=0.00..129160170.00 rows=5000 width=6)
   ->  Seq Scan on nd.a  (cost=0.00..129160170.00 rows=5000 width=6)
         Output: a.ctid
         Filter: (NOT (SubPlan 1))
         SubPlan 1
           ->  Materialize  (cost=0.00..23332.00 rows=1000000 width=4)
                 Output: b.x
                 ->  Seq Scan on nd.b  (cost=0.00..14425.00 rows=1000000 width=4)
                       Output: b.x

nd@postgres=# explain (verbose) delete from a where not exists (select * from b where a.x=b.x);
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Delete on nd.a  (cost=0.42..5005.91 rows=1 width=12)
   ->  Nested Loop Anti Join  (cost=0.42..5005.91 rows=1 width=12)
         Output: a.ctid, b.ctid
         ->  Seq Scan on nd.a  (cost=0.00..145.00 rows=10000 width=10)
               Output: a.ctid, a.x
         ->  Index Scan using idx_b on nd.b  (cost=0.42..20.78 rows=999 width=10)
               Output: b.ctid, b.x
               Index Cond: (a.x = b.x)
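Applied to the tables from the question, the same approach might look like the sketch below. It assumes `EvenLargerTable.foreign_id` is not indexed yet (the index name `idx_evenlarger_foreign_id` is made up), and it wraps the delete in a transaction so the reported row count can be checked before committing.

```sql
-- Assumption: no index on foreign_id exists yet; the index name is hypothetical.
CREATE INDEX idx_evenlarger_foreign_id ON EvenLargerTable (foreign_id);
ANALYZE EvenLargerTable;

-- Check the plan first; plain EXPLAIN (without ANALYZE) does not execute the DELETE.
EXPLAIN
DELETE FROM LargeTable
WHERE NOT EXISTS (
  SELECT 1
  FROM EvenLargerTable
  WHERE EvenLargerTable.foreign_id = LargeTable.id);

BEGIN;
DELETE FROM LargeTable
WHERE NOT EXISTS (
  SELECT 1
  FROM EvenLargerTable
  WHERE EvenLargerTable.foreign_id = LargeTable.id);
-- Inspect the reported row count here; issue ROLLBACK instead if it looks wrong.
COMMIT;
```

Depending on table statistics, the planner may choose a hash anti join rather than the nested loop shown in the playground; either way it avoids re-running the subplan for every row of `LargeTable`.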