Why is my query using a join so much slower than the one using a subquery?


I was surprised to find that my query using a join is roughly 350 times slower than the one using a subquery. Normally I prefer joins because they are more readable, but a gap this large is hard to ignore.

Here are the data definitions. The tables are taken directly from the source code (except for the lines commented out below).

-- possible account types for the chart of accounts below.
create type account_kind as enum (
  'asset',
  'liability',
  'equity',
  'income',
  'expense'
);

-- chart of accounts represented as a tree.
create table account(
  id        integer      not null,
  name      text         not null,
  kind      account_kind not null,
  parent_id integer,

  constraint account_pkey primary key (id),
  constraint account_name_length_check check (length(name) between 1 and 200),
  constraint account_parent_id_fkey foreign key (parent_id) references account(id) on update cascade on delete restrict,
  constraint account_parent_id_name_uniq unique (parent_id, name),
  constraint account_parent_id_cycle_check check (parent_id <> id)
);

-- a journal represents a single transaction.
-- it is always accompanied by at least two postings.
-- at the application level, a journal with fewer than
-- two postings is always rejected.
create table journal(
  id       integer     not null generated always as identity,
--  owner_id integer     not null,
  time     timestamptz not null default current_timestamp,
  text     text        not null,
  amount   numeric     not null,

  constraint journal_pkey primary key (id),
--  constraint journal_owner_id_fkey foreign key (owner_id) references owner(id) on update cascade on delete restrict,
  constraint journal_text_length_check check (length(text) between 1 and 1000),
  constraint journal_amount_sign_check check (amount > 0)
);

-- the indexes on `journal(time asc, id asc)` and `journal(time desc, id desc)`
-- below are used for cursor-based pagination, because ordering by `id` alone
-- is not meaningful, and `time` is not always in the same order as `id`.

-- create index journal_owner_id_idx on journal(owner_id);
create index journal_time_idx on journal(time);
-- create index journal_text_trgm on journal using gin (text gin_trgm_ops);
create index journal_time_id_asc_idx on journal(time asc, id asc);
create index journal_time_id_desc_idx on journal(time desc, id desc);

-- a posting represents a single entry in a journal. this is an implementation
-- of double-entry bookkeeping where a positive amount represents a debit entry
-- and a negative amount represents a credit entry. this is why a zero amount
-- is not allowed.
create table posting(
  id          integer not null generated always as identity,
  journal_id  integer not null,
  account_id  integer not null,
  amount      numeric not null,
  note        text,

  constraint posting_pkey primary key (id),
  constraint posting_journal_id_fkey foreign key (journal_id) references journal(id) on update cascade on delete restrict,
  constraint posting_account_id_fkey foreign key (account_id) references account(id) on update cascade on delete restrict,
  constraint posting_amount_non_zero_check check (amount <> 0),
  constraint posting_note_length_check check (length(note) between 1 and 200)
);

-- don't forget to index the foreign keys.
create index posting_journal_id_idx on posting(journal_id);
create index posting_account_id_idx on posting(account_id);

Now, the journal table contains 43,401 rows and the posting table contains 86,802 rows.

The goal of my query is to fetch data from all three tables, driven by the journal table.

JOIN version

select journal.id,
       journal.time,
       journal.text,
       journal.amount,
       jsonb_agg(jsonb_build_object(
         'id', posting.id,
         'note', posting.note,
         'amount', posting.amount,
         'account', jsonb_build_object(
           'id', account.id,
           'name', account.name,
           'kind', account.kind,
           'parent_id', account.parent_id
       )) order by posting.id) as postings
  from posting
  join journal on journal.id = posting.journal_id
  join account on account.id = posting.account_id
 where (journal.time, journal.id) >= ('2021-06-01 05:00:00 +00', 2154)
 group by journal.id
 order by journal.time asc, journal.id asc
 limit 100;

Subquery version

select journal.id,
       journal.time,
       journal.text,
       journal.amount,
       (select jsonb_agg(jsonb_build_object(
                 'id', posting.id,
                 'note', posting.note,
                 'amount', posting.amount,
                 'account', jsonb_build_object(
                   'id', account.id,
                   'name', account.name,
                   'kind', account.kind,
                   'parent_id', account.parent_id
               )) order by posting.id)
          from posting
          join account on posting.account_id = account.id
         where posting.journal_id = journal.id
       ) as postings
  from journal
 where (journal.time, journal.id) >= ('2021-06-01 05:00:00 +00', 2154)
 order by journal.time asc, journal.id asc
 limit 100;

Both queries produce exactly the same results. Strictly speaking, they could produce different results if a journal existed without any accompanying posting, but I consider that a separate issue.
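For reference, there is also a middle ground between the two: a LATERAL join keeps the join syntax but correlates the aggregate per journal row, much like the subquery version. This is only an untested sketch of the idea, not a variant I have measured:

```sql
-- untested sketch: cross join lateral correlates the aggregate
-- per journal row, similar in spirit to the subquery version.
select journal.id,
       journal.time,
       journal.text,
       journal.amount,
       p.postings
  from journal
 cross join lateral (
   select jsonb_agg(jsonb_build_object(
            'id', posting.id,
            'note', posting.note,
            'amount', posting.amount,
            'account', jsonb_build_object(
              'id', account.id,
              'name', account.name,
              'kind', account.kind,
              'parent_id', account.parent_id
          )) order by posting.id) as postings
     from posting
     join account on account.id = posting.account_id
    where posting.journal_id = journal.id
 ) as p
 where (journal.time, journal.id) >= ('2021-06-01 05:00:00 +00', 2154)
 order by journal.time asc, journal.id asc
 limit 100;
```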

Running explain analyze against the real database, the output for the JOIN version is:

 Limit  (cost=10252.34..10252.59 rows=100 width=92) (actual time=1293.789..1294.139 rows=100 loops=1)
   ->  Sort  (cost=10252.34..10350.51 rows=39268 width=92) (actual time=1293.787..1293.917 rows=100 loops=1)
         Sort Key: journal."time", journal.id
         Sort Method: top-N heapsort  Memory: 76kB
         ->  GroupAggregate  (cost=0.73..8751.54 rows=39268 width=92) (actual time=18.876..1244.314 rows=39246 loops=1)
               Group Key: journal.id
               ->  Nested Loop  (cost=0.73..7475.33 rows=78536 width=145) (actual time=18.825..764.928 rows=78492 loops=1)
                     ->  Merge Join  (cost=0.58..5571.84 rows=78536 width=105) (actual time=18.811..396.991 rows=78492 loops=1)
                           Merge Cond: (posting.journal_id = journal.id)
                           ->  Index Scan using posting_journal_id_idx on posting  (cost=0.29..2722.32 rows=86802 width=49) (actual time=0.009..107.108 rows=86802 loops=1)
                           ->  Index Scan using journal_pkey on journal  (cost=0.29..1758.81 rows=39268 width=60) (actual time=0.399..51.562 rows=39246 loops=1)
                                 Filter: ("time" >= '2021-06-01 00:00:00+07'::timestamp with time zone)
                                 Rows Removed by Filter: 4155
                     ->  Memoize  (cost=0.15..0.17 rows=1 width=44) (actual time=0.001..0.001 rows=1 loops=78492)
                           Cache Key: posting.account_id
                           Cache Mode: logical
                           Hits: 78484  Misses: 8  Evictions: 0  Overflows: 0  Memory Usage: 1kB
                           ->  Index Scan using account_pkey on account  (cost=0.14..0.16 rows=1 width=44) (actual time=0.003..0.003 rows=1 loops=8)
                                 Index Cond: (id = posting.account_id)
 Planning Time: 0.207 ms
 Execution Time: 1294.282 ms

The output for the subquery version is:

 Limit  (cost=0.29..981.68 rows=100 width=92) (actual time=0.191..3.437 rows=100 loops=1)
   ->  Index Scan Backward using journal_time_id_desc_idx on journal  (cost=0.29..385372.47 rows=39268 width=92) (actual time=0.187..3.203 rows=100 loops=1)
         Index Cond: ("time" >= '2021-06-01 00:00:00+07'::timestamp with time zone)
         SubPlan 1
           ->  Aggregate  (cost=9.73..9.74 rows=1 width=32) (actual time=0.026..0.028 rows=1 loops=100)
                 ->  Hash Join  (cost=1.67..9.72 rows=2 width=85) (actual time=0.005..0.012 rows=2 loops=100)
                       Hash Cond: (posting.account_id = account.id)
                       ->  Index Scan using posting_journal_id_idx on posting  (cost=0.29..8.33 rows=2 width=45) (actual time=0.002..0.005 rows=2 loops=100)
                             Index Cond: (journal_id = journal.id)
                       ->  Hash  (cost=1.17..1.17 rows=17 width=44) (actual time=0.053..0.058 rows=17 loops=1)
                             Buckets: 1024  Batches: 1  Memory Usage: 9kB
                             ->  Seq Scan on account  (cost=0.00..1.17 rows=17 width=44) (actual time=0.006..0.027 rows=17 loops=1)
 Planning Time: 0.615 ms
 Execution Time: 3.598 ms

As you can see, the difference in Execution Time is roughly 350-fold (1294.282 ms vs 3.598 ms).

For the purposes of this question, I also tried to simulate the queries with dummy data (50,000 rows in journal and 100,000 rows in posting).
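For anyone who wants to reproduce a comparable data set, dummy rows can be generated along these lines. This is a hypothetical sketch, not the exact script I used; the account names and the two-posting pattern are assumptions:

```sql
-- hypothetical sketch for seeding comparable dummy data.
insert into account (id, name, kind)
values (1, 'Cash', 'asset'),
       (2, 'Revenue', 'income');

-- 50,000 journals spread over increasing timestamps.
insert into journal (time, text, amount)
select timestamptz '2021-01-01 00:00:00+00' + (i || ' minutes')::interval,
       'journal ' || i,
       100
  from generate_series(1, 50000) as i;

-- two postings per journal: one debit, one credit (100,000 rows total).
insert into posting (journal_id, account_id, amount)
select j.id, v.account_id, v.amount
  from journal j
 cross join lateral (values (1, j.amount), (2, -j.amount)) as v(account_id, amount);
```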

Running explain analyze with the simulated data gives slightly different results.

JOIN version:

 Limit  (cost=4248.40..4248.65 rows=100 width=108) (actual time=927.635..927.991 rows=100 loops=1)
   ->  Sort  (cost=4248.40..4275.50 rows=10842 width=108) (actual time=927.632..927.769 rows=100 loops=1)
         Sort Key: journal."time", journal.id
         Sort Method: top-N heapsort  Memory: 76kB
         ->  GroupAggregate  (cost=3491.47..3834.02 rows=10842 width=108) (actual time=534.029..895.869 rows=25175 loops=1)
               Group Key: journal.id
               ->  Sort  (cost=3491.47..3532.88 rows=16562 width=188) (actual time=533.982..594.864 rows=50350 loops=1)
                     Sort Key: journal.id
                     Sort Method: external merge  Disk: 4592kB
                     ->  Hash Join  (cost=1022.89..2330.84 rows=16562 width=188) (actual time=176.058..467.650 rows=50350 loops=1)
                           Hash Cond: (posting.account_id = account.id)
                           ->  Hash Join  (cost=987.47..2251.78 rows=16562 width=148) (actual time=176.014..351.375 rows=50350 loops=1)
                                 Hash Cond: (posting.journal_id = journal.id)
                                 ->  Seq Scan on posting  (cost=0.00..1133.86 rows=49686 width=76) (actual time=0.004..114.328 rows=100000 loops=1)
                                 ->  Hash  (cost=851.94..851.94 rows=10842 width=76) (actual time=61.709..61.716 rows=25175 loops=1)
                                       Buckets: 32768 (originally 16384)  Batches: 1 (originally 1)  Memory Usage: 1977kB
                                       ->  Bitmap Heap Scan on journal  (cost=272.31..851.94 rows=10842 width=76) (actual time=0.553..31.249 rows=25175 loops=1)
                                             Filter: (ROW("time", id) >= ROW('2021-06-01 12:00:00+07'::timestamp with time zone, 2154))
                                             Heap Blocks: exact=211
                                             ->  Bitmap Index Scan on journal_time_idx  (cost=0.00..269.60 rows=10842 width=0) (actual time=0.533..0.535 rows=25175 loops=1)
                                                   Index Cond: ("time" >= '2021-06-01 12:00:00+07'::timestamp with time zone)
                           ->  Hash  (cost=21.30..21.30 rows=1130 width=44) (actual time=0.037..0.041 rows=13 loops=1)
                                 Buckets: 2048  Batches: 1  Memory Usage: 17kB
                                 ->  Seq Scan on account  (cost=0.00..21.30 rows=1130 width=44) (actual time=0.002..0.016 rows=13 loops=1)
 Planning Time: 0.120 ms
 Execution Time: 928.739 ms

Subquery version:

 Limit  (cost=0.29..52688.73 rows=100 width=108) (actual time=0.133..3.602 rows=100 loops=1)
   ->  Index Scan using journal_time_id_asc_idx on journal  (cost=0.29..5712480.93 rows=10842 width=108) (actual time=0.129..3.370 rows=100 loops=1)
         Index Cond: (ROW("time", id) >= ROW('2021-06-01 12:00:00+07'::timestamp with time zone, 2154))
         SubPlan 1
           ->  Aggregate  (cost=526.68..526.69 rows=1 width=32) (actual time=0.029..0.030 rows=1 loops=100)
                 ->  Hash Join  (cost=45.64..524.82 rows=248 width=112) (actual time=0.008..0.015 rows=2 loops=100)
                       Hash Cond: (posting.account_id = account.id)
                       ->  Bitmap Heap Scan on posting  (cost=10.21..488.74 rows=248 width=72) (actual time=0.005..0.007 rows=2 loops=100)
                             Recheck Cond: (journal_id = journal.id)
                             Heap Blocks: exact=101
                             ->  Bitmap Index Scan on posting_journal_id_idx  (cost=0.00..10.15 rows=248 width=0) (actual time=0.002..0.002 rows=2 loops=100)
                                   Index Cond: (journal_id = journal.id)
                       ->  Hash  (cost=21.30..21.30 rows=1130 width=44) (actual time=0.042..0.046 rows=13 loops=1)
                             Buckets: 2048  Batches: 1  Memory Usage: 17kB
                             ->  Seq Scan on account  (cost=0.00..21.30 rows=1130 width=44) (actual time=0.002..0.019 rows=13 loops=1)
 Planning Time: 0.202 ms
 Execution Time: 3.757 ms

区别在于928.739 ms3.757 ms几乎慢了 250 倍)。

I don't understand this. Why is the difference so significant? Is there something wrong with my JOIN query?