gue*_*tli 2 postgresql performance join
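Until now, all queries in my PostgreSQL application were fast. Over the past few days, one query has sometimes taken several hours. The indexes are healthy: after a dump, restore, and vacuum -z, the behavior is still the same and the query takes hours.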
EXPLAIN VERBOSE SELECT DISTINCT "modwork_beleg"."direction"
FROM "modwork_beleg"
LEFT OUTER JOIN "modwork_isu_isu_pkunde" ON ("modwork_beleg"."id" = "modwork_isu_isu_pkunde"."beleg_id")
LEFT OUTER JOIN "modwork_isu_isu_vknr" ON ("modwork_beleg"."id" = "modwork_isu_isu_vknr"."beleg_id")
LEFT OUTER JOIN "modwork_isu_isu_gpnr" ON ("modwork_beleg"."id" = "modwork_isu_isu_gpnr"."beleg_id")
WHERE ("modwork_beleg"."state" IN (E'neu', E'inarbeit', E'wiedervorlage')
  AND (
    "modwork_isu_isu_pkunde"."pkunde" = 90237758
    OR
    "modwork_isu_isu_vknr"."vknr" = 254400297729
    OR
    "modwork_isu_isu_gpnr"."gpnr" = 1001030921
  ));
The query plan:
HashAggregate  (cost=36409.29..36409.30 rows=1 width=3)
  Output: modwork_beleg.direction
  ->  Merge Right Join  (cost=28836.70..36409.29 rows=1 width=3)
        Output: modwork_beleg.direction
        Merge Cond: (modwork_isu_isu_vknr.beleg_id = modwork_beleg.id)
        Filter: ((modwork_isu_isu_pkunde.pkunde = 90237758) OR (modwork_isu_isu_vknr.vknr = 254400297729::bigint) OR (modwork_isu_isu_gpnr.gpnr = 1001030921))
        ->  Index Scan using modwork_isu_isu_vknr_beleg_id on modwork_isu_isu_vknr  (cost=0.00..6463.55 rows=203811 width=12)
              Output: modwork_isu_isu_vknr.id, modwork_isu_isu_vknr.beleg_id, modwork_isu_isu_vknr.vknr
        ->  Sort  (cost=28836.70..28876.79 rows=16037 width=19)
              Output: modwork_beleg.direction, modwork_beleg.id, modwork_isu_isu_gpnr.gpnr, modwork_isu_isu_pkunde.pkunde
              Sort Key: modwork_beleg.id
              ->  Merge Right Join  (cost=21128.57..27716.58 rows=16037 width=19)
                    Output: modwork_beleg.direction, modwork_beleg.id, modwork_isu_isu_gpnr.gpnr, modwork_isu_isu_pkunde.pkunde
                    Merge Cond: (modwork_isu_isu_gpnr.beleg_id = modwork_beleg.id)
                    ->  Index Scan using modwork_isu_isu_gpnr_beleg_id on modwork_isu_isu_gpnr  (cost=0.00..5883.73 rows=185491 width=12)
                          Output: modwork_isu_isu_gpnr.id, modwork_isu_isu_gpnr.beleg_id, modwork_isu_isu_gpnr.gpnr
                    ->  Sort  (cost=21128.57..21157.69 rows=11646 width=11)
                          Output: modwork_beleg.direction, modwork_beleg.id, modwork_isu_isu_pkunde.pkunde
                          Sort Key: modwork_beleg.id
                          ->  Merge Right Join  (cost=14555.56..20342.03 rows=11646 width=11)
                                Output: modwork_beleg.direction, modwork_beleg.id, modwork_isu_isu_pkunde.pkunde
                                Merge Cond: (modwork_isu_isu_pkunde.beleg_id = modwork_beleg.id)
                                ->  Index Scan using modwork_isu_isu_pkunde_beleg_id on modwork_isu_isu_pkunde  (cost=0.00..5203.79 rows=163197 width=8)
                                      Output: modwork_isu_isu_pkunde.id, modwork_isu_isu_pkunde.beleg_id, modwork_isu_isu_pkunde.pkunde
                                ->  Sort  (cost=14555.56..14573.39 rows=7134 width=7)
                                      Output: modwork_beleg.direction, modwork_beleg.id
                                      Sort Key: modwork_beleg.id
                                      ->  Bitmap Heap Scan on modwork_beleg  (cost=140.06..14098.97 rows=7134 width=7)
                                            Output: modwork_beleg.direction, modwork_beleg.id
                                            Recheck Cond: ((state)::text = ANY ('{neu,inarbeit,wiedervorlage}'::text[]))
                                            ->  Bitmap Index Scan on modwork_beleg_state  (cost=0.00..138.28 rows=7134 width=0)
                                                  Index Cond: ((state)::text = ANY ('{neu,inarbeit,wiedervorlage}'::text[]))
(32 rows)
My PostgreSQL version:
modwork_egs_q=> select version();
version
----------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 8.4.8 on x86_64-unknown-linux-gnu, compiled by GCC gcc (SUSE Linux) 4.5.0 20100604 [gcc-4_5-branch revision 160292], 64-bit
Erw*_*ter 10
First, I simplified your syntax considerably to make it more readable:
SELECT DISTINCT b.direction
FROM modwork_beleg b
LEFT JOIN modwork_isu_isu_pkunde p ON b.id = p.beleg_id
LEFT JOIN modwork_isu_isu_vknr v ON b.id = v.beleg_id
LEFT JOIN modwork_isu_isu_gpnr g ON b.id = g.beleg_id
WHERE b.state IN ('neu', 'inarbeit', 'wiedervorlage')
AND   (p.pkunde = 90237758
    OR v.vknr = 254400297729
    OR g.gpnr = 1001030921);
Next, this is the important step for performance:
SELECT b.direction
FROM modwork_beleg b
JOIN modwork_isu_isu_pkunde p ON b.id = p.beleg_id
WHERE b.state IN ('neu', 'inarbeit', 'wiedervorlage')
AND p.pkunde = 90237758
UNION
SELECT b.direction
FROM modwork_beleg b
JOIN modwork_isu_isu_vknr v ON b.id = v.beleg_id
WHERE b.state IN ('neu', 'inarbeit', 'wiedervorlage')
AND v.vknr = 254400297729
UNION
SELECT b.direction
FROM modwork_beleg b
JOIN modwork_isu_isu_gpnr g ON b.id = g.beleg_id
WHERE b.state IN ('neu', 'inarbeit', 'wiedervorlage')
AND g.gpnr = 1001030921;
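The point is that your original query builds one big table out of all possible combinations. If each of the modwork_isu_isu_* tables has 100 rows per beleg_id, that beleg_id alone produces 100 x 100 x 100 = 1 million joined rows carrying columns from all four tables. Then you keep only a few of them: the planner estimates rows=1 for the final result (the "(32 rows)" at the end of your output is just the number of lines in the plan, not a result count). This is very inefficient.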
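The multiplication can be reproduced on toy data. This is a minimal sketch, assuming hypothetical temporary tables (beleg, p, v, g) that are not part of your schema; each child table gets 100 rows for a single parent row.

```sql
-- Hypothetical toy tables for illustration only, not the poster's schema.
CREATE TEMP TABLE beleg (id int PRIMARY KEY);
CREATE TEMP TABLE p (beleg_id int, val int);
CREATE TEMP TABLE v (beleg_id int, val int);
CREATE TEMP TABLE g (beleg_id int, val int);

-- One parent row, 100 child rows in each of the three child tables.
INSERT INTO beleg VALUES (1);
INSERT INTO p SELECT 1, n FROM generate_series(1, 100) AS n;
INSERT INTO v SELECT 1, n FROM generate_series(1, 100) AS n;
INSERT INTO g SELECT 1, n FROM generate_series(1, 100) AS n;

-- Every p row pairs with every v row and every g row for the same id,
-- so this counts 100 * 100 * 100 = 1000000 rows.
SELECT count(*)
FROM   beleg b
LEFT   JOIN p ON b.id = p.beleg_id
LEFT   JOIN v ON b.id = v.beleg_id
LEFT   JOIN g ON b.id = g.beleg_id;
```

With real data the fan-out per beleg_id is probably much smaller, but the same multiplication applies to every row of modwork_beleg that survives the state filter.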
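The good news is that your query splits easily into three parts. I have not tested it, but I would bet this is faster by orders of magnitude.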
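One way to check that bet on your own data is EXPLAIN ANALYZE, which executes the statement and reports actual row counts and timings next to the estimates. A sketch using the rewritten query from above:

```sql
-- EXPLAIN ANALYZE really runs the query, so expect it to take as long
-- as the query itself.
EXPLAIN ANALYZE
SELECT b.direction
FROM   modwork_beleg b
JOIN   modwork_isu_isu_pkunde p ON b.id = p.beleg_id
WHERE  b.state IN ('neu', 'inarbeit', 'wiedervorlage')
AND    p.pkunde = 90237758
UNION
SELECT b.direction
FROM   modwork_beleg b
JOIN   modwork_isu_isu_vknr v ON b.id = v.beleg_id
WHERE  b.state IN ('neu', 'inarbeit', 'wiedervorlage')
AND    v.vknr = 254400297729
UNION
SELECT b.direction
FROM   modwork_beleg b
JOIN   modwork_isu_isu_gpnr g ON b.id = g.beleg_id
WHERE  b.state IN ('neu', 'inarbeit', 'wiedervorlage')
AND    g.gpnr = 1001030921;
```

Comparing the "actual time" figures with those of the original statement (when it finishes at all) should show where the time goes.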
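I also changed the join type from LEFT [OUTER] JOIN to [INNER] JOIN, because in each of the three separate queries the condition on the right-hand table makes the query behave like a plain JOIN anyway.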
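To illustrate why the join type no longer matters here, the following sketch shows one branch written both ways. A LEFT JOIN adds rows in which all columns of p are NULL when no match exists, but p.pkunde = 90237758 is never true for NULL, so the WHERE clause discards exactly those extra rows.

```sql
-- Variant 1: LEFT JOIN, then a WHERE condition on the right-hand table.
SELECT b.direction
FROM   modwork_beleg b
LEFT   JOIN modwork_isu_isu_pkunde p ON b.id = p.beleg_id
WHERE  p.pkunde = 90237758;

-- Variant 2: plain (inner) JOIN. Returns the same rows as variant 1.
SELECT b.direction
FROM   modwork_beleg b
JOIN   modwork_isu_isu_pkunde p ON b.id = p.beleg_id
WHERE  p.pkunde = 90237758;
```

Writing the inner join explicitly states the intent directly instead of relying on the WHERE clause to cancel the outer join.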
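Finally, combining the results of the three queries with UNION instead of UNION ALL removes duplicate values of b.direction. So I could drop the now redundant DISTINCT from each query. Everything gets simpler and faster.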
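The difference between the two set operators can be seen on literal values. A minimal sketch; the value 'in' is a made-up example, not taken from your data:

```sql
-- UNION ALL keeps duplicates: returns two rows.
SELECT 'in'::text AS direction
UNION ALL
SELECT 'in'::text;

-- UNION removes duplicates across all branches: returns one row.
SELECT 'in'::text AS direction
UNION
SELECT 'in'::text;
```

Because UNION already deduplicates the combined result, a DISTINCT inside each branch would only repeat work.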