spe*_*eps (score: 5). Tags: postgresql, pgadmin, execution-plan, postgresql-9.5
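I have a series of update statements that follow a general pattern: one update aggregates values from another table (or sometimes several tables), and the next update generates a rank based on the aggregated values. This process repeats 23 times, for a total of 46 update statements. Each pair of updates takes 30-40 seconds when run on its own, but when I run them all together as a single transaction through PgAdmin, it takes over an hour instead of the roughly 15 minutes I would expect from the individual query times. (On my last attempt, I ended up stopping the execution and running them individually.)
If I run the same set of updates from a file through psql, the process finishes in the expected 15 minutes.
Is there some quirk of the query planner that changes the execution plan when a large number of update statements run in a single transaction? Given the different behavior of psql and PgAdmin, I suspect it has to do with how the queries are packaged for execution, but I'm not familiar enough with either to understand the difference.
Is there a way to write my code so that it performs better when run as a single transaction through PgAdmin?
I'm using PostgreSQL 9.5 on Ubuntu 16.04.
Here are two example pairs from the code: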
-- bike_driver_aggressive
UPDATE generated.crash_aggregates
SET bike_driver_aggressive = (
SELECT COUNT(*)
FROM crashes_bike2 c
WHERE c.int_id = crash_aggregates.int_id
AND c.aggressive_driverfault
);
WITH ranks AS (
SELECT int_id,
rank() OVER (ORDER BY bike_driver_aggressive DESC) AS rank
FROM crash_aggregates
)
UPDATE generated.crash_aggregates
SET bike_driver_aggressive_rank = ranks.rank
FROM ranks
WHERE crash_aggregates.int_id = ranks.int_id;
-- bike_allinjury
UPDATE generated.crash_aggregates
SET bike_allinjury = (
SELECT COUNT(*)
FROM crashes_bike1 c
WHERE c.int_id = crash_aggregates.int_id
AND c.injurycrash
) + (
SELECT COUNT(*)
FROM crashes_bike2 c
WHERE c.int_id = crash_aggregates.int_id
AND c.injurycrash
);
WITH ranks AS (
SELECT int_id,
rank() OVER (ORDER BY bike_allinjury DESC) AS rank
FROM crash_aggregates
)
UPDATE generated.crash_aggregates
SET bike_allinjury_rank = ranks.rank
FROM ranks
WHERE crash_aggregates.int_id = ranks.int_id;
The crash_aggregates table is created like this:
CREATE TABLE crash_aggregates
(
int_id integer NOT NULL,
geom geometry(Point,2231),
bike_driver_aggressive integer,
bike_driver_aggressive_rank integer,
bike_driver_failyield integer,
bike_driver_failyield_rank integer,
bike_driver_disregardsignal integer,
bike_driver_disregardsignal_rank integer,
bike_highspeed integer,
bike_highspeed_rank integer,
bike_biker_aggressive integer,
bike_biker_aggressive_rank integer,
bike_biker_failyield integer,
bike_biker_failyield_rank integer,
bike_biker_disregardsignal integer,
bike_biker_disregardsignal_rank integer,
bike_influence integer,
bike_influence_rank integer,
bike_driver_distracted integer,
bike_driver_distracted_rank integer,
bike_driver_reckless integer,
bike_driver_reckless_rank integer,
bike_tbone integer,
bike_tbone_rank integer,
bike_opp_lhook integer,
bike_opp_lhook_rank integer,
bike_samedir integer,
bike_samedir_rank integer,
bike_samedir_rhook1 integer,
bike_samedir_rhook1_rank integer,
bike_samedir_rhook2 integer,
bike_samedir_rhook2_rank integer,
bike_perp_rhook integer,
bike_perp_rhook_rank integer,
bike_perp_rhook_swalk1 integer,
bike_perp_rhook_swalk1_rank integer,
bike_perp_rhook_swalk2 integer,
bike_perp_rhook_swalk2_rank integer,
bike_tbone_swalk1 integer,
bike_tbone_swalk1_rank integer,
bike_tbone_swalk2 integer,
bike_tbone_swalk2_rank integer,
bike_allfatal integer,
bike_allfatal_rank integer,
bike_allinjury integer,
bike_allinjury_rank integer,
bike_injuryfatal integer,
bike_injuryfatal_rank integer,
bike_top10 integer,
bike_num1s integer,
bike_num2s integer,
bike_num3s integer,
bike_num4s integer,
bike_num5s integer,
bike_num6s integer,
bike_num7s integer,
bike_num8s integer,
bike_num9s integer,
bike_num10s integer,
CONSTRAINT crash_aggregates_pkey PRIMARY KEY (int_id)
);
Bike 1 table (crashes_bike1):
CREATE TABLE received.crashes_bike1
(
caseid text,
year integer,
unittype_one text,
unittype_two text,
unittype_three text,
circumstance text,
primary_contrib text,
condition_1 text,
condition_2 text,
condition_3 text,
dirfromint text,
diroftravel_one text,
diroftravel_two text,
tdg_directions text,
dir_key integer,
directions text,
bike_mvmt text,
veh_mvmt text,
comb_mvmt text,
comb_mvmt_sw text,
map_code integer,
injury boolean,
diroftravel_three text,
disabled_st1 text,
disabled_st2 text,
contrib_1 text,
contrib_2 text,
contrib_3 text,
enteredby text,
entereddate text,
estvehspeed_one integer,
estvehspeed_two integer,
estvehspeed_three integer,
feetfromint integer,
firstharmful text,
mostharmful text,
secondharmful text,
internamedir text,
lightingcondition text,
location text,
masterid integer,
precrashmaneuv_1 text,
precrashmaneuv_2 text,
precrashmaneuv_3 text,
node integer,
numberinjured integer,
numberoffatalities integer,
pedaction_one text,
pedaction_two text,
pedaction_three text,
publicproperty text,
railroadcrossing text,
roadcondition text,
roadcontour text,
roaddescription text,
roadsurface text,
safetyequipmenthelmet_one text,
safetyequipmenthelmet_two text,
safetyequipmenthelmet_three text,
safetyequipsystem_one text,
safetyequipsystem_two text,
safetyequipsystem_three text,
safetyequipuse_one text,
safetyequipuse_two text,
safetyequipuse_three text,
speedlimit_one integer,
speedlimit_two integer,
speedlimit_three integer,
street1 integer,
street2 integer,
streetname_st1 text,
streetname_st2 text,
street_intersection text,
totvehs integer,
unitage_one integer,
unitage_two integer,
unitage_three integer,
vehcomb_one text,
vehcomb_two text,
vehcomb_three text,
vehicledefect_one text,
vehicledefect_two text,
vehicledefect_three text,
technicaljudgement text,
notes text,
typology text,
same_dir boolean,
opp_dir boolean,
perpen boolean,
angle boolean,
notes2 text,
sw text,
ww_sw text,
cw_dwy_alley text,
day_week text,
weekday text,
trail_access text,
bike_s_veh_s_st_p boolean,
bike_s_veh_lt_st_od boolean,
bike_s_veh_rt_st_p boolean,
bike_s_veh_rt_st_sd boolean,
bike_s_veh_s_st_sd boolean,
bike_s_veh_rt_st_ww_p boolean,
bike_s_veh_s_sw_ww_p boolean,
bike_s_veh_rt_sw_ww_p boolean,
highspeed boolean,
injurycrash boolean,
id integer NOT NULL DEFAULT nextval('crashes_bike1_id_seq'::regclass),
road_id1 integer,
road_id2 integer,
at_intersection boolean,
int_id integer,
CONSTRAINT crashes_bike1_pkey PRIMARY KEY (id)
);
CREATE INDEX idx_crashbike1bsvlso
ON received.crashes_bike1
USING btree
(bike_s_veh_lt_st_od);
CREATE INDEX idx_crashbike1bsvrsp
ON received.crashes_bike1
USING btree
(bike_s_veh_rt_st_p);
CREATE INDEX idx_crashbike1bsvrssd
ON received.crashes_bike1
USING btree
(bike_s_veh_rt_st_sd);
CREATE INDEX idx_crashbike1bsvrswp
ON received.crashes_bike1
USING btree
(bike_s_veh_rt_sw_ww_p);
CREATE INDEX idx_crashbike1bsvssp
ON received.crashes_bike1
USING btree
(bike_s_veh_s_st_p);
CREATE INDEX idx_crashbike1bsvsstsd
ON received.crashes_bike1
USING btree
(bike_s_veh_s_st_sd);
CREATE INDEX idx_crashbike1bsvsswwwp
ON received.crashes_bike1
USING btree
(bike_s_veh_s_sw_ww_p);
CREATE INDEX idx_crashbike1inj
ON received.crashes_bike1
USING btree
(injurycrash);
CREATE INDEX idx_crashbike1int
ON received.crashes_bike1
USING btree
(int_id);
Bike 2 table (crashes_bike2):
CREATE TABLE received.crashes_bike2
(
accidentdate date,
accidenttime time without time zone,
adverseweather text,
appovertaketurn text,
caseid text,
constructionzone text,
contribfact_one text,
contribfact_two text,
contribfact_three text,
dirfromint text,
diroftravel_one text,
diroftravel_two text,
diroftravel_three text,
disabled_st1 text,
disabled_st2 text,
driveraction_one text,
driveraction_two text,
enteredby text,
entereddate text,
estvehspeed_one integer,
estvehspeed_two integer,
estvehspeed_three integer,
feetfromint integer,
firstharmful text,
mostharmful text,
secondharmful text,
internamedir text,
lightingcondition text,
location text,
masterid integer,
node integer,
numberinjured integer,
numberoffatalities integer,
roadcontour text,
roaddescription text,
roadsurface text,
rownum integer,
speedlimit_one integer,
speedlimit_two integer,
speedlimit_three integer,
street1 integer,
street2 integer,
streetname_st1 text,
streetname_st2 text,
totvehs integer,
unitage_one integer,
unitage_two integer,
unitage_three integer,
vehcomb_one text,
vehcomb_two text,
vehcomb_three text,
unittype_one text,
unittype_two text,
unittype_three text,
movement_one text,
movement_two text,
movement_three text,
circumstance text,
sw text,
othercw text,
motoristplacement text,
relationshipofplacements text,
wwswriding text,
bicyclelane text,
dooring text,
bicyclewwstreetriding text,
crashmonth integer,
crashday text,
hour time without time zone,
ridinglocation text,
crashyear integer,
injurycrash boolean,
fatalcrash boolean,
noinjuryfatality boolean,
unit1_veh boolean,
unit2_veh boolean,
unit1_bike boolean,
unit2_bike boolean,
intdistance_ft text,
hrgrp text,
bicyclist boolean,
bikeaction_one text,
bikeaction_two text,
newdriveraction_one text,
newdriveraction_two text,
bikeaction_one2 text,
bikeaction_two2 integer,
newbikeaction text,
newdriveraction_one2 text,
newdriveraction_two2 integer,
newdriveraction2 text,
bike_movement text,
driver_movement text,
unit1 text,
unit2 text,
unit3 text,
complex boolean,
nobike boolean,
bike_movement2 text,
bike_fault boolean,
driver_movement2 text,
crashtype text,
relationship text,
ww boolean,
xwalk boolean,
bikelane text,
sidewalk boolean,
location2 text,
direction text,
newcrashtype text,
bike_s_veh_s_st_p boolean,
bike_s_veh_lt_st_od boolean,
bike_s_veh_rt_st_p boolean,
bike_s_veh_rt_st_sd boolean,
bike_s_veh_s_st_sd boolean,
bike_s_veh_rt_st_ww_p boolean,
highspeed boolean,
atfault text,
influence boolean,
distracted_driverfault boolean,
aggressive_driverfault boolean,
inexperience_bikerfault boolean,
aggressive_bikerfault boolean,
failyield_driverfault boolean,
carereckless_driverfault boolean,
disregardsignal_driverfault boolean,
failyield_bikerfault boolean,
disregardsignal_bikerfault boolean,
allothercrashtype boolean,
bike_s_veh_rt_sw_ww_p boolean,
bike_s_veh_s_sw_ww_p boolean,
id integer NOT NULL DEFAULT nextval('crashes_bike2_id_seq'::regclass),
road_id1 integer,
road_id2 integer,
at_intersection boolean,
int_id integer,
CONSTRAINT crashes_bike2_pkey PRIMARY KEY (id)
);
CREATE INDEX idx_crashbike2aggbkflt
ON received.crashes_bike2
USING btree
(aggressive_bikerfault);
CREATE INDEX idx_crashbike2bsvhrtstsd
ON received.crashes_bike2
USING btree
(bike_s_veh_rt_st_sd);
CREATE INDEX idx_crashbike2bsvlso
ON received.crashes_bike2
USING btree
(bike_s_veh_lt_st_od);
CREATE INDEX idx_crashbike2bsvrsp
ON received.crashes_bike2
USING btree
(bike_s_veh_rt_st_p);
CREATE INDEX idx_crashbike2bsvrswp
ON received.crashes_bike2
USING btree
(bike_s_veh_rt_sw_ww_p);
CREATE INDEX idx_crashbike2bsvssp
ON received.crashes_bike2
USING btree
(bike_s_veh_s_st_p);
CREATE INDEX idx_crashbike2bsvsstsd
ON received.crashes_bike2
USING btree
(bike_s_veh_s_st_sd);
CREATE INDEX idx_crashbike2bsvsswwwp
ON received.crashes_bike2
USING btree
(bike_s_veh_s_sw_ww_p);
CREATE INDEX idx_crashbike2carreckdrv
ON received.crashes_bike2
USING btree
(carereckless_driverfault);
CREATE INDEX idx_crashbike2drvflt
ON received.crashes_bike2
USING btree
(aggressive_driverfault);
CREATE INDEX idx_crashbike2dsrgdsgndrv
ON received.crashes_bike2
USING btree
(disregardsignal_driverfault);
CREATE INDEX idx_crashbike2dsrgsgnbk
ON received.crashes_bike2
USING btree
(disregardsignal_bikerfault);
CREATE INDEX idx_crashbike2dstdrv
ON received.crashes_bike2
USING btree
(distracted_driverfault);
CREATE INDEX idx_crashbike2dui
ON received.crashes_bike2
USING btree
(influence);
CREATE INDEX idx_crashbike2flyldbik
ON received.crashes_bike2
USING btree
(failyield_bikerfault);
CREATE INDEX idx_crashbike2flylddrv
ON received.crashes_bike2
USING btree
(failyield_driverfault);
CREATE INDEX idx_crashbike2ftl
ON received.crashes_bike2
USING btree
(fatalcrash);
CREATE INDEX idx_crashbike2hispd
ON received.crashes_bike2
USING btree
(highspeed);
CREATE INDEX idx_crashbike2inj
ON received.crashes_bike2
USING btree
(injurycrash);
CREATE INDEX idx_crashbike2int
ON received.crashes_bike2
USING btree
(int_id);
Answer by dez*_*zso (score: 10):
If you do all the updates in the same transaction, each update has to process an ever-larger set of (physical) tuples. See the following example:
CREATE TABLE explode (id integer, something text);
INSERT INTO explode SELECT i, md5(i::text) FROM generate_series(1, 100000) t(i);
\dt+ explode -- done in psql
List of relations
 Schema |  Name   | Type  | Owner  |  Size   | Description
--------+---------+-------+--------+---------+-------------
 test   | explode | table | avaczi | 6704 kB |
BEGIN;
UPDATE explode SET something = something || 'a';
\dt+ explode
 test   | explode | table | avaczi | 13 MB   |
UPDATE explode SET something = something || 'a';
\dt+ explode
 test   | explode | table | avaczi | 20 MB   |
COMMIT;
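The same experiment can be repeated without psql's `\dt+` meta-command by asking the server for the table's on-disk size with `pg_relation_size()` and `pg_size_pretty()`. This is only a sketch: the helper table `explode_sizes` is a hypothetical addition used to keep the measurements, and the exact byte counts will differ between versions and systems.

```sql
-- Recreates the demo table from the example above
DROP TABLE IF EXISTS explode, explode_sizes;
CREATE TABLE explode (id integer, something text);
INSERT INTO explode SELECT i, md5(i::text) FROM generate_series(1, 100000) t(i);

-- Hypothetical helper table that records the on-disk size at each step
CREATE TABLE explode_sizes (step text, bytes bigint);
INSERT INTO explode_sizes VALUES ('before', pg_relation_size('explode'));

BEGIN;
UPDATE explode SET something = something || 'a';
INSERT INTO explode_sizes VALUES ('after 1 update', pg_relation_size('explode'));
UPDATE explode SET something = something || 'a';
INSERT INTO explode_sizes VALUES ('after 2 updates', pg_relation_size('explode'));
COMMIT;

-- Sizes in growing order
SELECT step, pg_size_pretty(bytes) AS size
FROM explode_sizes
ORDER BY bytes;
```

Both intermediate measurements are taken before COMMIT, so they show the growth while the transaction is still open.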
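You may see this growth even if the different updates are not in the same transaction. That happens when autovacuum (or a manual VACUUM) cannot keep up with the rate of change.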
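To check whether (auto)vacuum keeps up when the updates run as separate transactions, the cumulative statistics view `pg_stat_user_tables` can be queried between runs. A sketch, assuming the table lives in schema `generated` as in the question; the counters are collected asynchronously, so they may lag slightly behind the actual state.

```sql
-- Live vs. dead row versions and recent (auto)vacuum activity
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE schemaname = 'generated'
  AND relname = 'crash_aggregates';

-- Between separately committed batches, a manual VACUUM makes the dead
-- versions reusable; it cannot be run inside a transaction block:
-- VACUUM generated.crash_aggregates;
```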
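The reason for this growth is how MVCC works in PostgreSQL. When you do an update, it creates a new physical row with the new values and marks the old one as no longer visible to transactions that start after the current one has committed. This means that once the transaction commits, there are row versions at the physical level that are just "wasted space". These eventually have to be freed by (auto)vacuuming.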
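The new-row-version behavior can be observed directly through the system columns `ctid` (the physical location of a row version as page and item number) and `xmin` (the ID of the transaction that created that version). A minimal sketch on a hypothetical scratch table `mvcc_demo`; the `ctid` values in the comments are what a fresh, empty table typically produces and may differ.

```sql
-- Hypothetical scratch table, not part of the question's schema
DROP TABLE IF EXISTS mvcc_demo;
CREATE TABLE mvcc_demo (id integer PRIMARY KEY, val text);
INSERT INTO mvcc_demo VALUES (1, 'old');

-- Physical address and creating transaction of the current row version
SELECT ctid, xmin, val FROM mvcc_demo;   -- e.g. (0,1)

UPDATE mvcc_demo SET val = 'new' WHERE id = 1;

-- Same logical row, but a new physical version at a different address;
-- the old version stays on the page until it is cleaned up
SELECT ctid, xmin, val FROM mvcc_demo;   -- e.g. (0,2)
```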
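Now, while you are inside a transaction, it is not yet known whether it will be committed or rolled back, so the database has to keep all the obsolete row versions in order to be able to return to the original ones (and concurrent transactions still need to see those originals anyway). (Well, not all of them; there seem to be some optimizations, but certainly some: in my experiment, the table shown above grew to 200 MB instead of staying at 6.6 MB.) This means the physical size of your table keeps growing, by an amount per step that depends on what exactly you do in the UPDATE statements.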
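This is where you can improve the process a lot. At the moment you have many updates that change the whole table unconditionally, which effectively means that every iteration adds another full copy of every row to the table (6704 kB, then 13 MB, then 20 MB in the example above). After 46 (or 23) rounds, even a relatively small table can become very large.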
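As a rough model, assume every UPDATE rewrites every row and no old version can be reclaimed before COMMIT, and ignore that the rows in the example also get one byte longer per update. With $S_0$ the initial size and $S_n$ the size after $n$ updates, the numbers reported by `\dt+` above fit this model:

```latex
\begin{aligned}
S_n    &\approx (n + 1)\,S_0 \\
S_1    &\approx 2 \times 6704\ \text{kB} = 13408\ \text{kB} \approx 13\ \text{MB} \\
S_2    &\approx 3 \times 6704\ \text{kB} = 20112\ \text{kB} \approx 20\ \text{MB} \\
S_{46} &\approx 47\,S_0
\end{aligned}
```

Under the same assumptions, 46 whole-table updates in one transaction leave about 47 versions of every row in `crash_aggregates`.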
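As a result, operating on all the retained versions takes more and more time. In my runs, the first UPDATE finished in 190-200 ms, and by the 20th iteration it took 400 ms. Since you have a nice wide table, and the updates already take 30-40 seconds each when run separately, you may well see a serious slowdown.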
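The slowdown can be reproduced in miniature. The sketch below uses hypothetical tables `slowdown_demo` and `slowdown_log` (smaller than the example above so it runs quickly) and performs 20 whole-table updates inside one DO block, and therefore inside one transaction, logging the elapsed time and on-disk size after each step. Absolute timings depend entirely on the machine; only the trend is of interest.

```sql
-- Hypothetical scratch tables, unrelated to the question's schema
DROP TABLE IF EXISTS slowdown_demo, slowdown_log;
CREATE TABLE slowdown_demo (id integer, something text);
INSERT INTO slowdown_demo
SELECT i, md5(i::text) FROM generate_series(1, 20000) t(i);
CREATE TABLE slowdown_log (iteration integer, elapsed_ms numeric, table_bytes bigint);

-- Everything inside the DO block runs in one transaction, so none of the
-- row versions made obsolete by earlier iterations can be reclaimed yet
DO $$
DECLARE
    started timestamptz;
BEGIN
    FOR i IN 1..20 LOOP
        started := clock_timestamp();
        UPDATE slowdown_demo SET something = something || 'a';
        INSERT INTO slowdown_log
        VALUES (i,
                extract(epoch FROM clock_timestamp() - started) * 1000,
                pg_relation_size('slowdown_demo'));
    END LOOP;
END $$;

-- Elapsed time and table size per iteration
SELECT iteration, round(elapsed_ms, 1) AS ms, pg_size_pretty(table_bytes) AS size
FROM slowdown_log
ORDER BY iteration;
```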
Since your updates look very similar and only affect different columns of the same table, you could try something like this:
UPDATE generated.crash_aggregates
SET bike_driver_aggressive = (
        SELECT COUNT(*)
        FROM crashes_bike2 c
        WHERE c.int_id = crash_aggregates.int_id
          AND c.aggressive_driverfault
    ),
    bike_allinjury = (
        SELECT COUNT(*)
        FROM crashes_bike1 c
        WHERE c.int_id = crash_aggregates.int_id
          AND c.injurycrash
    ) + (
        SELECT COUNT(*)
        FROM crashes_bike2 c
        WHERE c.int_id = crash_aggregates.int_id
          AND c.injurycrash
    )
    -- [...] the remaining aggregate columns follow the same pattern
;
This means the CPU still has to compute all the counts, but you pay for that work anyway. At the same time, the table will be rewritten only once.
When all of that is done, it seems enough to compute the ranks only once, by similarly writing a single UPDATE query that sets all of them in one round.
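Computing all ranks in one UPDATE could look like the following sketch. It uses a hypothetical, cut-down table `crash_aggregates_demo` with only two of the aggregate columns so it can run on its own; the real table would need one `rank()` per aggregate column. `NULLS LAST` is an addition not in the question's code: PostgreSQL sorts NULLs first in a `DESC` ordering by default, so it only matters if some aggregate columns can be NULL.

```sql
-- Hypothetical, cut-down stand-in for generated.crash_aggregates
DROP TABLE IF EXISTS crash_aggregates_demo;
CREATE TABLE crash_aggregates_demo (
    int_id integer PRIMARY KEY,
    bike_driver_aggressive integer,
    bike_driver_aggressive_rank integer,
    bike_allinjury integer,
    bike_allinjury_rank integer
);
INSERT INTO crash_aggregates_demo (int_id, bike_driver_aggressive, bike_allinjury)
VALUES (1, 5, 0), (2, 2, 7), (3, 5, 3);

-- All ranks computed in one pass, so each row gets one new version
-- instead of one per rank column
WITH ranks AS (
    SELECT int_id,
           rank() OVER (ORDER BY bike_driver_aggressive DESC NULLS LAST) AS aggressive_rank,
           rank() OVER (ORDER BY bike_allinjury DESC NULLS LAST) AS allinjury_rank
           -- ... one rank() per aggregate column
    FROM crash_aggregates_demo
)
UPDATE crash_aggregates_demo AS ca
SET bike_driver_aggressive_rank = r.aggressive_rank,
    bike_allinjury_rank = r.allinjury_rank
FROM ranks AS r
WHERE ca.int_id = r.int_id;

SELECT * FROM crash_aggregates_demo ORDER BY int_id;
-- int_id 1: ranks 1 and 3; int_id 2: ranks 3 and 1; int_id 3: ranks 1 and 2
```

Window functions with different ORDER BY clauses make the executor sort the rows once per ordering, but the table itself is written only once.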
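Well, the combined aggregate UPDATE (also suggested in the comments) is far from the most efficient one ever written. Here is how to improve it further.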
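As it stands, the counts are computed one by one, which means both source tables are accessed many times. Depending on their size (and some other factors), this can be bad. In your case it stayed below the pain threshold, otherwise you would have complained about that too ;) In other cases, that may not hold.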
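`EXPLAIN` makes the repeated access visible: each correlated scalar subquery in the SET list appears as a separate `SubPlan` node, evaluated once per row of the target table. A sketch on hypothetical miniature tables (`agg_demo`, `bike1_demo`, `bike2_demo`) that mirror the question's `bike_allinjury` update. `EXPLAIN ANALYZE` would additionally show the `loops` count, but keep in mind that it actually executes the UPDATE.

```sql
-- Hypothetical miniature tables, only used to look at the plan shape
DROP TABLE IF EXISTS agg_demo, bike1_demo, bike2_demo;
CREATE TABLE agg_demo (int_id integer PRIMARY KEY, bike_allinjury integer);
CREATE TABLE bike1_demo (int_id integer, injurycrash boolean);
CREATE TABLE bike2_demo (int_id integer, injurycrash boolean);

-- Look for "SubPlan 1" and "SubPlan 2" in the output: one per subquery,
-- each re-run for every row of agg_demo
EXPLAIN
UPDATE agg_demo
SET bike_allinjury = (
        SELECT count(*) FROM bike1_demo c
        WHERE c.int_id = agg_demo.int_id AND c.injurycrash
    ) + (
        SELECT count(*) FROM bike2_demo c
        WHERE c.int_id = agg_demo.int_id AND c.injurycrash
    );
```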
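The idea is that all the aggregates can be collected in a single run and then used as the source of the update. For this, we can build one big aggregate-everything structure: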
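-- UNION ALL stacks the two tables; a comma join would cross-join them
SELECT int_id,
       sum(injurycrash::integer) AS injurycrash,
       sum(aggressive_driverfault::integer) AS aggressive_driverfault
       -- ... one sum() per flag
FROM (
    SELECT int_id, injurycrash, NULL::boolean AS aggressive_driverfault
    FROM crashes_bike1  -- aggressive_driverfault exists only in crashes_bike2
    UNION ALL
    SELECT int_id, injurycrash, aggressive_driverfault FROM crashes_bike2
) AS c
GROUP BY int_id;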
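Here I assume that the distribution of the various flags (like injurycrash) is such that TRUE values are not very rare. If they are rare, this one big scan over the whole crashes_bike tables (well, both of them) may be worse than many index(-only) scans. That said, I don't see everything indexed in your schema (which probably wouldn't make sense anyway; it's hard to tell without the actual data).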
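In the aggregates above I replaced count() with sum(). The trick is to cast the boolean values to integer and sum them, which avoids a rather complicated set of CASE expressions.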
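The cast-and-sum trick can be checked on a few literal rows. For comparison, the sketch also shows the aggregate `FILTER` clause, available since PostgreSQL 9.4 and therefore on the question's 9.5, which expresses the same count without a cast; it is an alternative, not what the answer uses.

```sql
SELECT sum(flag::integer)           AS via_cast,    -- TRUE -> 1, FALSE -> 0, NULL skipped
       count(*) FILTER (WHERE flag) AS via_filter,  -- counts only rows where flag is TRUE
       count(*)                     AS all_rows
FROM (VALUES (true), (false), (NULL::boolean), (true)) AS t(flag);
-- via_cast = 2, via_filter = 2, all_rows = 4
```

One difference: for a group in which every flag is NULL, `sum()` returns NULL while `count(*) FILTER (...)` returns 0.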
Once we have the aggregate result set, we can plug it into the UPDATE itself:
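WITH aggregates1 AS (
    SELECT int_id,
           sum(injurycrash::integer) AS injurycrash
           -- ... further flags that exist in crashes_bike1
    FROM crashes_bike1
    GROUP BY int_id
), aggregates2 AS (
    SELECT int_id,
           sum(injurycrash::integer) AS injurycrash,
           sum(aggressive_driverfault::integer) AS aggressive_driverfault
           -- ... further flags that exist in crashes_bike2
    FROM crashes_bike2
    GROUP BY int_id
)
UPDATE generated.crash_aggregates AS ca
-- coalesce(): an int_id may appear in only one of the two source tables
SET bike_allinjury = coalesce(a1.injurycrash, 0) + coalesce(a2.injurycrash, 0),
    bike_driver_aggressive = coalesce(a2.aggressive_driverfault, 0)
    -- ...
FROM aggregates1 AS a1
FULL JOIN aggregates2 AS a2 USING (int_id)
-- an int_id missing from both source tables gets no row here and is left unchanged
WHERE ca.int_id = coalesce(a1.int_id, a2.int_id);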
Views: 349