mou*_*r11 5 postgresql performance postgresql-9.3 postgresql-performance
我希望加快单个表上的一些计算。
这是表,我相信它有超过 9300 万行,并且每天都在增长:
CREATE TABLE daily_data
(
id serial NOT NULL,
company_id integer NOT NULL,
trade_date date NOT NULL,
daily_val numeric NOT NULL,
bbg_pulls_id integer,
gen_qtr_end_dt_id integer,
ern_release_date_id integer,
wh_calc_id integer,
CONSTRAINT daily_data_pkey PRIMARY KEY (id),
CONSTRAINT daily_data_bbg_pulls_id_fkey FOREIGN KEY (bbg_pulls_id)
REFERENCES bbg_pulls (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT daily_data_company_id_fkey FOREIGN KEY (company_id)
REFERENCES company (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT daily_data_ern_release_date_id_fkey FOREIGN KEY (ern_release_date_id)
REFERENCES ern_dt (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT daily_data_wh_calc_id_fkey FOREIGN KEY (wh_calc_id)
REFERENCES wh_calc (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY IMMEDIATE,
CONSTRAINT daily_data_company_id_trade_date_bbg_pulls_id_key UNIQUE (company_id, trade_date, bbg_pulls_id),
CONSTRAINT daily_data_company_id_trade_date_wh_calc_id_key UNIQUE (company_id, trade_date, wh_calc_id),
CONSTRAINT daily_data_check CHECK ((wh_calc_id IS NULL) <> (bbg_pulls_id IS NULL))
)
CREATE INDEX daily_data_bbg_pulls_id_idx
ON daily_data
USING btree
(bbg_pulls_id)
WHERE bbg_pulls_id IS NOT NULL;
CREATE INDEX daily_data_company_id_idx
ON daily_data
USING btree
(company_id);
CREATE INDEX daily_data_gen_qtr_end_dt_id_idx
ON daily_data
USING btree
(gen_qtr_end_dt_id)
WHERE gen_qtr_end_dt_id IS NOT NULL;
CREATE INDEX daily_data_trade_date_idx
ON daily_data
USING btree
(trade_date);
CREATE INDEX daily_data_wh_calc_id_idx
ON daily_data
USING btree
(wh_calc_id)
WHERE wh_calc_id IS NOT NULL;
Run Code Online (Sandbox Code Playgroud)
这是我实际/最终想做的事情:
with dd2 as (select * from daily_data where wh_calc_id = 241 -- <- the 241 value is passed into a function where this is used
)
INSERT INTO daily_data (
company_id
,trade_date
,daily_val
,wh_calc_id
)
SELECT d.company_id
,d.trade_date
, round(CASE WHEN x.ct = 0 THEN numeric '1'
ELSE x.ct_lt / x.ct END, 6) AS pctl_calc
,1 -- <-- dummy value, value is passed into the function where this query is used
FROM dd2 d, LATERAL (
SELECT count(daily_val) AS ct
, count(daily_val < d.daily_val OR NULL)::numeric As ct_lt
FROM dd2
WHERE company_id = d.company_id
-- and company_id < 8
AND trade_date < d.trade_date
and wh_calc_id = d.wh_calc_id
-- and wh_calc_id = 241
) x
Run Code Online (Sandbox Code Playgroud)
我正在通过这样做来测试它(基本上除了插入之外的所有内容),但它在我取消之前运行了 16 小时:
with dd2 as (select * from daily_data where wh_calc_id = 241
)
SELECT d.*
, round(CASE WHEN x.ct = 0 THEN numeric '1'
ELSE x.ct_lt / x.ct END, 6) AS pctl_calc
FROM dd2 d, LATERAL (
SELECT count(daily_val) AS ct
, count(daily_val < d.daily_val OR NULL)::numeric As ct_lt
FROM dd2
WHERE company_id = d.company_id
-- and company_id < 8
AND trade_date < d.trade_date
and wh_calc_id = d.wh_calc_id
-- and wh_calc_id = 241
) x
ORDER BY company_id, trade_date;
Run Code Online (Sandbox Code Playgroud)
所以我运行一个子集 ( AND company_id < 8
) 来得到一个解释分析,这需要不到 3 分钟:解释分析
with dd2 as (select * from daily_data where wh_calc_id = 241 AND company_id < 8
)
SELECT d.*
, round(CASE WHEN x.ct = 0 THEN numeric '1'
ELSE x.ct_lt / x.ct END, 6) AS pctl_calc
FROM dd2 d, LATERAL (
SELECT count(daily_val) AS ct
, count(daily_val < d.daily_val OR NULL)::numeric As ct_lt
FROM dd2
WHERE company_id = d.company_id
-- and company_id < 8
AND trade_date < d.trade_date
and wh_calc_id = d.wh_calc_id
-- and wh_calc_id = 241
) x
ORDER BY company_id, trade_date;
Run Code Online (Sandbox Code Playgroud)
这是解释分析输出:
"Sort (cost=8.56..8.57 rows=1 width=100) (actual time=219363.049..219367.217 rows=24444 loops=1)"
" Sort Key: d.company_id, d.trade_date"
" Sort Method: external merge Disk: 1264kB"
" CTE dd2"
" -> Index Scan using daily_data_wh_calc_id_idx on daily_data (cost=0.43..8.46 rows=1 width=34) (actual time=0.415..70805.295 rows=24444 loops=1)"
" Index Cond: (wh_calc_id = 241)"
" Filter: (company_id < 8)"
" Rows Removed by Filter: 8661143"
" -> Nested Loop (cost=0.04..0.10 rows=1 width=100) (actual time=70835.311..219272.273 rows=24444 loops=1)"
" -> CTE Scan on dd2 d (cost=0.00..0.02 rows=1 width=60) (actual time=0.423..64.374 rows=24444 loops=1)"
" -> Aggregate (cost=0.04..0.05 rows=1 width=32) (actual time=8.965..8.965 rows=1 loops=24444)"
" -> CTE Scan on dd2 (cost=0.00..0.03 rows=1 width=32) (actual time=5.201..8.203 rows=2422 loops=24444)"
" Filter: ((trade_date < d.trade_date) AND (company_id = d.company_id) AND (wh_calc_id = d.wh_calc_id))"
" Rows Removed by Filter: 22022"
"Total runtime: 219374.219 ms"
Run Code Online (Sandbox Code Playgroud)
注意我使用 CTE 并让事情运行得更快。我遇到内存错误和/或即使使用子集测试查询,事情也会永远运行。最终所有这些都将发生在一个函数中,因此这里硬编码的值实际上会传递到函数中,但我认为这与问题无关。
有关该特定查询/计算的更多信息,请访问: PostgreSQL 9.3.5 中采用排序参数(或相同功能)的百分比排名
我去这里尝试自己解决但无济于事(为我的鞋子提供):https : //explain.depesz.com/
我没有看到可以在哪里添加索引以使其更快。
问题 1:我该怎么做才能加快此查询的速度?我只是专注于插入的选择部分,我知道,但是如果您对如何加速插入(除了删除索引)有想法,我会很高兴听到它们。
问题 2:我可以从服务器资源的角度做些什么来使计算/查询运行得更快?
SELECT version() “PostgreSQL 9.3.5 on x86_64-suse-linux-gnu,由 gcc (SUSE Linux) 4.8.3 20140627 编译 [gcc-4_8-branch 修订版 212064],64 位”
内存 2048 MB,CPU:2 vCPU,预配存储:202.11 GB
我知道我可以删除一些查询中未使用的索引来加快插入速度。如果需要,我稍后会这样做。现在我只想让选择部分运行得更快。其他索引用于其他重要查询。
为了测试我创建了小样本表:
INSERT INTO daily_data (
company_id
,trade_date
,daily_val
,wh_calc_id
)
SELECT
(random() * 10)::int,
'2010-01-01'::timestamp + (i::text || ' day')::interval,
random() * 100,
(random() * 20) + 230
FROM
generate_series(1, 200000) AS i
;
Run Code Online (Sandbox Code Playgroud)
EXPLAIN ANALYZE
with dd2 as (select * from daily_data where wh_calc_id = 241 AND company_id < 8
)
SELECT d.*
, ct, ct_lt
, round(CASE WHEN x.ct = 0 THEN numeric '1'
ELSE x.ct_lt / x.ct END, 6) AS pctl_calc
FROM dd2 d, LATERAL (
SELECT count(daily_val) AS ct
, count(daily_val < d.daily_val OR NULL)::numeric As ct_lt
FROM dd2
WHERE company_id = d.company_id
-- and company_id < 8
AND trade_date < d.trade_date
and wh_calc_id = d.wh_calc_id
-- and wh_calc_id = 241
) x
ORDER BY company_id, trade_date;
Run Code Online (Sandbox Code Playgroud)
with dd2 as (
select company_id, trade_date
, (SELECT count(daily_val) FROM daily_data WHERE company_id = d.company_id AND trade_date < d.trade_date and wh_calc_id = d.wh_calc_id) AS ct
, (SELECT count(daily_val < d.daily_val OR NULL)::numeric FROM daily_data WHERE company_id = d.company_id AND trade_date < d.trade_date and wh_calc_id = d.wh_calc_id) AS ct_lt
from daily_data AS d where wh_calc_id = 241 AND company_id < 8
)
SELECT d.*
, round(CASE WHEN d.ct = 0 THEN numeric '1'
ELSE d.ct_lt / d.ct END, 6) AS pctl_calc
FROM dd2 d
ORDER BY company_id, trade_date;
Run Code Online (Sandbox Code Playgroud)
EXPLAIN ANALYZE
with dd2 as (select *,
count(daily_val) OVER (PARTITION BY company_id, wh_calc_id ORDER BY trade_date) - 1 AS ct,
(SELECT count(daily_val < d.daily_val OR NULL)::numeric FROM daily_data WHERE company_id = d.company_id AND trade_date < d.trade_date and wh_calc_id = d.wh_calc_id) AS ct_lt
from daily_data AS d where wh_calc_id = 241 AND company_id < 8
)
SELECT d.*
, round(CASE WHEN ct = 0 THEN numeric '1'
ELSE ct_lt / ct END, 6) AS pctl_calc
FROM dd2 d
ORDER BY company_id, trade_date;
Run Code Online (Sandbox Code Playgroud)
EXPLAIN ANALYZE
with dd2 as (select d.company_id, d.trade_date, d.daily_val,
count(daily_val) OVER w - 1 AS ct,
array_agg(daily_val) OVER w AS ct_lt2
from daily_data AS d
where wh_calc_id = 241 AND company_id < 8
WINDOW w AS (PARTITION BY company_id, wh_calc_id ORDER BY trade_date)
) , dd1 AS (
SELECT d.*
, (SELECT count(*) FILTER (WHERE u < d.daily_val) FROM unnest(ct_lt2) AS u) AS ct_lt
FROM dd2 d
) SELECT *
, round(CASE WHEN ct = 0 THEN numeric '1'
ELSE ct_lt / ct END, 6) AS pctl_calc
FROM dd1 AS d
ORDER BY company_id, trade_date;
Run Code Online (Sandbox Code Playgroud)
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=2653.28..2653.84 rows=226 width=132) (actual time=8399.345..8399.886 rows=7540 loops=1)
Sort Key: d.company_id, d.trade_date
Sort Method: quicksort Memory: 1253kB
CTE dd2
-> Bitmap Heap Scan on daily_data (cost=21.55..1224.03 rows=226 width=60) (actual time=1.026..4.794 rows=7540 loops=1)
Recheck Cond: (wh_calc_id = 241)
Filter: (company_id < 8)
Rows Removed by Filter: 2526
Heap Blocks: exact=1470
-> Bitmap Index Scan on daily_data_wh_calc_id_idx (cost=0.00..21.50 rows=677 width=0) (actual time=0.848..0.848 rows=10066 loops=1)
Index Cond: (wh_calc_id = 241)
-> Nested Loop (cost=6.22..1420.41 rows=226 width=132) (actual time=6.196..8391.961 rows=7540 loops=1)
-> CTE Scan on dd2 d (cost=0.00..4.52 rows=226 width=60) (actual time=1.029..1.933 rows=7540 loops=1)
-> Aggregate (cost=6.22..6.24 rows=1 width=40) (actual time=1.111..1.111 rows=1 loops=7540)
-> CTE Scan on dd2 (cost=0.00..6.22 rows=1 width=32) (actual time=0.003..0.995 rows=484 loops=7540)
Filter: ((trade_date < d.trade_date) AND (company_id = d.company_id) AND (wh_calc_id = d.wh_calc_id))
Rows Removed by Filter: 7056
Planning time: 0.345 ms
Execution time: 8400.233 ms
(19 rows)
Run Code Online (Sandbox Code Playgroud)
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=9949.69..9950.25 rows=226 width=80) (actual time=10558.505..10559.017 rows=7540 loops=1)
Sort Key: d.company_id, d.trade_date
Sort Method: quicksort Memory: 782kB
CTE dd2
-> Bitmap Heap Scan on daily_data d_1 (cost=21.55..9934.07 rows=226 width=48) (actual time=1.138..10538.523 rows=7540 loops=1)
Recheck Cond: (wh_calc_id = 241)
Filter: (company_id < 8)
Rows Removed by Filter: 2526
Heap Blocks: exact=1470
-> Bitmap Index Scan on daily_data_wh_calc_id_idx (cost=0.00..21.50 rows=677 width=0) (actual time=0.945..0.945 rows=10066 loops=1)
Index Cond: (wh_calc_id = 241)
SubPlan 1
-> Aggregate (cost=19.26..19.27 rows=1 width=8) (actual time=0.722..0.722 rows=1 loops=7540)
-> Index Scan using daily_data_company_id_trade_date_wh_calc_id_key on daily_data (cost=0.42..19.26 rows=1 width=32) (actual time=0.007..0.671 rows=484 loops=7540)
Index Cond: ((company_id = d_1.company_id) AND (trade_date < d_1.trade_date) AND (wh_calc_id = d_1.wh_calc_id))
SubPlan 2
-> Aggregate (cost=19.26..19.27 rows=1 width=32) (actual time=0.672..0.672 rows=1 loops=7540)
-> Index Scan using daily_data_company_id_trade_date_wh_calc_id_key on daily_data daily_data_1 (cost=0.42..19.26 rows=1 width=32) (actual time=0.006..0.551 rows=484 loops=7540)
Index Cond: ((company_id = d_1.company_id) AND (trade_date < d_1.trade_date) AND (wh_calc_id = d_1.wh_calc_id))
-> CTE Scan on dd2 d (cost=0.00..6.78 rows=226 width=80) (actual time=1.142..10552.746 rows=7540 loops=1)
Planning time: 0.248 ms
Execution time: 10559.387 ms
(22 rows)
Run Code Online (Sandbox Code Playgroud)
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=5609.72..5610.28 rows=226 width=132) (actual time=4931.963..4932.433 rows=7540 loops=1)
Sort Key: d.company_id, d.trade_date
Sort Method: quicksort Memory: 1253kB
CTE dd2
-> WindowAgg (cost=1232.87..5594.10 rows=226 width=100) (actual time=9.498..4914.892 rows=7540 loops=1)
-> Sort (cost=1232.87..1233.43 rows=226 width=60) (actual time=9.468..11.291 rows=7540 loops=1)
Sort Key: d_1.company_id, d_1.trade_date
Sort Method: quicksort Memory: 782kB
-> Bitmap Heap Scan on daily_data d_1 (cost=21.55..1224.03 rows=226 width=60) (actual time=1.002..5.251 rows=7540 loops=1)
Recheck Cond: (wh_calc_id = 241)
Filter: (company_id < 8)
Rows Removed by Filter: 2526
Heap Blocks: exact=1470
-> Bitmap Index Scan on daily_data_wh_calc_id_idx (cost=0.00..21.50 rows=677 width=0) (actual time=0.814..0.814 rows=10066 loops=1)
Index Cond: (wh_calc_id = 241)
SubPlan 1
-> Aggregate (cost=19.26..19.27 rows=1 width=32) (actual time=0.649..0.649 rows=1 loops=7540)
-> Index Scan using daily_data_company_id_trade_date_wh_calc_id_key on daily_data (cost=0.42..19.26 rows=1 width=32) (actual time=0.005..0.529 rows=484 loops=7540)
Index Cond: ((company_id = d_1.company_id) AND (trade_date < d_1.trade_date) AND (wh_calc_id = d_1.wh_calc_id))
-> CTE Scan on dd2 d (cost=0.00..6.78 rows=226 width=132) (actual time=9.504..4928.264 rows=7540 loops=1)
Planning time: 0.223 ms
Execution time: 4932.854 ms
(22 rows)
Run Code Online (Sandbox Code Playgroud)
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=1601.04..1601.61 rows=226 width=120) (actual time=2006.537..2018.331 rows=7540 loops=1)
Sort Key: d.company_id, d.trade_date
Sort Method: quicksort Memory: 63526kB
CTE dd2
-> WindowAgg (cost=1232.87..1239.08 rows=226 width=84) (actual time=8.298..86.164 rows=7540 loops=1)
-> Sort (cost=1232.87..1233.43 rows=226 width=44) (actual time=8.285..9.752 rows=7540 loops=1)
Sort Key: d_1.company_id, d_1.trade_date
Sort Method: quicksort Memory: 782kB
-> Bitmap Heap Scan on daily_data d_1 (cost=21.55..1224.03 rows=226 width=44) (actual time=0.935..5.078 rows=7540 loops=1)
Recheck Cond: (wh_calc_id = 241)
Filter: (company_id < 8)
Rows Removed by Filter: 2526
Heap Blocks: exact=1470
-> Bitmap Index Scan on daily_data_wh_calc_id_idx (cost=0.00..21.50 rows=677 width=0) (actual time=0.722..0.722 rows=10066 loops=1)
Index Cond: (wh_calc_id = 241)
CTE dd1
-> CTE Scan on dd2 d_2 (cost=0.00..346.34 rows=226 width=88) (actual time=8.315..1573.920 rows=7540 loops=1)
SubPlan 2
-> Aggregate (cost=1.50..1.51 rows=1 width=8) (actual time=0.155..0.156 rows=1 loops=7540)
-> Function Scan on unnest u (cost=0.00..1.00 rows=100 width=32) (actual time=0.034..0.058 rows=485 loops=7540)
-> CTE Scan on dd1 d (cost=0.00..6.78 rows=226 width=120) (actual time=8.321..1774.450 rows=7540 loops=1)
Planning time: 0.177 ms
Execution time: 2443.855 ms
(23 rows)
Run Code Online (Sandbox Code Playgroud)
Windows 版本是最好的。我发现没有简单的方法来转换ct_lt
它的计算。作为解决方法,我使用了数组中值的聚合并ct_lt
从中进行计数。也许,它会对大量数据造成一些惩罚。但即便如此,这个版本也会比其他版本更好。
适配9.4:
EXPLAIN ANALYZE
with dd2 as (select d.company_id, d.trade_date, d.daily_val,
count(daily_val) OVER w - 1 AS ct,
array_agg(daily_val) OVER w AS ct_lt2
from daily_data AS d
where wh_calc_id = 241 AND company_id < 8
WINDOW w AS (PARTITION BY company_id, wh_calc_id ORDER BY trade_date)
) , dd1 AS (
SELECT d.*
, (SELECT count(*) FROM unnest(ct_lt2) AS u WHERE u < d.daily_val) AS ct_lt
FROM dd2 d
) SELECT *
, round(CASE WHEN ct = 0 THEN numeric '1'
ELSE ct_lt::numeric / ct END, 6) AS pctl_calc
FROM dd1 AS d
ORDER BY company_id, trade_date;
Run Code Online (Sandbox Code Playgroud)
指数:
CREATE INDEX daily_data_i1_idx
ON daily_data
(wh_calc_id, company_id, trade_date);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=1311.98..1312.82 rows=333 width=120) (actual time=1597.558..1613.481 rows=7407 loops=1)
Sort Key: d.company_id, d.trade_date
Sort Method: quicksort Memory: 61659kB
CTE dd2
-> WindowAgg (cost=824.34..833.50 rows=333 width=84) (actual time=10.416..87.668 rows=7407 loops=1)
-> Sort (cost=824.34..825.17 rows=333 width=44) (actual time=10.381..11.851 rows=7407 loops=1)
Sort Key: d_1.company_id, d_1.trade_date
Sort Method: quicksort Memory: 771kB
-> Bitmap Heap Scan on daily_data d_1 (cost=11.83..810.39 rows=333 width=44) (actual time=2.934..6.548 rows=7407 loops=1)
Recheck Cond: ((wh_calc_id = 241) AND (company_id < 8))
Heap Blocks: exact=1460
-> Bitmap Index Scan on daily_data_i1_idx (cost=0.00..11.75 rows=333 width=0) (actual time=2.661..2.661 rows=7407 loops=1)
Index Cond: ((wh_calc_id = 241) AND (company_id < 8))
CTE dd1
-> CTE Scan on dd2 d_2 (cost=0.00..454.55 rows=333 width=88) (actual time=10.442..1320.450 rows=7407 loops=1)
SubPlan 2
-> Aggregate (cost=1.33..1.34 rows=1 width=8) (actual time=0.143..0.143 rows=1 loops=7407)
-> Function Scan on unnest u (cost=0.00..1.25 rows=33 width=0) (actual time=0.036..0.124 rows=234 loops=7407)
Filter: (u < d_2.daily_val)
Rows Removed by Filter: 243
-> CTE Scan on dd1 d (cost=0.00..9.99 rows=333 width=120) (actual time=10.450..1468.533 rows=7407 loops=1)
Planning time: 0.471 ms
Execution time: 1960.835 ms
Run Code Online (Sandbox Code Playgroud)
归档时间: |
|
查看次数: |
202 次 |
最近记录: |