Speeding up a query calculation: where can I add indexes or optimize the query or the server?

mou*_*r11 5 postgresql performance postgresql-9.3 postgresql-performance

I want to speed up some calculations on a single table.

Here is the table. I believe it has more than 93 million rows, and it grows every day:

CREATE TABLE daily_data
(
  id serial NOT NULL,
  company_id integer NOT NULL,
  trade_date date NOT NULL,
  daily_val numeric NOT NULL,
  bbg_pulls_id integer,
  gen_qtr_end_dt_id integer,
  ern_release_date_id integer,
  wh_calc_id integer,
  CONSTRAINT daily_data_pkey PRIMARY KEY (id),
  CONSTRAINT daily_data_bbg_pulls_id_fkey FOREIGN KEY (bbg_pulls_id)
      REFERENCES bbg_pulls (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT daily_data_company_id_fkey FOREIGN KEY (company_id)
      REFERENCES company (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT daily_data_ern_release_date_id_fkey FOREIGN KEY (ern_release_date_id)
      REFERENCES ern_dt (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT daily_data_wh_calc_id_fkey FOREIGN KEY (wh_calc_id)
      REFERENCES wh_calc (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY IMMEDIATE,
  CONSTRAINT daily_data_company_id_trade_date_bbg_pulls_id_key UNIQUE (company_id, trade_date, bbg_pulls_id),
  CONSTRAINT daily_data_company_id_trade_date_wh_calc_id_key UNIQUE (company_id, trade_date, wh_calc_id),
  CONSTRAINT daily_data_check CHECK ((wh_calc_id IS NULL) <> (bbg_pulls_id IS NULL))
);

CREATE INDEX daily_data_bbg_pulls_id_idx
  ON daily_data
  USING btree
  (bbg_pulls_id)
  WHERE bbg_pulls_id IS NOT NULL;

CREATE INDEX daily_data_company_id_idx
  ON daily_data
  USING btree
  (company_id);

CREATE INDEX daily_data_gen_qtr_end_dt_id_idx
  ON daily_data
  USING btree
  (gen_qtr_end_dt_id)
  WHERE gen_qtr_end_dt_id IS NOT NULL;

CREATE INDEX daily_data_trade_date_idx
  ON daily_data
  USING btree
  (trade_date);

CREATE INDEX daily_data_wh_calc_id_idx
  ON daily_data
  USING btree
  (wh_calc_id)
  WHERE wh_calc_id IS NOT NULL;

This is what I actually, ultimately, want to do:

with dd2 as (select * from daily_data where wh_calc_id = 241 -- <- the 241 value is passed into a function where this is used
) 
INSERT INTO daily_data (
        company_id
        ,trade_date
        ,daily_val
        ,wh_calc_id
        )
SELECT  d.company_id
    ,d.trade_date
    , round(CASE WHEN x.ct = 0 THEN numeric '1'  
                  ELSE x.ct_lt / x.ct END, 6) AS pctl_calc
        ,1 -- <-- dummy value, value is passed into the function where this query is used
FROM   dd2 d, LATERAL (
   SELECT count(daily_val) AS ct
        , count(daily_val < d.daily_val OR NULL)::numeric As ct_lt
   FROM   dd2
   WHERE  company_id = d.company_id
 --   and     company_id < 8
   AND    trade_date < d.trade_date
   and    wh_calc_id = d.wh_calc_id
   -- and     wh_calc_id = 241
   ) x

I am testing it with the query below (basically everything except the INSERT), but it ran for 16 hours before I cancelled it:

with dd2 as (select * from daily_data where wh_calc_id = 241 
) 
SELECT d.*
     , round(CASE WHEN x.ct = 0 THEN numeric '1'  
                  ELSE x.ct_lt / x.ct END, 6) AS pctl_calc
FROM   dd2 d, LATERAL (
   SELECT count(daily_val) AS ct
        , count(daily_val < d.daily_val OR NULL)::numeric As ct_lt
   FROM   dd2
   WHERE  company_id = d.company_id
 --   and     company_id < 8
   AND    trade_date < d.trade_date
   and    wh_calc_id = d.wh_calc_id
   -- and     wh_calc_id = 241

   ) x
ORDER  BY company_id, trade_date;

So I ran a subset (AND company_id < 8) to get an EXPLAIN ANALYZE, which took under 4 minutes (about 219 s):

with dd2 as (select * from daily_data where wh_calc_id = 241 AND company_id < 8 
) 
SELECT d.*
     , round(CASE WHEN x.ct = 0 THEN numeric '1'  
                  ELSE x.ct_lt / x.ct END, 6) AS pctl_calc
FROM   dd2 d, LATERAL (
   SELECT count(daily_val) AS ct
        , count(daily_val < d.daily_val OR NULL)::numeric As ct_lt
   FROM   dd2
   WHERE  company_id = d.company_id
   --   and       company_id < 8
   AND    trade_date < d.trade_date
   and    wh_calc_id = d.wh_calc_id
   -- and     wh_calc_id = 241

   ) x
ORDER  BY company_id, trade_date;

Here is the EXPLAIN ANALYZE output:

"Sort  (cost=8.56..8.57 rows=1 width=100) (actual time=219363.049..219367.217 rows=24444 loops=1)"
"  Sort Key: d.company_id, d.trade_date"
"  Sort Method: external merge  Disk: 1264kB"
"  CTE dd2"
"    ->  Index Scan using daily_data_wh_calc_id_idx on daily_data  (cost=0.43..8.46 rows=1 width=34) (actual time=0.415..70805.295 rows=24444 loops=1)"
"          Index Cond: (wh_calc_id = 241)"
"          Filter: (company_id < 8)"
"          Rows Removed by Filter: 8661143"
"  ->  Nested Loop  (cost=0.04..0.10 rows=1 width=100) (actual time=70835.311..219272.273 rows=24444 loops=1)"
"        ->  CTE Scan on dd2 d  (cost=0.00..0.02 rows=1 width=60) (actual time=0.423..64.374 rows=24444 loops=1)"
"        ->  Aggregate  (cost=0.04..0.05 rows=1 width=32) (actual time=8.965..8.965 rows=1 loops=24444)"
"              ->  CTE Scan on dd2  (cost=0.00..0.03 rows=1 width=32) (actual time=5.201..8.203 rows=2422 loops=24444)"
"                    Filter: ((trade_date < d.trade_date) AND (company_id = d.company_id) AND (wh_calc_id = d.wh_calc_id))"
"                    Rows Removed by Filter: 22022"
"Total runtime: 219374.219 ms"

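Note that I use a CTE, which made things run faster. Before that I was getting out-of-memory errors and/or the query ran forever, even with the subset test query. Eventually all of this will live inside a function, so the values hard-coded here will actually be passed in as parameters, but I don't think that is relevant to the question.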

More about this particular query/calculation here: Percentile rank that takes sorted argument (or same functionality) in PostgreSQL 9.3.5

I went here to try to work it out on my own, but to no avail: https://explain.depesz.com/

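I don't see where I could add an index to make this faster.

Question 1: What can I do to speed up this query? I know I am focusing only on the SELECT part of the INSERT, but if you have ideas on how to speed up the insert (other than dropping indexes), I'd be glad to hear them.

Question 2: What can I do from a server-resource perspective to make the calculation/query run faster?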

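SELECT version(): "PostgreSQL 9.3.5 on x86_64-suse-linux-gnu, compiled by gcc (SUSE Linux) 4.8.3 20140627 [gcc-4_8-branch revision 212064], 64-bit"

Memory: 2048 MB, CPU: 2 vCPUs, provisioned storage: 202.11 GB

I know I could drop some indexes that this query does not use in order to speed up the insert. I'll do that later if needed. For now I just want the SELECT part to run faster. The other indexes serve other important queries.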

Rom*_*huk 2

For testing I created a small sample table:

INSERT INTO daily_data (
        company_id
        ,trade_date
        ,daily_val
        ,wh_calc_id
        )
SELECT
        (random() * 10)::int,
        '2010-01-01'::timestamp + (i::text || ' day')::interval,
        random() * 100,
        (random() * 20) + 230
FROM
        generate_series(1, 200000) AS i
;
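
The INSERT above needs a daily_data table to run against. A stripped-down stand-in is sketched here, to be created in a scratch database before loading the sample rows. It assumes the foreign keys to bbg_pulls, company, ern_dt and wh_calc can be left out for the test (that is a simplification for reproducing the plans, not part of the original schema), so the random company_id and wh_calc_id values need no parent rows.

```sql
-- Scratch copy of the table from the question, without foreign keys.
-- Run this in a throwaway database, not next to the real daily_data.
CREATE TABLE daily_data
(
  id serial PRIMARY KEY,
  company_id integer NOT NULL,
  trade_date date NOT NULL,
  daily_val numeric NOT NULL,
  bbg_pulls_id integer,
  gen_qtr_end_dt_id integer,
  ern_release_date_id integer,
  wh_calc_id integer,
  CONSTRAINT daily_data_company_id_trade_date_bbg_pulls_id_key
      UNIQUE (company_id, trade_date, bbg_pulls_id),
  CONSTRAINT daily_data_company_id_trade_date_wh_calc_id_key
      UNIQUE (company_id, trade_date, wh_calc_id),
  CONSTRAINT daily_data_check
      CHECK ((wh_calc_id IS NULL) <> (bbg_pulls_id IS NULL))
);

-- The partial index that the plans in this answer scan.
CREATE INDEX daily_data_wh_calc_id_idx
  ON daily_data (wh_calc_id)
  WHERE wh_calc_id IS NOT NULL;
```

Running ANALYZE daily_data after the sample rows are loaded gives the planner fresh statistics before comparing plans.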

Original query:

EXPLAIN ANALYZE
with dd2 as (select * from daily_data where wh_calc_id = 241 AND company_id < 8 
) 
SELECT d.*
, ct, ct_lt
     , round(CASE WHEN x.ct = 0 THEN numeric '1'  
                  ELSE x.ct_lt / x.ct END, 6) AS pctl_calc
FROM   dd2 d, LATERAL (
   SELECT count(daily_val) AS ct
        , count(daily_val < d.daily_val OR NULL)::numeric As ct_lt
   FROM   dd2
   WHERE  company_id = d.company_id
   --   and       company_id < 8
   AND    trade_date < d.trade_date
   and    wh_calc_id = d.wh_calc_id
   -- and     wh_calc_id = 241

   ) x
ORDER  BY company_id, trade_date;

Subquery version:

with dd2 as (
        select company_id, trade_date
   , (SELECT count(daily_val) FROM daily_data WHERE company_id = d.company_id AND trade_date < d.trade_date and wh_calc_id = d.wh_calc_id) AS ct
   , (SELECT count(daily_val < d.daily_val OR NULL)::numeric FROM daily_data WHERE company_id = d.company_id AND trade_date < d.trade_date and wh_calc_id = d.wh_calc_id) AS ct_lt
        from daily_data AS d where wh_calc_id = 241 AND company_id < 8 
)
SELECT d.*
     , round(CASE WHEN d.ct = 0 THEN numeric '1'  
                  ELSE d.ct_lt / d.ct END, 6) AS pctl_calc
FROM   dd2 d
ORDER  BY company_id, trade_date;

Partial window + partial subquery version:

EXPLAIN ANALYZE
with dd2 as (select *,
        count(daily_val) OVER (PARTITION BY company_id, wh_calc_id ORDER BY trade_date) - 1 AS ct,
        (SELECT count(daily_val < d.daily_val OR NULL)::numeric FROM daily_data WHERE company_id = d.company_id AND trade_date < d.trade_date and wh_calc_id = d.wh_calc_id) AS ct_lt
        from daily_data AS d where wh_calc_id = 241 AND company_id < 8 
) 
SELECT d.*
     , round(CASE WHEN ct = 0 THEN numeric '1'  
                  ELSE ct_lt / ct END, 6) AS pctl_calc
FROM   dd2 d
ORDER  BY company_id, trade_date;

Pure window version (but somewhat tricky):

EXPLAIN ANALYZE
with dd2 as (select d.company_id, d.trade_date, d.daily_val,
        count(daily_val) OVER w - 1 AS ct,
        array_agg(daily_val) OVER w AS ct_lt2
        from daily_data AS d
        where wh_calc_id = 241 AND company_id < 8 
        WINDOW w AS (PARTITION BY company_id, wh_calc_id ORDER BY trade_date)
) , dd1 AS (  
SELECT d.*
, (SELECT count(*) FILTER (WHERE u < d.daily_val) FROM unnest(ct_lt2) AS u) AS ct_lt  -- FILTER needs PostgreSQL 9.4+

FROM   dd2 d
) SELECT *
     , round(CASE WHEN ct = 0 THEN numeric '1'  
                  ELSE ct_lt::numeric / ct END, 6) AS pctl_calc  -- cast avoids bigint integer division
FROM dd1 AS d
ORDER  BY company_id, trade_date;

Plan for the original query:

                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2653.28..2653.84 rows=226 width=132) (actual time=8399.345..8399.886 rows=7540 loops=1)
   Sort Key: d.company_id, d.trade_date
   Sort Method: quicksort  Memory: 1253kB
   CTE dd2
     ->  Bitmap Heap Scan on daily_data  (cost=21.55..1224.03 rows=226 width=60) (actual time=1.026..4.794 rows=7540 loops=1)
           Recheck Cond: (wh_calc_id = 241)
           Filter: (company_id < 8)
           Rows Removed by Filter: 2526
           Heap Blocks: exact=1470
           ->  Bitmap Index Scan on daily_data_wh_calc_id_idx  (cost=0.00..21.50 rows=677 width=0) (actual time=0.848..0.848 rows=10066 loops=1)
                 Index Cond: (wh_calc_id = 241)
   ->  Nested Loop  (cost=6.22..1420.41 rows=226 width=132) (actual time=6.196..8391.961 rows=7540 loops=1)
         ->  CTE Scan on dd2 d  (cost=0.00..4.52 rows=226 width=60) (actual time=1.029..1.933 rows=7540 loops=1)
         ->  Aggregate  (cost=6.22..6.24 rows=1 width=40) (actual time=1.111..1.111 rows=1 loops=7540)
               ->  CTE Scan on dd2  (cost=0.00..6.22 rows=1 width=32) (actual time=0.003..0.995 rows=484 loops=7540)
                     Filter: ((trade_date < d.trade_date) AND (company_id = d.company_id) AND (wh_calc_id = d.wh_calc_id))
                     Rows Removed by Filter: 7056
 Planning time: 0.345 ms
 Execution time: 8400.233 ms
(19 rows)

Plan for the subquery version:

                                                                                              QUERY PLAN                                                                                              
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=9949.69..9950.25 rows=226 width=80) (actual time=10558.505..10559.017 rows=7540 loops=1)
   Sort Key: d.company_id, d.trade_date
   Sort Method: quicksort  Memory: 782kB
   CTE dd2
     ->  Bitmap Heap Scan on daily_data d_1  (cost=21.55..9934.07 rows=226 width=48) (actual time=1.138..10538.523 rows=7540 loops=1)
           Recheck Cond: (wh_calc_id = 241)
           Filter: (company_id < 8)
           Rows Removed by Filter: 2526
           Heap Blocks: exact=1470
           ->  Bitmap Index Scan on daily_data_wh_calc_id_idx  (cost=0.00..21.50 rows=677 width=0) (actual time=0.945..0.945 rows=10066 loops=1)
                 Index Cond: (wh_calc_id = 241)
           SubPlan 1
             ->  Aggregate  (cost=19.26..19.27 rows=1 width=8) (actual time=0.722..0.722 rows=1 loops=7540)
                   ->  Index Scan using daily_data_company_id_trade_date_wh_calc_id_key on daily_data  (cost=0.42..19.26 rows=1 width=32) (actual time=0.007..0.671 rows=484 loops=7540)
                         Index Cond: ((company_id = d_1.company_id) AND (trade_date < d_1.trade_date) AND (wh_calc_id = d_1.wh_calc_id))
           SubPlan 2
             ->  Aggregate  (cost=19.26..19.27 rows=1 width=32) (actual time=0.672..0.672 rows=1 loops=7540)
                   ->  Index Scan using daily_data_company_id_trade_date_wh_calc_id_key on daily_data daily_data_1  (cost=0.42..19.26 rows=1 width=32) (actual time=0.006..0.551 rows=484 loops=7540)
                         Index Cond: ((company_id = d_1.company_id) AND (trade_date < d_1.trade_date) AND (wh_calc_id = d_1.wh_calc_id))
   ->  CTE Scan on dd2 d  (cost=0.00..6.78 rows=226 width=80) (actual time=1.142..10552.746 rows=7540 loops=1)
 Planning time: 0.248 ms
 Execution time: 10559.387 ms
(22 rows)

Plan for the partial window + partial subquery version:

                                                                                       QUERY PLAN                                                                                        
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=5609.72..5610.28 rows=226 width=132) (actual time=4931.963..4932.433 rows=7540 loops=1)
   Sort Key: d.company_id, d.trade_date
   Sort Method: quicksort  Memory: 1253kB
   CTE dd2
     ->  WindowAgg  (cost=1232.87..5594.10 rows=226 width=100) (actual time=9.498..4914.892 rows=7540 loops=1)
           ->  Sort  (cost=1232.87..1233.43 rows=226 width=60) (actual time=9.468..11.291 rows=7540 loops=1)
                 Sort Key: d_1.company_id, d_1.trade_date
                 Sort Method: quicksort  Memory: 782kB
                 ->  Bitmap Heap Scan on daily_data d_1  (cost=21.55..1224.03 rows=226 width=60) (actual time=1.002..5.251 rows=7540 loops=1)
                       Recheck Cond: (wh_calc_id = 241)
                       Filter: (company_id < 8)
                       Rows Removed by Filter: 2526
                       Heap Blocks: exact=1470
                       ->  Bitmap Index Scan on daily_data_wh_calc_id_idx  (cost=0.00..21.50 rows=677 width=0) (actual time=0.814..0.814 rows=10066 loops=1)
                             Index Cond: (wh_calc_id = 241)
           SubPlan 1
             ->  Aggregate  (cost=19.26..19.27 rows=1 width=32) (actual time=0.649..0.649 rows=1 loops=7540)
                   ->  Index Scan using daily_data_company_id_trade_date_wh_calc_id_key on daily_data  (cost=0.42..19.26 rows=1 width=32) (actual time=0.005..0.529 rows=484 loops=7540)
                         Index Cond: ((company_id = d_1.company_id) AND (trade_date < d_1.trade_date) AND (wh_calc_id = d_1.wh_calc_id))
   ->  CTE Scan on dd2 d  (cost=0.00..6.78 rows=226 width=132) (actual time=9.504..4928.264 rows=7540 loops=1)
 Planning time: 0.223 ms
 Execution time: 4932.854 ms
(22 rows)

Plan for the pure window version (but somewhat tricky):

                                                                         QUERY PLAN                                                                          
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1601.04..1601.61 rows=226 width=120) (actual time=2006.537..2018.331 rows=7540 loops=1)
   Sort Key: d.company_id, d.trade_date
   Sort Method: quicksort  Memory: 63526kB
   CTE dd2
     ->  WindowAgg  (cost=1232.87..1239.08 rows=226 width=84) (actual time=8.298..86.164 rows=7540 loops=1)
           ->  Sort  (cost=1232.87..1233.43 rows=226 width=44) (actual time=8.285..9.752 rows=7540 loops=1)
                 Sort Key: d_1.company_id, d_1.trade_date
                 Sort Method: quicksort  Memory: 782kB
                 ->  Bitmap Heap Scan on daily_data d_1  (cost=21.55..1224.03 rows=226 width=44) (actual time=0.935..5.078 rows=7540 loops=1)
                       Recheck Cond: (wh_calc_id = 241)
                       Filter: (company_id < 8)
                       Rows Removed by Filter: 2526
                       Heap Blocks: exact=1470
                       ->  Bitmap Index Scan on daily_data_wh_calc_id_idx  (cost=0.00..21.50 rows=677 width=0) (actual time=0.722..0.722 rows=10066 loops=1)
                             Index Cond: (wh_calc_id = 241)
   CTE dd1
     ->  CTE Scan on dd2 d_2  (cost=0.00..346.34 rows=226 width=88) (actual time=8.315..1573.920 rows=7540 loops=1)
           SubPlan 2
             ->  Aggregate  (cost=1.50..1.51 rows=1 width=8) (actual time=0.155..0.156 rows=1 loops=7540)
                   ->  Function Scan on unnest u  (cost=0.00..1.00 rows=100 width=32) (actual time=0.034..0.058 rows=485 loops=7540)
   ->  CTE Scan on dd1 d  (cost=0.00..6.78 rows=226 width=120) (actual time=8.321..1774.450 rows=7540 loops=1)
 Planning time: 0.177 ms
 Execution time: 2443.855 ms
(23 rows)
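
The final sort in this plan ran as an in-memory quicksort using 63526kB, which only happens when work_mem is at least that large. The plan in the question shows "external merge  Disk: 1264kB" instead, so that server's work_mem is presumably much lower. One way to raise it for a single transaction only is sketched here; the 128MB figure is purely illustrative and would need to be weighed against the 2048 MB of RAM mentioned in the question.

```sql
BEGIN;

-- Raise the per-operation sort/hash memory for this transaction only.
-- '128MB' is an illustrative value, not a recommendation for a 2 GB server.
SET LOCAL work_mem = '128MB';
SHOW work_mem;   -- reports 128MB while the transaction is open

-- ... run the window query here ...

COMMIT;

SHOW work_mem;   -- back to the session/server setting
```

SET LOCAL reverts automatically at COMMIT or ROLLBACK, so other sessions and later statements keep the server-wide setting.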

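Conclusion

The window version is the best. I found no simple way to express the ct_lt calculation as a window function. As a workaround, I aggregate the values seen so far into an array and count ct_lt from that array. This may carry some penalty on large data sets, but even so this version should still beat the others.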
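
To make the array workaround concrete, here is a tiny worked example on made-up data. The table demo_vals, the alias vals_so_far, and the four rows are hypothetical and only serve to show how ct and ct_lt come out of the running array; the count uses a plain WHERE, so it should not depend on the 9.4 FILTER clause.

```sql
-- Throwaway data: one company, four trading days (values are made up).
CREATE TEMP TABLE demo_vals (
  company_id integer NOT NULL,
  trade_date date    NOT NULL,
  daily_val  numeric NOT NULL
);

INSERT INTO demo_vals VALUES
  (1, DATE '2020-01-01', 10),
  (1, DATE '2020-01-02', 30),
  (1, DATE '2020-01-03', 20),
  (1, DATE '2020-01-04',  5);

WITH running AS (
  SELECT company_id, trade_date, daily_val,
         -- rows up to and including the current one, minus the current one
         count(daily_val) OVER w - 1 AS ct,
         -- every value seen so far in this partition, current row included
         array_agg(daily_val) OVER w AS vals_so_far
  FROM   demo_vals
  WINDOW w AS (PARTITION BY company_id ORDER BY trade_date)
)
SELECT r.trade_date, r.daily_val, r.ct,
       -- strict "<" means the current row never counts itself
       (SELECT count(*)
        FROM   unnest(r.vals_so_far) AS u
        WHERE  u < r.daily_val) AS ct_lt
FROM   running r
ORDER  BY r.company_id, r.trade_date;

--  trade_date | daily_val | ct | ct_lt
-- ------------+-----------+----+-------
--  2020-01-01 |        10 |  0 |     0
--  2020-01-02 |        30 |  1 |     1
--  2020-01-03 |        20 |  2 |     1
--  2020-01-04 |         5 |  3 |     0
```

Each row carries an array of all earlier values in its partition, which is where the memory cost on large partitions comes from.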

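Update:

Adapted version for 9.4 (a plain WHERE replaces the aggregate FILTER clause, and ct_lt is cast to numeric before dividing):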

EXPLAIN ANALYZE
with dd2 as (select d.company_id, d.trade_date, d.daily_val,
        count(daily_val) OVER w - 1 AS ct,
        array_agg(daily_val) OVER w AS ct_lt2
        from daily_data AS d
        where wh_calc_id = 241 AND company_id < 8 
        WINDOW w AS (PARTITION BY company_id, wh_calc_id ORDER BY trade_date)
) , dd1 AS (  
SELECT d.*
, (SELECT count(*) FROM unnest(ct_lt2) AS u WHERE u < d.daily_val) AS ct_lt

FROM   dd2 d
) SELECT *
     , round(CASE WHEN ct = 0 THEN numeric '1'  
                  ELSE ct_lt::numeric / ct END, 6) AS pctl_calc
FROM dd1 AS d
ORDER  BY company_id, trade_date;
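
To connect this back to the INSERT in the question, the window query can feed INSERT ... SELECT directly, since a WITH clause may precede INSERT. The sketch below runs on a throwaway table; demo_daily_data, vals_so_far, the sample rows, and the target id 999 are all made up for illustration, while 241 is the source id used throughout the question.

```sql
-- Throwaway stand-in for daily_data (hypothetical name, reduced columns).
CREATE TEMP TABLE demo_daily_data (
  company_id integer NOT NULL,
  trade_date date    NOT NULL,
  daily_val  numeric NOT NULL,
  wh_calc_id integer NOT NULL
);

INSERT INTO demo_daily_data VALUES
  (1, DATE '2020-01-01', 10, 241),
  (1, DATE '2020-01-02', 30, 241),
  (1, DATE '2020-01-03', 20, 241),
  (2, DATE '2020-01-01',  7, 241);

-- Read the rows of calculation 241, write the percentile rows as calculation 999.
WITH dd2 AS (
  SELECT company_id, trade_date, daily_val,
         count(daily_val) OVER w - 1 AS ct,
         array_agg(daily_val) OVER w AS vals_so_far
  FROM   demo_daily_data
  WHERE  wh_calc_id = 241
  WINDOW w AS (PARTITION BY company_id ORDER BY trade_date)
), dd1 AS (
  SELECT d.company_id, d.trade_date, d.ct,
         (SELECT count(*)
          FROM   unnest(d.vals_so_far) AS u
          WHERE  u < d.daily_val) AS ct_lt
  FROM   dd2 d
)
INSERT INTO demo_daily_data (company_id, trade_date, daily_val, wh_calc_id)
SELECT company_id, trade_date,
       round(CASE WHEN ct = 0 THEN numeric '1'
                  ELSE ct_lt::numeric / ct END, 6),
       999
FROM   dd1;

-- Inspect what was written.
SELECT company_id, trade_date, daily_val
FROM   demo_daily_data
WHERE  wh_calc_id = 999
ORDER  BY company_id, trade_date;
-- daily_val comes out as 1.000000, 1.000000, 0.500000 for company 1
-- and 1.000000 for company 2
```

The ORDER BY from the SELECT-only test is dropped here because row order does not matter for the insert.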

Index:

CREATE INDEX daily_data_i1_idx
  ON daily_data
  (wh_calc_id, company_id, trade_date);

Plan with the new index:

                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1311.98..1312.82 rows=333 width=120) (actual time=1597.558..1613.481 rows=7407 loops=1)
   Sort Key: d.company_id, d.trade_date
   Sort Method: quicksort  Memory: 61659kB
   CTE dd2
     ->  WindowAgg  (cost=824.34..833.50 rows=333 width=84) (actual time=10.416..87.668 rows=7407 loops=1)
           ->  Sort  (cost=824.34..825.17 rows=333 width=44) (actual time=10.381..11.851 rows=7407 loops=1)
                 Sort Key: d_1.company_id, d_1.trade_date
                 Sort Method: quicksort  Memory: 771kB
                 ->  Bitmap Heap Scan on daily_data d_1  (cost=11.83..810.39 rows=333 width=44) (actual time=2.934..6.548 rows=7407 loops=1)
                       Recheck Cond: ((wh_calc_id = 241) AND (company_id < 8))
                       Heap Blocks: exact=1460
                       ->  Bitmap Index Scan on daily_data_i1_idx  (cost=0.00..11.75 rows=333 width=0) (actual time=2.661..2.661 rows=7407 loops=1)
                             Index Cond: ((wh_calc_id = 241) AND (company_id < 8))
   CTE dd1
     ->  CTE Scan on dd2 d_2  (cost=0.00..454.55 rows=333 width=88) (actual time=10.442..1320.450 rows=7407 loops=1)
           SubPlan 2
             ->  Aggregate  (cost=1.33..1.34 rows=1 width=8) (actual time=0.143..0.143 rows=1 loops=7407)
                   ->  Function Scan on unnest u  (cost=0.00..1.25 rows=33 width=0) (actual time=0.036..0.124 rows=234 loops=7407)
                         Filter: (u < d_2.daily_val)
                         Rows Removed by Filter: 243
   ->  CTE Scan on dd1 d  (cost=0.00..9.99 rows=333 width=120) (actual time=10.450..1468.533 rows=7407 loops=1)
 Planning time: 0.471 ms
 Execution time: 1960.835 ms
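
On a table of roughly 93 million rows that receives new data daily, a plain CREATE INDEX blocks writes to the table while it builds. PostgreSQL also offers CREATE INDEX CONCURRENTLY, which avoids that lock at the cost of a slower build and cannot run inside a transaction block. A minimal sketch follows, using the same column order as the index above on a scratch table; daily_data_scratch and the index name are hypothetical.

```sql
-- Scratch table standing in for daily_data (hypothetical name).
CREATE TABLE daily_data_scratch (
  company_id integer NOT NULL,
  trade_date date    NOT NULL,
  daily_val  numeric NOT NULL,
  wh_calc_id integer
);

-- Must run outside BEGIN ... COMMIT; does not block INSERT/UPDATE/DELETE.
CREATE INDEX CONCURRENTLY daily_data_scratch_i1_idx
  ON daily_data_scratch (wh_calc_id, company_id, trade_date);

-- A failed concurrent build leaves an invalid index behind; this lists any.
SELECT indexrelid::regclass AS invalid_index
FROM   pg_index
WHERE  NOT indisvalid;

-- Refresh statistics so the planner can cost the new index sensibly.
ANALYZE daily_data_scratch;
```

If the check returns a row for the new index, dropping it and rebuilding is the usual way out.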