Postgres 9.4.4 query takes forever

use*_*922 5 postgresql performance postgresql-9.4 query-performance

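We're running Postgres 9.4.4 on CentOS 6.5 and have a SELECT query that worked for years but stopped working and hangs since we upgraded from 9.2 (it took a while to notice, so I don't know whether it started immediately after the upgrade or not).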

SELECT id || ':' || group_number AS uniq_id
FROM   table_one
WHERE  id || ':' || group_number NOT IN (
   SELECT id || ':' || group_number
   FROM table_two
   )
AND    id NOT IN (
   SELECT id
   FROM table_three
   WHERE timestamp > NOW() - INTERVAL '30 days' 
   AND client_id > 0
   );

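In all tables, id is an integer but stored as character varying (15) (legacy system). group_number is stored as smallint.

The subquery on table_two returns about 2.5 million records. The subquery on table_three returns about 2,500 records. Each returns in about 1 second when run on its own. But adding either query (or both) as a subquery causes the query to hang indefinitely (for days, if we let it run).

I've seen others online with the same problem (a query that does not return when using NOT IN). NOT IN seems like such a straightforward subquery.

We have plenty of hardware (384 GB RAM, 64-core Xeon, 16 disks at 15k RPM in RAID 10).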

  1. Why is this happening? (i.e., is this a major ongoing bug in Postgres?)
  2. How can I fix / debug it in the meantime?

Here are the results of EXPLAIN:

QUERY PLAN
Index Only Scan using table_one_id_pk on table_one  (cost=19690.90..64045129699.10 rows=370064 width=9)
  Filter: ((NOT (hashed SubPlan 2)) AND (NOT (SubPlan 1)))
  SubPlan 2
    ->  Bitmap Heap Scan on table_three  (cost=2446.92..19686.74 rows=8159 width=7)
          Recheck Cond: (("timestamp" > (now() - '30 days'::interval)) AND (client_id > 0))
          ->  BitmapAnd  (cost=2446.92..2446.92 rows=8159 width=0)
                ->  Bitmap Index Scan on table_one_timestamp_idx  (cost=0.00..1040.00 rows=79941 width=0)
                      Index Cond: ("timestamp" > (now() - '30 days'::interval))
                ->  Bitmap Index Scan on fki_table_three_client_id  (cost=0.00..1406.05 rows=107978 width=0)
                      Index Cond: (client_id > 0)
  SubPlan 1
    ->  Materialize  (cost=0.00..84813.75 rows=3436959 width=9)
          ->  Seq Scan on table_two  (cost=0.00..64593.79 rows=3436959 width=9)

My settings from postgresql.conf:

max_connections = 200
shared_buffers = 24GB
temp_buffers = 8MB
work_mem = 96MB
maintenance_work_mem = 1GB
cpu_tuple_cost = 0.0030
cpu_index_tuple_cost = 0.0010
cpu_operator_cost = 0.0005
effective_cache_size = 128GB
from_collapse_limit = 4
join_collapse_limit = 4

Update

I used the following to adjust work_mem for this query only:

BEGIN;
SET work_mem = '256MB';
-- query --
SET work_mem = default;
COMMIT;

Using NOT IN returns in 5 - 8 seconds (vs. never with work_mem = 96MB).

Using LEFT JOIN returns in 13 - 14 seconds (vs. 24 seconds with work_mem = 96MB).

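So it looks like the problem was work_mem, and using LEFT JOIN was just a workaround. The real problem, however, is that Postgres never finishes the NOT IN query with work_mem = 96MB.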

With 16 x 15k SAS drives in RAID 10, our I/O is very fast, so even a query that goes to disk should return, just a bit slower.

Update 2

Here are the results of EXPLAIN ANALYZE for the LEFT JOIN approach:

    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Anti Join  (cost=27318.56..351160.97 rows=728325 width=9) (actual time=9553.378..21247.202 rows=7 loops=1)
   ->  Hash Anti Join  (cost=27318.47..176945.69 rows=1501249 width=9) (actual time=511.578..5479.549 rows=1478438 loops=1)
         Hash Cond: ((t1.id)::text = (t3.id)::text)
         ->  Seq Scan on table_one t1  (cost=0.00..143842.21 rows=1593403 width=9) (actual time=0.026..4369.804 rows=1485291 loops=1)
         ->  Hash  (cost=27289.76..27289.76 rows=8203 width=7) (actual time=511.518..511.518 rows=1286 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 51kB
               ->  Bitmap Heap Scan on table_three t3  (cost=1518.79..27289.76 rows=8203 width=7) (actual time=125.379..510.998 rows=1286 loops=1)
                     Recheck Cond: (client_id > 0)
                     Filter: ("timestamp" > (now() - '30 days'::interval))
                     Rows Removed by Filter: 104626
                     Heap Blocks: exact=16093
                     ->  Bitmap Index Scan on fki_table_three_client_id  (cost=0.00..1518.38 rows=108195 width=0) (actual time=121.633..121.633 rows=122976 loops=1)
                           Index Cond: (client_id > 0)
   ->  Index Only Scan using t_table_two_id2_idx on table_two t2  (cost=0.09..0.14 rows=1 width=9) (actual time=0.010..0.010 rows=1 loops=1478438)
         Index Cond: ((id = (t1.id)::text) AND (group_number = t1.group_number))
         Heap Fetches: 143348
 Planning time: 30.527 ms
 Execution time: 21247.541 ms
(18 rows)

Time: 23697.256 ms

And here they are for the NOT EXISTS approach:

    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Anti Join  (cost=27318.56..351160.97 rows=728325 width=9) (actual time=5117.110..14061.838 rows=7 loops=1)
   ->  Hash Anti Join  (cost=27318.47..176945.69 rows=1501249 width=9) (actual time=146.779..1254.400 rows=1478439 loops=1)
         Hash Cond: ((t1.id)::text = (t3.id)::text)
         ->  Seq Scan on table_one t1  (cost=0.00..143842.21 rows=1593403 width=9) (actual time=0.007..591.383 rows=1485291 loops=1)
         ->  Hash  (cost=27289.76..27289.76 rows=8203 width=7) (actual time=146.758..146.758 rows=1285 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 51kB
               ->  Bitmap Heap Scan on table_three t3  (cost=1518.79..27289.76 rows=8203 width=7) (actual time=17.586..146.330 rows=1285 loops=1)
                     Recheck Cond: (client_id > 0)
                     Filter: ("timestamp" > (now() - '30 days'::interval))
                     Rows Removed by Filter: 104627
                     Heap Blocks: exact=16093
                     ->  Bitmap Index Scan on fki_table_one_client_id  (cost=0.00..1518.38 rows=108195 width=0) (actual time=14.415..14.415 rows=122976 loops=1)
                           Index Cond: (client_id > 0)
   ->  Index Only Scan using t_table_two_id2_idx on table_two t2  (cost=0.09..0.14 rows=1 width=9) (actual time=0.008..0.008 rows=1 loops=1478439)
         Index Cond: ((id = (t1.id)::text) AND (group_number = t1.group_number))
         Heap Fetches: 143348
 Planning time: 2.155 ms
 Execution time: 14062.014 ms
(18 rows)

Time: 14065.573 ms

Erw*_*ter 8

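I assume you have checked the usual suspects on the Wiki page, as @a_horse commented.

Also see the note on bitmap index scans and work_mem.

Query

This rewritten query should be substantially faster: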

SELECT id || ':' || group_number AS uniq_id
    -- id::text || ':' || group_number AS uniq_id  -- with integer
FROM   table_one t1
WHERE  NOT EXISTS ( 
   SELECT 1
   FROM   table_two t2
   WHERE  t2.id = t1.id
   AND    t2.group_number = t1.group_number
   ) 
AND NOT EXISTS (
   SELECT 1
   FROM   table_three t3
   WHERE  t3.timestamp > NOW() - interval '30 days' 
   AND    t3.client_id > 0
   AND    t3.id = t1.id
   );
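
  • The most important problem is the comparison of concatenated strings between table_one and table_two, which is generally more expensive than necessary and, in particular, not sargable (it cannot use a plain index on the columns).

  • Storing integers as strings is expensive nonsense. You seem to be aware of that. Convert to integer if at all possible. If the varchar column id holds nothing but valid integers, all you need to do is: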

    ALTER TABLE table_one ALTER COLUMN id TYPE integer USING id::int;
    

    And possibly the same for table_two.

  • NOT IN carries a trap for NULL values on either side. That's why NOT EXISTS is almost always the better choice. (It typically performs better on top of that.)
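
The NULL trap can be reproduced without any tables. A minimal sketch using inline VALUES lists (the aliases t, s, x and y are made up for illustration, they are not part of the original schema):

```sql
-- NOT IN: a single NULL in the subquery result means no row can qualify,
-- because "x <> NULL" evaluates to unknown rather than true.
SELECT x
FROM  (VALUES (1), (2), (3)) AS t(x)
WHERE  x NOT IN (SELECT y FROM (VALUES (1), (NULL)) AS s(y));
-- returns no rows

-- NOT EXISTS: the NULL row simply never matches, so it is ignored.
SELECT x
FROM  (VALUES (1), (2), (3)) AS t(x)
WHERE  NOT EXISTS (
   SELECT 1
   FROM  (VALUES (1), (NULL)) AS s(y)
   WHERE  s.y = t.x
   );
-- returns 2 and 3
```

If the id or group_number columns can ever hold NULL, the two forms of the query are therefore not interchangeable, and NOT EXISTS is usually the one that matches the intent.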

Index

Either way, the key to performance is matching indexes.

Be sure to have multicolumn indexes on table_one and table_two:

CREATE INDEX t1_foo_idx ON table_one (id, group_number);
CREATE INDEX t2_foo_idx ON table_two (id, group_number);

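These possibly allow index-only scans.
With integer instead of varchar, they would be smaller and more efficient still.

I suggest a partial multicolumn index on table_three: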

CREATE INDEX t3_foo_idx ON table_three (timestamp, id)
WHERE  client_id > 0
AND    timestamp > '2015-06-07 0:0';

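Its usefulness degrades over time. Recreate the index with an increased lower bound at opportune moments. A plain CREATE INDEX locks the table against writes while it runs, so consider CREATE INDEX CONCURRENTLY.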
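
A minimal sketch of such a rotation, assuming the existing index is named t3_foo_idx as above. The temporary name t3_foo_idx_new and the new lower bound are made up for illustration:

```sql
-- Neither CONCURRENTLY statement can run inside a transaction block,
-- so execute these one at a time, not wrapped in BEGIN ... COMMIT.

-- 1. Build the replacement with a later lower bound (hypothetical date).
CREATE INDEX CONCURRENTLY t3_foo_idx_new ON table_three (timestamp, id)
WHERE  client_id > 0
AND    timestamp > '2015-07-07 0:0';

-- 2. Drop the old index without blocking concurrent reads and writes.
DROP INDEX CONCURRENTLY t3_foo_idx;

-- 3. Take over the old name (optional, keeps scripts and docs stable).
ALTER INDEX t3_foo_idx_new RENAME TO t3_foo_idx;
```

Queries have to repeat the new lower bound in their WHERE clause from this point on, otherwise the planner cannot prove that the partial index applies.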

You need to match the (updated) index condition in your query. Add the condition even if it seems redundant. Like:

...
AND NOT EXISTS (
   SELECT 1
   FROM   table_three t3
   WHERE  t3.timestamp > NOW() - interval '30 days' 
   AND    t3.timestamp > '2015-06-07 0:0'  -- match index condition
   AND    t3.client_id > 0
   AND    t3.id = t1.id
   );

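You could use a function as a pseudo-constant in both the partial index and the query, and automate the process. The last chapter of a related answer describes this.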

SET LOCAL

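As you found out yourself, increasing work_mem locally for the query helps if the query needs that much RAM. Consider SET LOCAL, which limits the change to the current transaction.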
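
A minimal sketch, reusing the 256MB value from the question. The right value for your workload is an assumption you would have to test:

```sql
BEGIN;
SET LOCAL work_mem = '256MB';  -- in effect only until COMMIT or ROLLBACK
-- run the expensive query here
COMMIT;

SHOW work_mem;                 -- back to the session setting
```

Unlike the SET ... / SET ... = default pair, this cannot leave the session with a raised work_mem if the query fails partway: the setting is discarded when the transaction ends, either way.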

With all the suggested improvements in place, you might not need to increase work_mem any more.