为什么EXPLAIN结果中的实际时间与PostgreSQL中的每次执行都不同?

dmk*_*kov 5 database postgresql amazon-rds sql-execution-plan

我试图测试PostgreSQL中不同分区模式的效率.我用一个表和不同的分区大小创建了几个数据库,然后在每个数据库中填充了数亿个记录.

现在我想测试所有这些查询并比较结果.但问题是,当我运行相同的查询时,我在一个数据库上收到了不同的实际时间.

例如,我EXPLAIN (buffers, analyze, verbose) SELECT count(*) FROM testdata WHERE dt = '2017-06-01 08:50:00'第一次执行:

                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=356199.96..356199.97 rows=1 width=8) (actual time=155757.190..155757.190 rows=1 loops=1)
  Output: count(*)
  Buffers: shared read=290555
  ->  Append  (cost=0.00..354506.51 rows=677381 width=0) (actual time=165.565..155583.815 rows=665275 loops=1)
        Buffers: shared read=290555
        ->  Seq Scan on public.testdata  (cost=0.00..0.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
              Filter: (testdata.dt = '2017-06-01 08:50:00'::timestamp without time zone)
        ->  Bitmap Heap Scan on public.testdata_2017_06_01  (cost=12682.26..354506.51 rows=677380 width=0) (actual time=165.562..155319.677 rows=665275 loops=1)
              Recheck Cond: (testdata_2017_06_01.dt = '2017-06-01 08:50:00'::timestamp without time zone)
              Rows Removed by Index Recheck: 31006916
              Heap Blocks: exact=25283 lossy=263451
              Buffers: shared read=290555
              ->  Bitmap Index Scan on testdata_2017_06_01_dt_idx  (cost=0.00..12512.91 rows=677380 width=0) (actual time=159.597..159.597 rows=665275 loops=1)
                    Index Cond: (testdata_2017_06_01.dt = '2017-06-01 08:50:00'::timestamp without time zone)
                    Buffers: shared read=1821
Planning time: 75.419 ms
Execution time: 155757.564 ms
(17 rows)
Run Code Online (Sandbox Code Playgroud)

第二次:

                                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=356199.96..356199.97 rows=1 width=8) (actual time=18871.164..18871.164 rows=1 loops=1)
  Output: count(*)
  Buffers: shared read=290555
  ->  Append  (cost=0.00..354506.51 rows=677381 width=0) (actual time=148.208..18731.027 rows=665275 loops=1)
        Buffers: shared read=290555
        ->  Seq Scan on public.testdata  (cost=0.00..0.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
              Filter: (testdata.dt = '2017-06-01 08:50:00'::timestamp without time zone)
        ->  Bitmap Heap Scan on public.testdata_2017_06_01  (cost=12682.26..354506.51 rows=677380 width=0) (actual time=148.204..18489.171 rows=665275 loops=1)
              Recheck Cond: (testdata_2017_06_01.dt = '2017-06-01 08:50:00'::timestamp without time zone)
              Rows Removed by Index Recheck: 31006916
              Heap Blocks: exact=25283 lossy=263451
              Buffers: shared read=290555
              ->  Bitmap Index Scan on testdata_2017_06_01_dt_idx  (cost=0.00..12512.91 rows=677380 width=0) (actual time=143.119..143.119 rows=665275 loops=1)
                    Index Cond: (testdata_2017_06_01.dt = '2017-06-01 08:50:00'::timestamp without time zone)
                    Buffers: shared read=1821
Planning time: 2.105 ms
Execution time: 18871.482 ms
(17 rows)
Run Code Online (Sandbox Code Playgroud)

差异是巨大的(155秒对18秒),我的第一个是第一次缓存结果.但上面的解释不会hitbuffers部分中显示值.两次执行的估计费用也相同.为什么实际时间不同?是否有任何服务工作,如创建一个需要这么长时间的查询计划?

如果我尝试对另一个日期执行相同的查询,则情况会重复.如果PostgreSQL缓存查询计划,是否会使用每个新参数值重新生成?它对我没有意义,因为它在这种情况下像查询结果缓存一样工作.

一个侧面问题:什么是获得查询执行时间的纯结果来比较不同分区模式的更好方法?在MySQL中我可以SELECT SQL_NO_CACHE用来禁用查询中的任何缓存.在PostgreSQL中,DISCARD ALL我认为,但它对上面的解释没有帮助.

更新#1:

主testdata表的定义:

=> \d+ testdata;
                                    Table "public.testdata"
   Column   |            Type             |   Modifiers   | Storage  | Stats target | Description
------------+-----------------------------+---------------+----------+--------------+-------------
 param1id   | integer                     | not null      | plain    |              |
 param2id   | integer                     |               | plain    |              |
 param3id   | integer                     |               | plain    |              |
 devid      | integer                     |               | plain    |              |
 nameid     | integer                     |               | plain    |              |
 dt         | timestamp without time zone | default now() | plain    |              |
 value      | double precision            |               | plain    |              |
 svalue     | character varying(255)      |               | extended |              |
Triggers:
    testdata_partition_insert_trigger BEFORE INSERT ON testdata FOR EACH ROW EXECUTE PROCEDURE create_partition_and_insert()
Child tables: testdata_2017_06_01,
              testdata_2017_06_02,
              ... other child tables, one per day ...
              testdata_2017_06_30
Run Code Online (Sandbox Code Playgroud)

儿童testdata_2017_06_01的定义:

\d+ testdata_2017_06_01;
                               Table "public.testdata_2017_06_01"
   Column   |            Type             |   Modifiers   | Storage  | Stats target | Description
------------+-----------------------------+---------------+----------+--------------+-------------
 param1id   | integer                     | not null      | plain    |              |
 param2id   | integer                     |               | plain    |              |
 param3id   | integer                     |               | plain    |              |
 devid      | integer                     |               | plain    |              |
 nameid     | integer                     |               | plain    |              |
 dt         | timestamp without time zone | default now() | plain    |              |
 value      | double precision            |               | plain    |              |
 svalue     | character varying(255)      |               | extended |              |
Indexes:
    "testdata_2017_06_01_devid_idx" btree (devid)
    "testdata_2017_06_01_dt_idx" btree (dt)
    "testdata_2017_06_01_nameid_idx" btree (nameid)
Check constraints:
    "partition_check" CHECK (dt >= '2017-06-01 00:00:00'::timestamp without time zone AND dt <= '2017-06-01 23:59:59'::timestamp without time zone)
Inherits: testdata
Run Code Online (Sandbox Code Playgroud)

更新#2:

打开track_io_timing@rvs建议后,我重新启动了实例,图片发生了显着变化.首先,可释放内存从6GB增加到近8GB - 截图.其次,规划师开始使用Index Only Scan而不是Bitmap Heap Scan.然后,所有查询现在花费的时间要少得多(第一个查询大约需要2.4毫秒,而当它到达缓存时大约需要0.041毫秒.对于初始155和18秒).最后,它开始点击缓存(在上面的示例中,所有解释仅显示读取缓冲区).

                                                                               QUERY PLAN
 ----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4.59..4.60 rows=1 width=8) (actual time=0.014..0.014 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=4
   ->  Append  (cost=0.00..4.58 rows=2 width=0) (actual time=0.012..0.012 rows=0 loops=1)
         Buffers: shared hit=4
         ->  Seq Scan on public.testdata  (cost=0.00..0.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: (testdata.dt = '2017-06-15 18:10:10'::timestamp without time zone)
         ->  Index Only Scan using testdata_2017_06_15_dt_idx on public.testdata_2017_06_15  (cost=0.56..4.58 rows=1 width=0) (actual time=0.009..0.009 rows=0 loops=1)
               Index Cond: (testdata_2017_06_15.dt = '2017-06-15 18:10:10'::timestamp without time zone)
               Heap Fetches: 0
               Buffers: shared hit=4
 Planning time: 0.888 ms
 Execution time: 0.044 ms
(13 rows)
Run Code Online (Sandbox Code Playgroud)

是否有大量数据插入溢出所有可用内存(8GB)?我们插入了一个包含40kk记录的分区,然后将其复制到另一个分区.我们是否应该期待这样的生产(我们将使用批量插入每秒插入大约1k条记录)?如果我们需要不时地重启数据库实例,这是非常可怕的.

谢谢你的帮忙!

rvs*_*rvs 2

您有多少数据量以及多少 RAM?

您的后续查询很可能会命中缓存(PostgreSQL 也将使用操作系统级缓存)。在您的会话中尝试set track_io_timing = true并查看 Postgres 为您报告的 I/O 计时。

由于 的设置不正确,估计可能会出现偏差effective_cache_size

回答你的附带问题:不要寻求“纯粹”的时间。尝试复制真实场景。这个查询会重复运行吗?是否还会有其他查询并行运行?始终将所有内容运行多次。缓存大小调整是性能调整的重要组成部分。