Understanding Postgres caching

Question

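I know that Postgres uses an LRU/clock-sweep algorithm to evict data from the cache, but I am having trouble understanding how data gets into shared_buffers in the first place.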

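Note that my goal is not to make this naive query faster; an index is always the better choice. I want to understand how caching works when no index is involved.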

Take the following query execution plan as an example (I deliberately did not create any indexes):

performance_test=# explain (analyze,buffers) select count(*) from users;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48214.95..48214.96 rows=1 width=0) (actual time=3874.445..3874.445 rows=1 loops=1)
   Buffers: shared read=35715
   ->  Seq Scan on users  (cost=0.00..45714.96 rows=999996 width=0) (actual time=6.024..3526.606 rows=1000000 loops=1)
         Buffers: shared read=35715
 Planning time: 0.114 ms
 Execution time: 3874.509 ms

We can see that all of the data was fetched from disk, i.e. shared read=35715.

Now, if we execute the same query again:

performance_test=# explain (analyze,buffers) select count(*) from users;
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48214.95..48214.96 rows=1 width=0) (actual time=426.385..426.385 rows=1 loops=1)
   Buffers: shared hit=32 read=35683
   ->  Seq Scan on users  (cost=0.00..45714.96 rows=999996 width=0) (actual time=0.036..285.363 rows=1000000 loops=1)
         Buffers: shared hit=32 read=35683
 Planning time: 0.048 ms
 Execution time: 426.431 ms

Only 32 pages/blocks made it into memory. Each time I repeat the query, the shared hit count increases by another 32:

performance_test=# explain (analyze,buffers) select count(*) from users;
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48214.95..48214.96 rows=1 width=0) (actual time=416.829..416.829 rows=1 loops=1)
   Buffers: shared hit=64 read=35651
   ->  Seq Scan on users  (cost=0.00..45714.96 rows=999996 width=0) (actual time=0.034..273.417 rows=1000000 loops=1)
         Buffers: shared hit=64 read=35651
 Planning time: 0.050 ms
 Execution time: 416.874 ms

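My shared_buffers is 1GB and the table is 279MB, so the whole table could be cached in memory. That is not what happens, though, so the cache evidently works differently from what I expected. Can someone explain how Postgres plans for and moves data from disk into shared_buffers?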

Is there a mechanism that controls how many pages each query can move into shared_buffers?

Answer 1

Pet*_*aut 5

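There is a mechanism that prevents the entire buffer cache from being blown away by a sequential scan. It is explained in src/backend/storage/buffer/README as follows: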

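When running a query that needs to access a large number of pages just once, such as VACUUM or a large sequential scan, a different strategy is used. A page that has been touched only by such a scan is unlikely to be needed again soon, so instead of running the normal clock sweep algorithm and blowing out the entire buffer cache, a small ring of buffers is allocated using the normal clock sweep algorithm and those buffers are reused for the whole scan. This also implies that much of the write traffic caused by such a statement will be done by the backend itself and not pushed off onto other processes.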

For sequential scans, a 256KB ring is used. ...
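To make the ring behaviour concrete, here is a toy Python model of it. It is only a sketch under simplifying assumptions: the function `seq_scan` and the `cache` set are hypothetical names, each scan gets a fresh ring, a page that is already resident counts as a hit without consuming a ring slot, and usage counts, synchronized scans, and eviction by other backends are all ignored. Real Postgres is more involved, but under these assumptions the model reproduces the hit/read numbers from the question.

```python
from collections import deque

PAGE_SIZE_KB = 8      # default Postgres block size (assumed)
RING_SIZE_KB = 256    # ring size the README gives for sequential scans
RING_PAGES = RING_SIZE_KB // PAGE_SIZE_KB  # 32 buffers


def seq_scan(cached_pages, table_pages, ring_pages=RING_PAGES):
    """Scan pages 0..table_pages-1 once and return (hits, reads).

    cached_pages is the set of resident page numbers; it is updated
    in place. This is a toy model, not the real buffer manager.
    """
    ring = deque()  # pages held by this scan's private ring, oldest first
    hits = reads = 0
    for page in range(table_pages):
        if page in cached_pages:
            hits += 1  # already resident: no ring slot is consumed
            continue
        reads += 1
        if len(ring) == ring_pages:
            # Ring is full: recycle its oldest buffer, evicting that page.
            cached_pages.discard(ring.popleft())
        ring.append(page)
        cached_pages.add(page)
    return hits, reads


cache = set()
results = [seq_scan(cache, 35715) for _ in range(3)]
for hits, reads in results:
    print(f"shared hit={hits} read={reads}")
# shared hit=0 read=35715
# shared hit=32 read=35683
# shared hit=64 read=35651
```

Each scan leaves behind only the last 32 pages it had to read, and the next scan finds those without touching its own ring, which is why the hit count grows by 32 per run in this model.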

Note that 32 × 8kB = 256kB, which is what you are seeing.
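As for why the ring applies to this table at all: as far as I know from the heap scan code, the bulk-read ring is chosen only when the table is larger than one quarter of shared_buffers. That threshold is not stated in the README excerpt above, so treat it as an assumption to verify against your Postgres version. A quick check with the numbers from the question, assuming the default 8kB block size:

```python
BLOCK_KB = 8  # default block size (assumed)

# shared_buffers = 1GB, expressed as a number of 8kB buffers
shared_buffers_pages = 1 * 1024 * 1024 // BLOCK_KB

# Page count taken from "shared read=35715" in the first EXPLAIN output
table_pages = 35715
table_mb = table_pages * BLOCK_KB / 1024

# Assumed rule: a sequential scan uses the ring when the table
# exceeds one quarter of shared_buffers.
threshold_pages = shared_buffers_pages // 4
uses_ring = table_pages > threshold_pages

print(f"table: {table_pages} pages (~{table_mb:.0f} MB)")
print(f"threshold: {threshold_pages} pages ({threshold_pages * BLOCK_KB // 1024} MB)")
print(f"ring strategy expected: {uses_ring}")
```

With these numbers the table (about 279MB) is just over the 256MB threshold, so under that assumption a smaller table, or a larger shared_buffers, would be read into the cache in the ordinary way.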
