PostgreSQL index caching

Question

dav*_*ley 18 postgresql performance cache index-tuning

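I have struggled to find a "layman's" explanation of how indexes are cached in PostgreSQL, so I would like a reality check on any or all of these assumptions:

1. PostgreSQL indexes, like rows, live on disk but may be cached.
2. An index is either entirely in the cache or not in it at all.
3. Whether or not it is cached depends on how often it is used (as defined by the query planner).
4. For this reason, most "sensible" indexes will be in the cache all the time.
5. Indexes live in the same cache (the buffer cache?) as rows, so cache space used by an index is not available to rows.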

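My motivation for understanding this comes from another question I asked, where it was suggested that partial indexes can be used on tables in which most of the data will never be accessed.

Before going down that road, I would like to confirm that using a partial index yields two advantages:

1. We reduce the size of the index in the cache, freeing up more cache space for the rows themselves.
2. We reduce the size of the B-tree, resulting in faster query response.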

Answer 1

dez*_*zso 20

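Playing a bit with pg_buffercache, I could get answers to some of your questions.

1. This is fairly obvious, but the results for (5) also show that the answer is YES.
2. I have yet to set up a good example for this; for now it is more yes than no :) (See my edit below: the answer is NO.)
3. Since the planner decides whether or not to use an index, we can say YES, it decides what gets cached (but it is more complicated than that).
4. The exact details of caching can be derived from the source code. I could not find much on this topic apart from this one (see also the author's answer). However, I am pretty sure that this is again much more complicated than a simple YES or NO. (Again, my edit gives some idea: since the cache size is limited, those "sensible" indexes compete for the available space. If there are too many of them, they will kick each other out of the cache, so the answer is rather NO.)
5. As a simple query against pg_buffercache shows, the answer is a definite YES. It is worth noting that temporary table data is not cached here.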

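EDIT

I have found Jeremiah Peschka's great article about table and index storage. With the information from there, I could answer (2) as well. I set up a small test so you can check these yourself.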

-- we will need two extensions
CREATE EXTENSION pg_buffercache;
CREATE EXTENSION pageinspect;


-- a very simple test table
CREATE TABLE index_cache_test (
      id serial
    , blah text
);


-- I am a bit megalomaniac here, but I will use this for other purposes as well
INSERT INTO index_cache_test
SELECT i, i::text || 'a'
FROM generate_series(1, 1000000) a(i);


-- let's create the index to be cached
CREATE INDEX idx_test_cache ON index_cache_test (id);


-- now we can have a look at what is cached
SELECT c.relname, count(*) AS buffers
FROM 
    pg_class c 
    INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode 
    INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
GROUP BY c.relname
ORDER BY 2 DESC LIMIT 10;

             relname              | buffers
----------------------------------+---------
 index_cache_test                 |    2747
 pg_statistic_relid_att_inh_index |       4
 pg_operator_oprname_l_r_n_index  |       4
... (others are all pg_something, which are not interesting now)

-- this shows that the whole table is cached and our index is not in use yet

-- now we can check which row is where in our index
-- in the ctid column, the first number shows the page, so 
-- all rows starting with the same number are stored in the same page
SELECT * FROM bt_page_items('idx_test_cache', 1);

 itemoffset |  ctid   | itemlen | nulls | vars |          data
------------+---------+---------+-------+------+-------------------------
          1 | (1,164) |      16 | f     | f    | 6f 01 00 00 00 00 00 00
          2 | (0,1)   |      16 | f     | f    | 01 00 00 00 00 00 00 00
          3 | (0,2)   |      16 | f     | f    | 02 00 00 00 00 00 00 00
          4 | (0,3)   |      16 | f     | f    | 03 00 00 00 00 00 00 00
          5 | (0,4)   |      16 | f     | f    | 04 00 00 00 00 00 00 00
          6 | (0,5)   |      16 | f     | f    | 05 00 00 00 00 00 00 00
...
         64 | (0,63)  |      16 | f     | f    | 3f 00 00 00 00 00 00 00
         65 | (0,64)  |      16 | f     | f    | 40 00 00 00 00 00 00 00

-- with the information obtained, we can write a query which is supposed to
-- touch only a single page of the index
EXPLAIN (ANALYZE, BUFFERS) 
    SELECT id 
    FROM index_cache_test 
    WHERE id BETWEEN 10 AND 20 ORDER BY id
;

 Index Scan using idx_test_cache on index_cache_test  (cost=0.00..8.54 rows=9 width=4) (actual time=0.031..0.042 rows=11 loops=1)
   Index Cond: ((id >= 10) AND (id <= 20))
   Buffers: shared hit=4
 Total runtime: 0.094 ms
(4 rows)

-- let's have a look at the cache again (the query remains the same as above)
             relname              | buffers
----------------------------------+---------
 index_cache_test                 |    2747
 idx_test_cache                   |       4
...

-- and compare it to a bigger index scan:
EXPLAIN (ANALYZE, BUFFERS) 
SELECT id 
    FROM index_cache_test 
    WHERE id <= 20000 ORDER BY id
;


 Index Scan using idx_test_cache on index_cache_test  (cost=0.00..666.43 rows=19490 width=4) (actual time=0.072..19.921 rows=20000 loops=1)
   Index Cond: (id <= 20000)
   Buffers: shared hit=4 read=162
 Total runtime: 24.967 ms
(4 rows)

-- this already shows that something was in the cache and further pages were read from disk
-- but to be sure, a final glance at cache contents:

             relname              | buffers
----------------------------------+---------
 index_cache_test                 |    2691
 idx_test_cache                   |      58

-- note that some of the table pages have disappeared from the cache
-- but, more importantly, a bigger part of our index is now cached


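All in all, this shows that indexes and tables are cached page by page, so the answer to (2) is NO.

And a final example to illustrate that temporary tables are not cached here: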

CREATE TEMPORARY TABLE tmp_cache_test AS 
SELECT * FROM index_cache_test ORDER BY id FETCH FIRST 20000 ROWS ONLY;

EXPLAIN (ANALYZE, BUFFERS) SELECT id FROM tmp_cache_test ORDER BY id;

-- checking the buffer cache now shows no sign of the temp table
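
As far as I know, temporary tables are held in per-session local buffers (sized by the temp_buffers setting) rather than in shared_buffers, which would explain why pg_buffercache does not list them. A minimal sketch for checking this against the tmp_cache_test table created above; the expectation that the Buffers line reports "local" rather than "shared" counters comes from the documented EXPLAIN behaviour, not from output captured in this test:

```sql
-- size of the per-session buffer pool used for temporary tables
SHOW temp_buffers;

-- for a temp table, the Buffers line should report "local" hits/reads
EXPLAIN (ANALYZE, BUFFERS) SELECT id FROM tmp_cache_test ORDER BY id;

-- count shared buffers holding the temp table; the expectation is zero
SELECT count(*) AS shared_buffers_used
FROM pg_buffercache b
WHERE b.relfilenode = pg_relation_filenode('tmp_cache_test')
  AND b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database());
```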


Answer 2

Gre*_*ith 10

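Index pages are fetched when a query decides they will be useful for cutting down the amount of table data needed to answer it. Only the index blocks navigated to accomplish that are read in. Yes, they go into the same shared_buffers pool where table data is stored. Both are also backed by the operating system cache as a second layer of caching.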

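You can easily have 0.1% of an index in memory, or 100% of it. The idea that most "'sensible' indexes will be in the cache all the time" falls down hard when your queries touch only a subset of a table. A common example is time-oriented data. Queries on such data usually navigate the recent end of the table and rarely look at old history. There you might find all the index blocks needed to navigate to and around the recent end in memory, while very few of the blocks needed to navigate the earlier records are there.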
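
To see where a given table or index falls between those two extremes, something along these lines can help. It is only a sketch: it assumes the pg_buffercache extension is installed, counts main-fork blocks only, and the pct_cached column name is my own.

```sql
-- share of each relation's main fork currently held in shared_buffers
SELECT c.relname,
       count(*) AS buffers,
       pg_relation_size(c.oid) / current_setting('block_size')::int AS total_blocks,
       round(100.0 * count(*)
             / nullif(pg_relation_size(c.oid) / current_setting('block_size')::int, 0),
             1) AS pct_cached
FROM pg_buffercache b
    INNER JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
    INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
WHERE b.relforknumber = 0          -- main fork only, to match pg_relation_size()
GROUP BY c.relname, c.oid
ORDER BY buffers DESC
LIMIT 10;
```

This only looks at shared_buffers; blocks held in the operating system cache are not visible this way.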

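The complicated part of the implementation is not how blocks get into the buffer cache. It is the rules about when they leave. My "Inside the PostgreSQL Buffer Cache" talk and the sample queries included with it can help you understand what is going on there, and see what really accumulates on a production server. It can be surprising. There is a lot more on all these topics in my book, PostgreSQL 9.0 High Performance, too.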
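
As a rough illustration of the "when they leave" side: each shared buffer carries a usage count that is raised when the page is accessed and lowered as the eviction sweep passes over it, so pages with a low count are the likeliest to be replaced. The query below is a sketch in the spirit of those sample queries, not a copy of them, and assumes pg_buffercache is installed:

```sql
-- distribution of buffers by usage count; unused buffers show a NULL usagecount
SELECT usagecount, isdirty, count(*) AS buffers
FROM pg_buffercache
GROUP BY usagecount, isdirty
ORDER BY usagecount, isdirty;
```

If most buffers sit at the highest usage counts, the working set is probably competing for space; a large share at low counts suggests there is room to spare.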

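Partial indexes can be helpful because they reduce the size of the index, which makes them both faster to navigate and leaves more RAM for caching other things. If your navigation of the index is such that the parts you touch are always in RAM anyway, though, that may not buy a real improvement.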
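
A minimal sketch of that trade-off, using a hypothetical events table (the table, its columns, the index names, and the cutoff date are all made up for illustration):

```sql
-- hypothetical table holding time-oriented data
CREATE TABLE events (
    id         bigserial PRIMARY KEY,
    created_at timestamp NOT NULL,
    payload    text
);

-- full index over every row
CREATE INDEX events_created_at_idx ON events (created_at);

-- partial index covering only the recent rows that queries actually touch;
-- the predicate has to be a constant expression, so the cutoff must be
-- moved by recreating the index from time to time
CREATE INDEX events_recent_idx ON events (created_at)
WHERE created_at >= '2024-01-01';

-- compare on-disk sizes once the table holds data
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('events_created_at_idx', 'events_recent_idx');
```

The planner will only consider events_recent_idx for queries whose WHERE clause implies the index predicate, for example a range on created_at that starts on or after the cutoff.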
