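Postgres 11.4 on RDS and 11.5 at home.
I took a closer look at hash indexes today because I ran into a problem with a citext index being ignored. Along the way I found that I don't understand why hash indexes are so large. The index takes about 50 bytes per row, where I expected 10 bytes plus some overhead.
I have a sample database with a table named record_changes_log_detail that has 7,733,552 records, so roughly 8M. That table has a citext field named old_value, which is the source for the hash index: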
CREATE INDEX record_changes_log_detail_old_value_ix_hash
ON record_changes_log_detail
USING hash (old_value);
Here is a check of the index size:
select
    'record_changes_log_detail_old_value_ix_hash' as index_name,
    pg_relation_size('record_changes_log_detail_old_value_ix_hash') as bytes,
    pg_size_pretty(pg_relation_size('record_changes_log_detail_old_value_ix_hash')) as pretty;
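That returns 379,322,368 bytes, or about 362MB. I've dug into the source code a bit and have studied this fine write-up some more.
It sounds like the hash index entry for a row is a TID paired with the hash key itself, plus some kind of index counter within the page. That's two 4-byte integers and, I'm guessing, a 1- or 2-byte integer. As a naive calculation, 10 bytes * 7,733,552 = 77,335,520. The actual index is roughly 5 times larger than that. Granted, you need space for the index structure itself, but that shouldn't take the rough cost per row from ~10 bytes to ~50 bytes, should it?
Here are the details of the index, read with the pageinspect extension and then pivoted by hand for readability.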
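To make the gap concrete, here is a minimal Python sketch of the arithmetic above. It uses only the figures quoted in this question; the 10-byte entry size is my own guess from the paragraph above, not something taken from the Postgres source.

```python
# Figures quoted in the question.
ROWS = 7_733_552
INDEX_BYTES = 379_322_368

# The question's naive guess for one entry: two 4-byte integers
# plus roughly 2 bytes of per-page bookkeeping (an assumption).
NAIVE_BYTES_PER_ROW = 10

naive_total = ROWS * NAIVE_BYTES_PER_ROW
actual_per_row = INDEX_BYTES / ROWS
ratio = INDEX_BYTES / naive_total

print(f"naive total:    {naive_total:,} bytes")
print(f"actual per row: {actual_per_row:.1f} bytes")
print(f"actual / naive: {ratio:.1f}x")
```

So the observed cost is about 49 bytes per row, a bit under 5 times the naive estimate.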
select *
from hash_metapage_info(get_raw_page('record_changes_log_detail_old_value_ix_hash',0));
magic 105121344
version 4
ntuples 7733552
ffactor 307
bsize 8152
bmsize 4096
bmshift 15
maxbucket 28671
highmask 32767
lowmask 16383
ovflpoint 32
firstfree 17631
nmaps 1
procid 17269
spares {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,17631,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
mapp {28673,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
select *
from hash_page_stats(get_raw_page('record_changes_log_detail_old_value_ix_hash',1));
live_items 2
dead_items 0
page_size 8192
free_size 8108
hasho_prevblkno 28671
hasho_nextblkno 4294967295
hasho_bucket 0
hasho_flag 2
hasho_page_id 65408
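As a rough sanity check on the metapage numbers, the sketch below tries to account for every page in the index. It assumes the file is laid out as one metapage, one primary page per bucket, and the pages counted in spares (overflow and bitmap pages); that layout is my reading of the output, so treat it as an assumption rather than a statement about the internals.

```python
# Values copied from the pageinspect output and the size query above.
BLOCK_SIZE = 8192          # page_size
INDEX_BYTES = 379_322_368  # pg_relation_size
NTUPLES = 7_733_552        # ntuples
MAXBUCKET = 28_671         # maxbucket (zero-based, so buckets = maxbucket + 1)
SPARES_TOTAL = 17_631      # the only nonzero entry in spares
FFACTOR = 307              # ffactor, the target number of tuples per bucket

total_pages = INDEX_BYTES // BLOCK_SIZE
bucket_pages = MAXBUCKET + 1

# Assumed layout: 1 metapage + primary bucket pages + pages counted in spares.
accounted_pages = 1 + bucket_pages + SPARES_TOTAL

tuples_per_bucket = NTUPLES / bucket_pages

print(f"total pages:       {total_pages}")
print(f"accounted pages:   {accounted_pages}")
print(f"tuples per bucket: {tuples_per_bucket:.1f} (ffactor {FFACTOR})")
```

Under that assumption the numbers add up exactly: 46,304 pages in total, of which 28,672 are primary bucket pages and 17,631 are the extra pages counted in spares. The average bucket holds about 270 tuples, which is below ffactor, so a large share of the file is pages beyond the primary buckets.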
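I get 256MB for a freshly built index on a table of the same size. Was your index freshly built? Was the table freshly analyzed before the build (the index is pre-sized based on the estimated number of rows in the table)? What does your distribution of duplicates look like?
Things are stored with a minimum of 8-byte alignment, so a hash index tuple is 16 bytes, even though it ought to fit in 10 (or 12, or whatever). Hash pages are, on average, only about half full. Buckets are split in a predetermined order: the index has to split the next bucket in line, not the fullest one.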
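The alignment point can be sketched in a few lines of Python. This is a back-of-the-envelope model, not the real page code: it assumes an 8-byte index tuple header (6-byte TID plus a 2-byte info word), a 4-byte hash key padded up to 8 bytes, and a 4-byte line pointer per item. The other figures come from the question.

```python
def maxalign(n, align=8):
    """Round n up to the next multiple of align (8-byte alignment)."""
    return (n + align - 1) // align * align

# Assumed layout of one hash index entry.
INDEX_TUPLE_HEADER = 8  # 6-byte TID + 2-byte info word
HASH_KEY = 4            # the stored 32-bit hash code
LINE_POINTER = 4        # per-item slot in the page's item array

# Figures from the question.
BSIZE = 8152            # usable bytes per page (bsize)
NTUPLES = 7_733_552
INDEX_BYTES = 379_322_368

tuple_bytes = maxalign(INDEX_TUPLE_HEADER) + maxalign(HASH_KEY)
item_bytes = tuple_bytes + LINE_POINTER
items_per_full_page = BSIZE // item_bytes

# Space needed if every page were packed solid, versus what is on disk.
packed_bytes = NTUPLES * item_bytes
avg_fill = packed_bytes / INDEX_BYTES

print(f"tuple: {tuple_bytes} bytes, item with line pointer: {item_bytes} bytes")
print(f"items per full page: {items_per_full_page}")
print(f"average fill: {avg_fill:.0%}")
```

With these assumptions each entry costs 20 bytes rather than 10, and the index as a whole is only about 41% full, which together account for the roughly 49 bytes per row in the question.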
select * from hash_page_stats(get_raw_page('record_changes_log_detail_old_value_ix_hash',1));
live_items 2
dead_items 0
page_size 8192
free_size 8108
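You won't learn much from looking at just one page, but that page is strangely lacking in tuples. Maybe you have a pathological data distribution.
It isn't worth micromanaging your database to this degree.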
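For a sense of how empty that page is, here is a small Python sketch. It assumes each item costs 20 bytes (a 16-byte tuple plus a 4-byte line pointer) and that the reported free space holds back room for one more line pointer; both are assumptions made for illustration.

```python
# Figures from the quoted hash_page_stats output and the metapage.
BSIZE = 8152        # usable bytes per page (bsize from the metapage)
LIVE_ITEMS = 2
FREE_SIZE = 8108

# Assumed per-item cost: 16-byte aligned tuple + 4-byte line pointer.
ITEM_BYTES = 20
LINE_POINTER = 4

capacity = BSIZE // ITEM_BYTES
occupancy = LIVE_ITEMS / capacity

# Assumption: free_size excludes the slot for the next line pointer.
predicted_free = BSIZE - LIVE_ITEMS * ITEM_BYTES - LINE_POINTER

print(f"page holds {LIVE_ITEMS} of about {capacity} possible items ({occupancy:.1%})")
print(f"predicted free_size: {predicted_free}, reported: {FREE_SIZE}")
```

The predicted free space matches the reported 8108 bytes, so the 20-byte item model seems consistent with this page, and the page itself is under 1% occupied.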