Determining index compression candidates online

Lei*_*fel 5 index oracle oracle-11g-r2 oracle-11g

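By running the following, I can determine whether an index would benefit from compression and how many columns should be included in the compression: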

ANALYZE INDEX Owner.IndexName VALIDATE STRUCTURE OFFLINE;
SELECT Opt_Cmpr_PctSave, Opt_Cmpr_Count FROM Index_Stats; 

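The problem is that when OFFLINE is changed to ONLINE, the Index_Stats view is not populated. Is there an online method to determine the benefit of compressing an index and/or the number of columns that would yield optimal compression?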

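Update:

http://jonathanlewis.wordpress.com/index-definitions/ says that if Distinct_Keys in DBA_Indexes is "much smaller" than num_rows, then the index is a good compression candidate. This helps somewhat, but it is not definitive, nor does it help determine the number of columns. He does give some guidelines for that, but they cannot be applied programmatically without a pile of dynamic SQL.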
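As a rough illustration of the Distinct_Keys heuristic, a screening query along these lines could list indexes whose keys repeat often. It assumes optimizer statistics are reasonably current, and the cutoff of 10 rows per distinct key is an arbitrary placeholder, not a recommendation. Because Distinct_Keys counts distinct full keys, a unique index always shows one row per key even if its leading columns repeat heavily, which is one reason the heuristic is not definitive.

```sql
-- Rough screen only: uncompressed indexes with many rows per distinct key.
-- The threshold (10) is an arbitrary illustration; tune or remove it.
select owner,
       index_name,
       num_rows,
       distinct_keys,
       round(num_rows / nullif(distinct_keys, 0), 1) as rows_per_key
from   dba_indexes
where  compression = 'DISABLED'
and    num_rows / nullif(distinct_keys, 0) >= 10
order  by rows_per_key desc;
```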

Jac*_*las 3

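The optimal number of columns to compress depends on

  • the number of entries that fit in each block (which depends on the number of compressed columns, since they are stored only once per block)
  • the average number of entries that share the same prefix

These factors can be estimated from the table data.

The aim is to maximize the size of the compressed prefix while minimizing the number of blocks needed to hold all the rows that share a prefix.

Assuming the data is at least somewhat uniform, and ignoring the small overhead that compression introduces, you could try implementing this approach like this: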

Helper functions:

-- Average stored size in bytes of one column, scanned from the table itself.
-- The +1 presumably allows for the per-column length byte.
create or replace function f_size( p_table_name in varchar, 
                                   p_column_name in varchar) 
                  return number as
  n number;
begin
  execute immediate 
    'select avg(vsize('||p_column_name||'))+1 from '||p_table_name into n;
  return n;
end;
/

-- Number of distinct value combinations for a comma-separated column list.
create or replace function f_count( p_table_name in varchar, 
                                    p_column_names in varchar ) 
                  return integer as
  n integer;
begin
  execute immediate 'select count(*) '||
                    'from ( select '|| p_column_names || 
                           ' from '||p_table_name||' '||
                           'group by '||p_column_names||' )' 
          into n;
  return n;
end;
/

Test IOT (index-organized table):

create table t ( k1, k2, k3, k4, k5, val, 
                 constraint pk_t primary key(k1, k2, k3, k4, k5)) 
       organization index as
select mod(k,10)||'_____', 
       mod(k,20)||'_____', 
       mod(k,30)||'_____', 
       mod(k,50)||'_____', 
       k||'_____', 
       lpad(' ',100)
from (select level as k from dual connect by level<=1000);

Query:

with utc as (select table_name, column_name, f_size(table_name, column_name) as column_size from user_tab_columns where table_name='T'),
     uic as (select table_name, column_name, column_position, column_size from user_ind_columns join utc using(table_name, column_name) where index_name='PK_T')
select z.*, (8192-prefix_size*prefixes_per_block)/remaining_size as rows_per_block
from( select z.*, greatest(1,8192/(prefix_size+rows_per_prefix*remaining_size)) as prefixes_per_block
      from( select z.*, total_count/distinct_count as rows_per_prefix
            from( select prefix_length, sum(column_size) as prefix_size, (select sum(column_size) from utc)-sum(column_size) as remaining_size, f_count(table_name, max(prefix_columns)) as distinct_count, 
                         (select count(*) from t) as total_count
                  from( select table_name, connect_by_root column_position as prefix_length, column_size, substr(sys_connect_by_path(column_name, ','),2) as prefix_columns
                        from uic
                        connect by column_position=(prior column_position-1) )
                  group by table_name, prefix_length ) z ) z ) z
order by 1;

Results:

PREFIX_LENGTH          PREFIX_SIZE            REMAINING_SIZE         DISTINCT_COUNT         TOTAL_COUNT            ROWS_PER_PREFIX        PREFIXES_PER_BLOCK     ROWS_PER_BLOCK         
---------------------- ---------------------- ---------------------- ---------------------- ---------------------- ---------------------- ---------------------- ---------------------- 
1                      7                      132.854                10                     1000                   100                    1                      61.608 
2                      14.5                   125.354                20                     1000                   50                     1.304                  65.200 
3                      22.161                 117.693                60                     1000                   16.666                 4.129                  68.827 
4                      29.961                 109.893                300                    1000                   3.333                  20.672                 68.909 
5                      38.854                 101                    1000                   1000                   1                      58.575                 58.575 
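
To make the estimate easier to check by hand, here is the arithmetic the query performs for the PREFIX_LENGTH = 4 row, using the values from that row. The constant 8192 is the literal from the query and is presumably the block size in bytes; substitute your own if it differs.

```latex
\begin{align*}
\text{rows\_per\_prefix}
  &= \frac{\text{total\_count}}{\text{distinct\_count}}
   = \frac{1000}{300} \approx 3.333 \\[4pt]
\text{prefixes\_per\_block}
  &= \max\!\left(1,\ \frac{8192}{\text{prefix\_size}
       + \text{rows\_per\_prefix}\cdot\text{remaining\_size}}\right)
   = \frac{8192}{29.961 + 3.333 \cdot 109.893} \approx 20.67 \\[4pt]
\text{rows\_per\_block}
  &= \frac{8192 - \text{prefix\_size}\cdot\text{prefixes\_per\_block}}
          {\text{remaining\_size}}
   = \frac{8192 - 29.961 \cdot 20.67}{109.893} \approx 68.9
\end{align*}
```

For PREFIX_LENGTH = 1 the fraction inside the max is below 1 (a single prefix group does not fit in one block), so prefixes_per_block is clamped to 1 and rows_per_block becomes (8192 - 7) / 132.854, roughly 61.6.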

Check:

analyze index pk_t validate structure;
select opt_cmpr_pctsave, opt_cmpr_count from index_stats;

OPT_CMPR_PCTSAVE       OPT_CMPR_COUNT         
---------------------- ---------------------- 
13                     3                      

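The check above roughly corresponds to the prefix length with the maximum rows_per_block in the calculation (the estimate peaks at prefix length 4, only just ahead of 3, while Oracle reports 3), but I suggest you double-check my work before trusting it :)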

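I'm assuming the table is too big for you to simply make a copy and try different prefix lengths. Another approach would be to do this on a sample of the data; the sample should be a random selection of prefixes for the given compression candidate (not just a random selection of rows).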
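
A minimal sketch of that kind of prefix-based sample, using the test table above and a candidate prefix of (k1, k2). The names T_SAMPLE and T_SAMPLE_IX are hypothetical, and the 10% figure is arbitrary. ORA_HASH assigns each prefix to one of 100 buckets, so either all rows for a prefix are kept or none are. With very few distinct prefixes (the test data has only 20) the sample may be far from 10%, so treat this as an outline rather than something tuned.

```sql
-- Keep roughly 10% of the (k1, k2) prefixes, with every row of each kept prefix.
-- chr(0) is just a separator so that different column pairs do not
-- concatenate to the same string.
create table t_sample as
select *
from   t
where  ora_hash(k1 || chr(0) || k2, 99) < 10;

-- Build the candidate index on the copy, where an offline ANALYZE is harmless.
create index t_sample_ix on t_sample (k1, k2, k3, k4, k5) compress 2;

analyze index t_sample_ix validate structure;
select opt_cmpr_pctsave, opt_cmpr_count from index_stats;
```

Because the copy is not used by anything else, the offline VALIDATE STRUCTURE from the question can be run against it freely; whether its numbers carry over to the full index depends on how representative the sampled prefixes are.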