Partition pruning with multiple date columns

Asked by sjk · Tags: schema, oracle, data-warehouse, database-design, oracle-11g-r2

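I have a large table in an Oracle 11g database that holds several years of historical data, so I want to partition it by year. The problem is that the table has several date columns and all of them are used in queries, so I can't just pick one date column and use it as the partition key.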

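Most of the time the dates are close to each other, so I created one partition per year plus an "overflow" partition that holds the rows spanning a year boundary. Here is a simplified example: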

create table t (
  start_year int,
  end_year int,
  partition_year int as (case when start_year=end_year then start_year else 0 end),
  data blob 
)
partition by range(partition_year) (
  partition poverflow values less than (1000),
  partition p2000 values less than (2001),
  partition p2001 values less than (2002),
  partition p2002 values less than (2003),
  partition p2003 values less than (2004),
  partition p2004 values less than (2005)
);
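To make the routing rule concrete, here is a small Python model of the int example (a sketch, not Oracle code). `partition_year` mirrors the virtual column's CASE expression, and the hypothetical helper `partition_for` picks a partition with VALUES LESS THAN semantics, so same-year rows land in their year partition and everything else falls into POVERFLOW.

```python
import bisect

# Hypothetical model of the int example: (name, VALUES LESS THAN bound), in partition order.
PARTITIONS = [
    ("POVERFLOW", 1000),
    ("P2000", 2001),
    ("P2001", 2002),
    ("P2002", 2003),
    ("P2003", 2004),
    ("P2004", 2005),
]


def partition_year(start_year: int, end_year: int) -> int:
    """Mirror of: case when start_year=end_year then start_year else 0 end."""
    return start_year if start_year == end_year else 0


def partition_for(key: int) -> str:
    """Name of the first partition whose bound is strictly greater than key."""
    bounds = [upper for _, upper in PARTITIONS]
    idx = bisect.bisect_right(bounds, key)
    if idx == len(PARTITIONS):
        # Oracle would reject such an insert (ORA-14400: no partition for the key).
        raise ValueError(f"key {key} is above the highest partition bound")
    return PARTITIONS[idx][0]


print(partition_for(partition_year(2003, 2003)))  # same year -> P2003
print(partition_for(partition_year(2003, 2004)))  # spans a year boundary -> POVERFLOW
```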

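The problem with this approach is that partition_year has to be referenced explicitly in queries; otherwise partition pruning (highly desirable, since the table is large) does not take effect. The table is used by several users for ad-hoc aggregate queries, and I can't expect everyone to remember this logic.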

This can be solved with a view:

create or replace view v as
select *
from t
where partition_year=start_year 
  and partition_year=end_year 
  and partition_year>1000
union all
select *
from t partition (poverflow);

Now a query like

select * from v where start_year >= 2003 and end_year <= 2004;

uses the right partitions (5-6 + 1 in the plan below):

---------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |     1 |  4030 |     2   (0)| 00:00:01 |       |       |
|   1 |  VIEW                      | V    |     1 |  4030 |     2   (0)| 00:00:01 |       |       |
|   2 |   UNION-ALL                |      |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ITERATOR|      |     1 |  2041 |     2   (0)| 00:00:01 |     5 |     6 |
|*  4 |     TABLE ACCESS FULL      | T    |     1 |  2041 |     2   (0)| 00:00:01 |     5 |     6 |
|   5 |    PARTITION RANGE SINGLE  |      |     1 |  2041 |     2   (0)| 00:00:01 |     1 |     1 |
|*  6 |     TABLE ACCESS FULL      | T    |     1 |  2041 |     2   (0)| 00:00:01 |     1 |     1 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("START_YEAR">=2003 AND "END_YEAR"<=2004 AND "END_YEAR">=2003 AND 
              "START_YEAR"<=2004 AND "PARTITION_YEAR"<=2004 AND "PARTITION_YEAR"="START_YEAR" AND 
              "PARTITION_YEAR"="END_YEAR")
   6 - filter("START_YEAR">=2003 AND "END_YEAR"<=2004)
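Why the plan shows 5-6 + 1 can be reproduced with a rough Python model of the predicate reasoning. This is an assumption about how the optimizer derives the bounds, not its actual algorithm: in the first branch partition_year = start_year = end_year, so `start_year >= 2003` and `end_year <= 2004` carry over to partition_year, and only partitions whose key range overlaps that interval survive; the second branch always reads partition 1. `key_ranges` and `view_partitions` are hypothetical helper names.

```python
# VALUES LESS THAN bounds of POVERFLOW, P2000..P2004 (partition positions 1..6).
BOUNDS = [1000, 2001, 2002, 2003, 2004, 2005]


def key_ranges():
    """Half-open key range [lower, upper) covered by each partition, in order."""
    lowers = [float("-inf")] + BOUNDS[:-1]
    return list(zip(lowers, BOUNDS))


def view_partitions(min_start: int, max_end: int):
    """Positions for: select * from v where start_year >= min_start and end_year <= max_end."""
    # Branch 1: partition_year = start_year = end_year and partition_year > 1000,
    # so min_start <= partition_year <= max_end (and partition_year >= 1001 for ints).
    lo = max(min_start, 1001)
    hi = max_end
    year_branch = [
        pos
        for pos, (lower, upper) in enumerate(key_ranges(), start=1)
        if lower <= hi and upper > lo  # [lower, upper) overlaps [lo, hi]
    ]
    # Branch 2: "from t partition (poverflow)" always reads position 1.
    overflow_branch = [1]
    return year_branch, overflow_branch


print(view_partitions(2003, 2004))  # ([5, 6], [1])
```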

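The problem is that this no longer works if I replace the int columns with date columns. I tried extracting the year component from the dates and adding corresponding constraints to the view, but no partitions were pruned. Changing the type of partition_year to date didn't help either.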

Is there any way to have multiple date columns in one table and still get partition pruning?

Answer by sjk:

I found a partial solution by defining the view as

create or replace view v as
select *
from t
where partition_date between start_date and end_date 
  and partition_date > date'1000-01-01'
union all
select *
from t partition (poverflow);

With this view, the following query works as intended and accesses only partitions 1, 4 and 5:

select * from v where start_date >= date'2002-01-01' and end_date <= date'2003-01-01';

However, the query

select * from v where start_date = date'2002-01-01';

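scans partitions 1 and 4-6 instead of just 1 and 4 (the same predicate on end_date accesses partitions 1-4). In our case this is not a critical limitation, because typical queries only touch the last few years of data; queries for specific past dates or date ranges are rare.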
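The partition lists above can be checked against a rough Python model. It assumes the date version of the table uses the same layout as the int example (POVERFLOW below date'1000-01-01', then one partition per year up to 2004) and that a same-year row gets a partition_date between its start_date and end_date. The optimizer behaviour it encodes, namely that only a lower bound on start_date and an upper bound on end_date carry over through `start_date <= partition_date <= end_date` (an equality contributes only its useful side), is inferred from the observed plans. `overlapping` and `view_partitions` are hypothetical helpers.

```python
from datetime import date

# Assumed layout: POVERFLOW < 1000-01-01, then P2000..P2004 (positions 1..6).
BOUNDS = [date(1000, 1, 1)] + [date(y, 1, 1) for y in range(2001, 2006)]
OVERFLOW_FLOOR = date(1000, 1, 1)  # from: partition_date > date'1000-01-01'


def overlapping(lo, hi):
    """Positions whose key range [lower, upper) overlaps [lo, hi]."""
    lowers = [date.min] + BOUNDS[:-1]
    return [
        pos
        for pos, (lower, upper) in enumerate(zip(lowers, BOUNDS), start=1)
        if lower <= hi and upper > lo
    ]


def view_partitions(min_start=None, max_end=None):
    """Positions read for start_date >= min_start and/or end_date <= max_end."""
    # Branch 1: start_date <= partition_date <= end_date, so a lower bound on
    # start_date bounds partition_date from below and an upper bound on end_date
    # bounds it from above; "> date'1000-01-01'" excludes POVERFLOW.
    lo = OVERFLOW_FLOOR if min_start is None else max(min_start, OVERFLOW_FLOOR)
    hi = date.max if max_end is None else max_end
    # Branch 2 always reads POVERFLOW (position 1).
    return sorted({1, *overlapping(lo, hi)})


print(view_partitions(date(2002, 1, 1), date(2003, 1, 1)))  # [1, 4, 5]
print(view_partitions(min_start=date(2002, 1, 1)))  # start_date = 2002-01-01: [1, 4, 5, 6]
print(view_partitions(max_end=date(2002, 1, 1)))  # end_date = 2002-01-01: [1, 2, 3, 4]
```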

A slightly different version of this approach is to define the partition_date column as

case when trunc(start_date,'YEAR')=trunc(end_date,'YEAR') then greatest(start_date,end_date) 
else date'0001-01-01' end  -- date literal: does not depend on NLS_DATE_FORMAT
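As a quick sanity check of that expression, here is a Python sketch of it. Python's `date` has no time part, so comparing `.year` stands in for comparing `trunc(..., 'YEAR')`; the function name `partition_date` simply reuses the column name. For non-overflow rows the result is never earlier than either input, which is the property the view relies on.

```python
from datetime import date

OVERFLOW_KEY = date(1, 1, 1)  # stands in for date'0001-01-01'


def partition_date(start_date: date, end_date: date) -> date:
    """Model of: case when trunc(start_date,'YEAR')=trunc(end_date,'YEAR')
    then greatest(start_date,end_date) else date'0001-01-01' end"""
    if start_date.year == end_date.year:  # same calendar year
        return max(start_date, end_date)  # greatest(start_date, end_date)
    return OVERFLOW_KEY


# Same-year row: the key is the later of the two dates, so it is >= both.
row_key = partition_date(date(2003, 3, 1), date(2003, 5, 1))
assert row_key >= date(2003, 3, 1) and row_key >= date(2003, 5, 1)
print(row_key)  # 2003-05-01
print(partition_date(date(2003, 12, 30), date(2004, 1, 2)))  # 0001-01-01
```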

and the view as

create or replace view v as
select *
from t
where partition_date >= start_date and partition_date >= end_date
  and partition_date > date'1000-01-01'
union all
select *
from t partition (poverflow);

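This performs similarly, but now a predicate on either start_date or end_date also accesses all the more recent years. If the requirement is relaxed this way (only the earlier years need to be pruned), the overflow partition is actually no longer needed, and the solution simplifies to: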

create table t (
  start_date date,
  end_date date,
  partition_date date as (greatest(start_date,end_date)),
  data blob
)
partition by range(partition_date) (
  partition p2000 values less than (date'2001-01-01'),
  partition p2001 values less than (date'2002-01-01'),
  partition p2002 values less than (date'2003-01-01'),
  partition p2003 values less than (date'2004-01-01'),
  partition p2004 values less than (date'2005-01-01')
);

create or replace view v as
select *
from t
where partition_date >= start_date and partition_date >= end_date;
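The trade-off of the simplified version can be sketched in Python as well (a model under assumptions, not Oracle's pruning code). With partition_date = greatest(start_date, end_date), the view only tells the optimizer that partition_date >= start_date and partition_date >= end_date, so a lower bound on either column prunes the earlier years, while upper bounds prune nothing. `scanned_positions` and `position_of` are hypothetical helpers; positions 1..5 are P2000..P2004.

```python
import bisect
from datetime import date

# VALUES LESS THAN bounds of P2000..P2004 (positions 1..5); no overflow partition.
BOUNDS = [date(y, 1, 1) for y in range(2001, 2006)]


def partition_date(start_date, end_date):
    return max(start_date, end_date)  # greatest(start_date, end_date)


def position_of(key):
    """1-based position of the partition that stores key."""
    idx = bisect.bisect_right(BOUNDS, key)
    if idx == len(BOUNDS):
        raise ValueError(f"{key} is above the highest partition bound")
    return idx + 1


def scanned_positions(min_start=None, min_end=None):
    """Positions left after pruning on start_date >= min_start / end_date >= min_end."""
    # Only lower bounds carry over: partition_date >= start_date, >= end_date.
    lows = [d for d in (min_start, min_end) if d is not None]
    lo = max(lows, default=date.min)
    # A partition can be skipped only if its whole range [.., upper) lies below lo.
    return [pos for pos, upper in enumerate(BOUNDS, start=1) if upper > lo]


print(scanned_positions(min_start=date(2003, 1, 1)))  # [4, 5]
print(scanned_positions())  # [1, 2, 3, 4, 5]
```

Pruning stays safe because every qualifying row's key is at least the queried lower bound; a row from 2003-12-20 to 2004-01-05, for example, is stored in position 5, which is still scanned.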