Join elimination not working in Oracle with subqueries

Ron*_*nis 9 sql database oracle cost-based-optimizer anchor-modeling

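I am able to get join elimination to work for simple cases, such as one-to-one relations, but not for slightly more complicated scenarios. Ultimately I want to try anchor modeling, but first I need to find a way around this problem. I am using Oracle 12c Enterprise Edition Release 12.1.0.2.0.

DDL for my test case: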

drop view product_5nf;
drop table product_color cascade constraints;
drop table product_price cascade constraints;
drop table product       cascade constraints;

create table product(
   product_id number not null
  ,constraint product_pk primary key(product_id)
);

create table product_color(
   product_id  number         not null references product
  ,color       varchar2(10)   not null
  ,constraint product_color_pk primary key(product_id)
);

create table product_price(
   product_id  number   not null references product
  ,from_date   date     not null
  ,price       number   not null
  ,constraint product_price_pk primary key(product_id, from_date)
);

Some sample data:

insert into product values(1);
insert into product values(2);
insert into product values(3);
insert into product values(4);

insert into product_color values(1, 'Red');
insert into product_color values(2, 'Green');

insert into product_price values(1, date '2016-01-01', 10);
insert into product_price values(1, date '2016-02-01', 8);
insert into product_price values(1, date '2016-05-01', 5);

insert into product_price values(2, date '2016-02-01', 5);

insert into product_price values(4, date '2016-01-01', 10);

commit;
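
The 5NF view

The first view does not compile. It fails with ORA-01799: a column may not be outer-joined to a subquery. Unfortunately, this is how most of the historized views are defined in the online examples of anchor modeling that I have looked at...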

create view product_5nf as
   select p.product_id
         ,pc.color
         ,pp.price 
     from product p
     left join product_color pc on(
          pc.product_id = p.product_id
     )
     left join product_price pp on(
          pp.product_id = p.product_id
      and pp.from_date  = (select max(pp2.from_date) 
                             from product_price pp2 
                            where pp2.product_id = pp.product_id)
     );

Below is my attempt to fix it. When this view is used with a simple select of product_id, Oracle manages to eliminate product_color but not product_price.

create view product_5nf as
   select product_id
         ,pc.color
         ,pp.price 
     from product p
     left join product_color pc using(product_id)
     left join (select pp1.product_id, pp1.price 
                  from product_price pp1
                 where pp1.from_date  = (select max(pp2.from_date) 
                                           from product_price pp2 
                                          where pp2.product_id = pp1.product_id)
              )pp using(product_id);

select product_id
  from product_5nf;

----------------------------------------------------------
| Id  | Operation             | Name             | Rows  |
----------------------------------------------------------
|   0 | SELECT STATEMENT      |                  |     4 |
|*  1 |  HASH JOIN OUTER      |                  |     4 |
|   2 |   INDEX FAST FULL SCAN| PRODUCT_PK       |     4 |
|   3 |   VIEW                |                  |     3 |
|   4 |    NESTED LOOPS       |                  |     3 |
|   5 |     VIEW              | VW_SQ_1          |     5 |
|   6 |      HASH GROUP BY    |                  |     5 |
|   7 |       INDEX FULL SCAN | PRODUCT_PRICE_PK |     5 |
|*  8 |     INDEX UNIQUE SCAN | PRODUCT_PRICE_PK |     1 |
----------------------------------------------------------

The only solution I have found is to use a scalar subquery instead, like this:

create or replace view product_5nf as
   select p.product_id
         ,pc.color
         ,(select pp.price
             from product_price pp
            where pp.product_id = p.product_id
              and pp.from_date = (select max(from_date)
                                    from product_price pp2
                                   where pp2.product_id = pp.product_id)) as price
     from product p
     left join product_color pc on(
          pc.product_id = p.product_id
     );

select product_id
  from product_5nf;

---------------------------------------------------
| Id  | Operation            | Name       | Rows  |
---------------------------------------------------
|   0 | SELECT STATEMENT     |            |     4 |
|   1 |  INDEX FAST FULL SCAN| PRODUCT_PK |     4 |
---------------------------------------------------
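
Now Oracle successfully eliminates the product_price table. However, scalar subqueries are implemented differently from joins, and the way they are executed simply does not let me get acceptable performance in a real-world scenario.

TL;DR How can I rewrite the view product_5nf so that Oracle successfully eliminates both dependent tables?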

Mat*_*eak 4

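I think you have two problems here.

First, join elimination only works in certain specific situations (PK-PK, PK-FK, etc.). It is not a general feature where you can LEFT JOIN to any row set that returns a single row for each join key value and have Oracle eliminate the join.

Second, even if Oracle were advanced enough to do join elimination on ANY LEFT JOIN where it knew it would get only one row per join key value, Oracle does not yet support join elimination on LEFT JOINs that are based on a composite key (Oracle support document 887553.1 says this is coming in R12.2).

One workaround you could consider is to materialize a view holding the last row for each product_id, and then LEFT JOIN to that materialized view. Like this: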
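To illustrate the simple case, here is a minimal sketch using the tables from the question. The view name product_color_v is hypothetical and not part of the original test case. Because product_color.product_id is a primary key, the outer join can contribute at most one row per product, so the optimizer is allowed to drop the join when no product_color column is selected. The question reports that this elimination does happen for product_color; the plan can be checked with EXPLAIN PLAN:

```sql
-- Hypothetical view name; the tables are the ones from the question.
create or replace view product_color_v as
   select p.product_id
         ,pc.color
     from product p
     left join product_color pc on pc.product_id = p.product_id;

-- Only product_id is selected, so product_color is a candidate for elimination.
explain plan for
select product_id
  from product_color_v;

-- Show the plan. If the join was eliminated, only PRODUCT_PK should be accessed.
select * from table(dbms_xplan.display);
```

Selecting pc.color as well brings the join back, since the column then has to be fetched from product_color.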

create table product(
   product_id number not null
  ,constraint product_pk primary key(product_id)
);

create table product_color(
   product_id  number         not null references product
  ,color       varchar2(10)   not null
  ,constraint product_color_pk primary key(product_id)
);

create table product_price(
   product_id  number   not null references product
  ,from_date   date     not null
  ,price       number   not null
  ,constraint product_price_pk  primary key (product_id, from_date )
);

-- Add a VIRTUAL column to PRODUCT_PRICE so that we can get all the data for 
-- the latest row by taking the MAX() of this column.
alter table product_price add ( sortable_row varchar2(80) generated always as ( lpad(product_id,10,'0') || to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,'0'))  virtual not null );

-- Create a materialized view log so that an MV holding only the latest
-- row for each product_id can be fast-refreshed on commit.
create materialized view log on product_price with sequence, primary key, rowid ( price  ) including new values;

-- Create the MV
create materialized view product_price_latest refresh fast on commit enable query rewrite as
SELECT product_id, max( lpad(product_id,10,'0') || to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,'0')) sortable_row
FROM   product_price
GROUP BY product_id;

-- Create a primary key on the MV, so we can do join elimination
alter table product_price_latest add constraint ppl_pk primary key ( product_id );

-- Insert the OP's test data
insert into product values(1);
insert into product values(2);
insert into product values(3);
insert into product values(4);

insert into product_color values(1, 'Red');
insert into product_color values(2, 'Green');

insert into product_price ( product_id, from_date, price ) values(1, date '2016-01-01', 10 );
insert into product_price ( product_id, from_date, price) values(1, date '2016-02-01', 8);
insert into product_price ( product_id, from_date, price) values(1, date '2016-05-01', 5);

insert into product_price ( product_id, from_date, price) values(2, date '2016-02-01', 5);

insert into product_price ( product_id, from_date, price) values(4, date '2016-01-01', 10);

commit;

-- Create the 5NF view using the materialized view
create or replace view product_5nf as
   select p.product_id
         ,pc.color
         ,to_date(substr(ppl.sortable_row,11,14),'YYYYMMDDHH24MISS') from_date
         ,to_number(substr(ppl.sortable_row,25)) price 
     from product p
     left join product_color pc on pc.product_id = p.product_id
     left join product_price_latest ppl on ppl.product_id = p.product_id 
;

-- The plan for this should not include any of the unnecessary tables.
select product_id from product_5nf;

-- Check the plan
SELECT *
FROM   TABLE (DBMS_XPLAN.display_cursor (null, null,
                                         'ALLSTATS LAST'));

------------------------------------------------
| Id  | Operation        | Name       | E-Rows |
------------------------------------------------
|   0 | SELECT STATEMENT |            |        |
|   1 |  INDEX FULL SCAN | PRODUCT_PK |      1 |
------------------------------------------------
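
Since the view decodes from_date and price out of the packed sortable_row string, it may be worth checking that the decoded values agree with the base table. The query below is a sketch under the assumption that the materialized view has been refreshed by the preceding commit; it compares the view against a direct "latest price" query and should return no rows if the two agree:

```sql
-- Rows the view reports that a direct "latest price per product" query does not.
select product_id, from_date, price
  from product_5nf
 where from_date is not null
minus
select product_id
      ,max(from_date)
      ,max(price) keep (dense_rank last order by from_date)
  from product_price
 group by product_id;
```

Swapping the two halves of the MINUS checks the other direction. Note that the encoding assumes product_id and price each fit within 10 characters and sort correctly as zero-padded strings, because LPAD truncates longer values; negative or fractional prices would need a different encoding.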