What is a "partitioned outer join"?

Eva*_*oll 6 postgresql oracle join terminology

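This just came up in a question on Reddit, and I would like to know:

  • What is a PARTITIONED OUTER JOIN in Oracle? (definition)
  • What does a simple example look like? (usage)
  • How would you write it in PostgreSQL or standard SQL, which lack PARTITIONED OUTER JOIN? (equivalence)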

ste*_*fan 8

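{1} Partitioned outer join: definition

... "Such a join extends conventional outer join syntax by applying the outer join to each logical partition defined in a query. Oracle logically partitions the rows in your query based on the expression you specify in the PARTITION BY clause. The result of a partitioned outer join is a UNION of the outer joins of each of the partitions in the logically partitioned table with the table on the other side of the join." (documentation)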

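{2} Simple example

"Data is normally stored in sparse form. That is, if no value exists for a given combination of dimension values, no row exists in the fact table. However, you may want to view the data in dense form, with rows for all combinations of dimension values displayed even when no fact data exist for them."

... "For example, if a product did not sell during a particular time period, you may still want to see the product for that time period with zero sales value next to it." (quoted from the documentation)

Test tables and data (INSERTs)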

-- Oracle 12c
create table sales (
  date_ date
, location_ varchar2( 16 )
, qty_ number
);

create table locations (
  name varchar2( 16 )
);

-- dates for locations are "gappy": 
-- none of the locations has entries for all 3 dates
-- ( date range: 2019-01-15 - 2019-01-17 )
insert into sales ( date_, location_, qty_ ) 
  values ( date '2019-01-17', 'London', 11 ) ;
insert into sales ( date_, location_, qty_ ) 
  values ( date '2019-01-15', 'London', 10 ) ;
insert into sales ( date_, location_, qty_ ) 
  values ( date '2019-01-16', 'Paris', 20 ) ;
insert into sales ( date_, location_, qty_ ) 
  values ( date '2019-01-17', 'Boston', 31 ) ;
insert into sales ( date_, location_, qty_ ) 
  values ( date '2019-01-16', 'Boston', 30 ) ;

-- locations
insert into locations ( name ) values ( 'London' );
insert into locations ( name ) values ( 'Paris' );
insert into locations ( name ) values ( 'Boston' );

Desired output

date_       location_  qty_
2019-01-15  London     10
2019-01-15  Paris       0    -- not INSERTed!
2019-01-15  Boston      0    -- not INSERTed!
2019-01-16  London      0    -- not INSERTed!
2019-01-16  Paris      20
2019-01-16  Boston     30
2019-01-17  London     11
2019-01-17  Paris       0    -- not INSERTed!
2019-01-17  Boston     31

Query (partitioned outer join)

select S.date_, S.qty_, L.name
from sales S partition by ( date_ ) 
  right join locations L on S.location_ = L.name
;

-- result
DATE_      QTY_  NAME    
15-JAN-19  NULL  Boston  
15-JAN-19  10    London  
15-JAN-19  NULL  Paris   
16-JAN-19  30    Boston  
16-JAN-19  NULL  London  
16-JAN-19  20    Paris   
17-JAN-19  31    Boston  
17-JAN-19  11    London  
17-JAN-19  NULL  Paris 
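
To connect this result back to the definition in {1}: the partitioned join behaves like one ordinary outer join per date_ partition, glued together with UNION ALL. The following is only an illustrative sketch of that reading, not how Oracle actually executes the query. The three dates are hard-coded here because the test data contains exactly these three; since S.date_ is NULL for unmatched rows, the partition value is selected as a literal instead.

```sql
-- One RIGHT JOIN per date_ partition, combined with UNION ALL.
-- The date literals are hard-coded for this test data only.
select date '2019-01-15' as date_, S.qty_, L.name
from ( select * from sales where date_ = date '2019-01-15' ) S
  right join locations L on S.location_ = L.name
union all
select date '2019-01-16' as date_, S.qty_, L.name
from ( select * from sales where date_ = date '2019-01-16' ) S
  right join locations L on S.location_ = L.name
union all
select date '2019-01-17' as date_, S.qty_, L.name
from ( select * from sales where date_ = date '2019-01-17' ) S
  right join locations L on S.location_ = L.name
;
```

Each branch contributes one row per location, so this should produce the same nine rows as the PARTITION BY ( date_ ) query (row order is not guaranteed without an ORDER BY).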

Query (version 2, same join)

-- same as above, using NVL(), column aliases, and ORDER BY ...
select S.date_, nvl( S.qty_, 0 ) as sold, L.name as location
from sales S partition by ( date_ ) 
  right join locations L on S.location_ = L.name
order by S.date_, L.name
;

DATE_           SOLD LOCATION        
--------- ---------- ----------------
15-JAN-19          0 Boston          
15-JAN-19         10 London          
15-JAN-19          0 Paris           
16-JAN-19         30 Boston          
16-JAN-19          0 London          
16-JAN-19         20 Paris           
17-JAN-19         31 Boston          
17-JAN-19         11 London          
17-JAN-19          0 Paris           

Dbfiddle here (Oracle 18c)

{3} Equivalence

The following query does roughly the same job as the PARTITION BY ( date_ ) outer join. It combines a CROSS JOIN (the inner SELECT) with a LEFT OUTER JOIN. (The NULL-to-0 conversion is omitted.)

Oracle

select SL.*, S.qty_
from
(
  select *
  from (
    select unique date_ from sales
  ) , (
    select unique name from locations
  )
) SL left join (
  select date_, location_, qty_ from sales
) S on SL.name = S.location_ and SL.date_ = S.date_ 
order by SL.date_, SL.name
;

DATE_      NAME    QTY_  
15-JAN-19  Boston  NULL  
15-JAN-19  London  10    
15-JAN-19  Paris   NULL  
16-JAN-19  Boston  30    
16-JAN-19  London  NULL  
16-JAN-19  Paris   20    
17-JAN-19  Boston  31    
17-JAN-19  London  11    
17-JAN-19  Paris   NULL 

PostgreSQL 10 ( dbfiddle )

-- DDL and INSERTs
create table sales (
  date_ date
, location_ varchar( 16 )
, qty_ numeric
);

create table locations (
  name varchar( 16 )
);

insert into sales ( date_, location_, qty_ ) values 
  ( '2019-01-17', 'London', 11 )
, ( '2019-01-15', 'London', 10 )
, ( '2019-01-16', 'Paris', 20 )
, ( '2019-01-17', 'Boston', 31 )
, ( '2019-01-16', 'Boston', 30 ) ;

insert into locations ( name ) values 
( 'London' ), ( 'Paris' ), ( 'Boston' );

Query (Postgres)

-- SL: all date_ <-> location combinations
-- S: all location_ and qty_ values of table sales
select SL.*, S.qty_
from
(
  select *
  from (
    select distinct date_ from sales
  ) S_ cross join (
    select distinct name from locations
  ) L_
) SL left join (
  select date_, location_, qty_ from sales
) S on SL.name = S.location_ and SL.date_ = S.date_ 
order by SL.date_, SL.name
;

date_        name    qty_
2019-01-15   Boston  
2019-01-15   London  10
2019-01-15   Paris    
2019-01-16   Boston  30
2019-01-16   London    
2019-01-16   Paris   20
2019-01-17   Boston  31
2019-01-17   London  11
2019-01-17   Paris    
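
For completeness, here is a more compact PostgreSQL sketch that also does the NULL-to-0 conversion left out above, using COALESCE. It assumes that names in locations are unique (so no DISTINCT is needed there); the aliases D, G, day, sold and location are my own choices. The second variant is a different technique, not a strict equivalent: it generates the calendar with generate_series(), so a day without any row in sales would still show up, whereas PARTITION BY ( date_ ) only ever sees dates that exist in sales.

```sql
-- Variant 1: dates taken from sales, missing quantities shown as 0
select D.date_, L.name as location, coalesce( S.qty_, 0 ) as sold
from ( select distinct date_ from sales ) D
  cross join locations L
  left join sales S on S.date_ = D.date_ and S.location_ = L.name
order by D.date_, L.name
;

-- Variant 2: dates generated for a fixed range (range hard-coded for the test data)
select G.day::date as date_, L.name as location, coalesce( S.qty_, 0 ) as sold
from generate_series( timestamp '2019-01-15'
                    , timestamp '2019-01-17'
                    , interval '1 day' ) as G( day )
  cross join locations L
  left join sales S on S.date_ = G.day::date and S.location_ = L.name
order by G.day, L.name
;
```

With the test data above, both variants should give the same nine date/location combinations, because every date in the range has at least one sale.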