How to dense rank sets of data

Cul*_*ton 5 sql oracle rank dense-rank

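I'm trying to get a dense rank that groups sets of data together. In my table I have ID, GRP_SET, SUB_SET and INTERVAL, where INTERVAL simply represents a date field. When records are inserted for an ID, they are inserted as a GRP_SET of 3 rows, numbered by SUB_SET. As you can see, the interval can change slightly while a set is being inserted, before the set is complete.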

Here is some sample data; the DRANK column shows the rank I'm trying to get.

with q as (
select 1 id, 'a' GRP_SET, 1 as SUB_SET, 123 as interval, 1 as DRANK from dual union all
select 1, 'a', 2, 123, 1 from dual union all
select 1, 'a', 3, 124, 1 from dual union all
select 1, 'b', 1, 234, 2 from dual union all
select 1, 'b', 2, 235, 2 from dual union all
select 1, 'b', 3, 235, 2 from dual union all
select 1, 'a', 1, 331, 3 from dual union all
select 1, 'a', 2, 331, 3 from dual union all
select 1, 'a', 3, 331, 3 from dual)

select * from q

Sample data

ID GRP_SET SUB_SET INTERVAL DRANK
1  a       1       123      1
1  a       2       123      1
1  a       3       124      1
1  b       1       234      2
1  b       2       235      2
1  b       3       235      2
1  a       1       331      3
1  a       2       331      3
1  a       3       331      3

Here's the query I have, but it seems I need something like:

  • Partition by: ID
  • Order within the partition: ID, INTERVAL
  • Change the rank when: ID, GRP_SET (changes)

select
   id, GRP_SET, SUB_SET, interval,
   DENSE_RANK() over (partition by ID order by id, GRP_SET) as DRANK_TEST
from q
Order by
   id, interval
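The attempt above cannot work as written: DENSE_RANK ranks the distinct GRP_SET values, so every a row gets 1 and every b row gets 2, and the second a set comes out as 1 instead of 3. What the three bullets describe is usually called a "gaps and islands" problem. A common way to solve it with analytic functions alone (a swapped-in technique, not the asker's query or the MODEL answer) is to flag each row whose GRP_SET differs from the previous row's in (INTERVAL, SUB_SET) order using LAG, then take a running SUM of those flags per ID. The sketch below assumes a real table named q instead of the CTE (with the question's columns plus the target DRANK) and a hypothetical view name q_ranked; it uses only single-row INSERTs so that the same script should run on Oracle and on SQLite 3.25+.

```sql
-- Sample data from the question, as a real table instead of a CTE.
-- Table and view names (q, q_ranked) are illustrative assumptions.
CREATE TABLE q (
  id       NUMBER,
  grp_set  VARCHAR2(1),
  sub_set  NUMBER,
  interval NUMBER,
  drank    NUMBER          -- the rank the question wants (target values)
);

INSERT INTO q VALUES (1, 'a', 1, 123, 1);
INSERT INTO q VALUES (1, 'a', 2, 123, 1);
INSERT INTO q VALUES (1, 'a', 3, 124, 1);
INSERT INTO q VALUES (1, 'b', 1, 234, 2);
INSERT INTO q VALUES (1, 'b', 2, 235, 2);
INSERT INTO q VALUES (1, 'b', 3, 235, 2);
INSERT INTO q VALUES (1, 'a', 1, 331, 3);
INSERT INTO q VALUES (1, 'a', 2, 331, 3);
INSERT INTO q VALUES (1, 'a', 3, 331, 3);

CREATE VIEW q_ranked AS
SELECT id, grp_set, sub_set, interval, drank,
       -- Running total of "a new set starts here" flags, per ID,
       -- in (interval, sub_set) order.
       SUM(new_set) OVER (PARTITION BY id
                          ORDER BY interval, sub_set
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS drank_calc
FROM (
  SELECT id, grp_set, sub_set, interval, drank,
         -- 1 when GRP_SET differs from the previous row of the same ID
         -- (or there is no previous row, since LAG then returns NULL), else 0.
         CASE WHEN grp_set = LAG(grp_set) OVER (PARTITION BY id
                                                ORDER BY interval, sub_set)
              THEN 0 ELSE 1 END AS new_set
  FROM q
) flagged;

SELECT * FROM q_ranked ORDER BY id, interval, sub_set;
```

On the sample data, drank_calc should match the DRANK column: 1, 1, 1, 2, 2, 2, 3, 3, 3.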

Luk*_*der 2

Using the MODEL clause

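Behold, your requirements go beyond what "ordinary" SQL can easily express. But luckily you're using Oracle, which has the MODEL clause, a device whose mystery is exceeded only by its power (there is an excellent whitepaper here). You should write: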

SELECT
   id, grp_set, sub_set, interval, drank
FROM (
  SELECT id, grp_set, sub_set, interval, 1 drank
  FROM q
)
MODEL PARTITION BY (id)
      DIMENSION BY (row_number() OVER (PARTITION BY id ORDER BY interval, sub_set) rn)
      MEASURES (grp_set, sub_set, interval, drank)
      RULES (
        drank[any] ORDER BY rn = NVL(drank[cv(rn) - 1] +
                                     DECODE(grp_set[cv(rn) - 1], grp_set[cv(rn)], 0, 1), 1)
      )

Proof on SQLFiddle

Explanation:

SELECT
   id, grp_set, sub_set, interval, drank
FROM (
  -- Here, we initialise your "dense rank" to 1
  SELECT id, grp_set, sub_set, interval, 1 drank
  FROM q
)

-- Then we partition the data set by ID (that's your requirement)
MODEL PARTITION BY (id)

-- We number the rows of each ID by interval and sub_set, so that every row
-- can address its predecessor in that order as rn - 1
      DIMENSION BY (row_number() OVER (PARTITION BY id ORDER BY interval, sub_set) rn)

-- These are the columns that we want to generate from the MODEL clause
      MEASURES (grp_set, sub_set, interval, drank)

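-- And the rules are simple: each "dense rank" value equals the previous
-- "dense rank" value, plus 1 if the grp_set value has changed. The first row
-- has no predecessor (NULL), so NVL falls back to 1. ORDER BY rn makes Oracle
-- evaluate the cells in row-number order, so each row sees its predecessor's
-- final value.
      RULES (
        drank[any] ORDER BY rn = NVL(drank[cv(rn) - 1] +
                                     DECODE(grp_set[cv(rn) - 1], grp_set[cv(rn)], 0, 1), 1)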
      )
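Written as a recurrence, the rule reads as follows. For one ID, n = 1 to N numbers the rows by (interval, sub_set) (the rn dimension), g_n is grp_set and d_n is drank; the Iverson bracket [P] (notation introduced here, not in the answer) is 1 when P holds and 0 otherwise. The n = 1 case is the NVL fallback, since drank[0] does not exist. This assumes grp_set is never NULL (DECODE would treat two NULLs as equal).

```latex
\begin{aligned}
d_1 &= 1, \\
d_n &= d_{n-1} + [\, g_n \neq g_{n-1} \,], \qquad n \ge 2, \\
\text{so}\quad d_n &= 1 + \sum_{k=2}^{n} [\, g_k \neq g_{k-1} \,].
\end{aligned}
```

For the sample data, g = (a, a, a, b, b, b, a, a, a) changes only at n = 4 and n = 7, giving d = (1, 1, 1, 2, 2, 2, 3, 3, 3). The unrolled form is a running sum of "group changed" flags, which is why the same numbers can also be computed with LAG and a cumulative SUM(...) OVER (...) instead of MODEL.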

Of course, this only works if there are no interleaving events, i.e. if no grp_set other than a occurs between intervals 123 and 124.
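To make the caveat concrete, the sketch below uses made-up data (not from the question) in which set b starts at interval 124, before set a's third row arrives at 125. It applies the running-sum form of the rule (LAG plus a cumulative SUM), which should yield the same numbers as the MODEL rule when grp_set is never NULL, because both add 1 exactly when grp_set differs from the previous row. The table and view names are hypothetical, and the script again sticks to portable single-row INSERTs.

```sql
-- Hypothetical data: sets a and b of ID 1 are being inserted at the same time.
CREATE TABLE q_interleaved (
  id       NUMBER,
  grp_set  VARCHAR2(1),
  sub_set  NUMBER,
  interval NUMBER
);

INSERT INTO q_interleaved VALUES (1, 'a', 1, 123);
INSERT INTO q_interleaved VALUES (1, 'a', 2, 123);
INSERT INTO q_interleaved VALUES (1, 'b', 1, 124);  -- b starts before a is complete
INSERT INTO q_interleaved VALUES (1, 'a', 3, 125);
INSERT INTO q_interleaved VALUES (1, 'b', 2, 126);
INSERT INTO q_interleaved VALUES (1, 'b', 3, 126);

-- Same "rank goes up whenever grp_set changes" logic as the MODEL rule.
CREATE VIEW q_interleaved_ranked AS
SELECT id, grp_set, sub_set, interval,
       SUM(new_set) OVER (PARTITION BY id
                          ORDER BY interval, sub_set
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS drank
FROM (
  SELECT id, grp_set, sub_set, interval,
         -- 1 when grp_set differs from the previous row of the same ID, else 0
         CASE WHEN grp_set = LAG(grp_set) OVER (PARTITION BY id
                                                ORDER BY interval, sub_set)
              THEN 0 ELSE 1 END AS new_set
  FROM q_interleaved
) flagged;

SELECT * FROM q_interleaved_ranked ORDER BY interval, sub_set;
```

Here drank comes out as 1, 1, 2, 3, 4, 4 in interval order: set a is split across ranks 1 and 3 and set b across ranks 2 and 4, instead of one rank per set.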