Cumulative sum with reset values

jpm*_*c26 5 oracle oracle-11g-r2

Consider the following table:

ID | GROUP_ID | ORDER_VAL | RESET_VAL | VAL 
---+----------+-----------+-----------+-----
1  | 1        | 1         | (null)    | 3   
2  | 1        | 2         | (null)    | 2   
3  | 1        | 3         | (null)    | 1   
4  | 1        | 4         | 4         | 2   
5  | 1        | 5         | (null)    | 1   
6  | 2        | 1         | (null)    | 4   
7  | 2        | 2         | 2         | 3   
8  | 2        | 3         | (null)    | 4   
9  | 2        | 4         | (null)    | 2   
10 | 2        | 5         | (null)    | 2   
11 | 2        | 6         | (null)    | 4   
12 | 2        | 7         | 14        | 2   
13 | 2        | 8         | (null)    | 2   

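For each row, I need to calculate the cumulative sum of VAL over all previous rows (ordered by ORDER_VAL and grouped by GROUP_ID), but every time a non-NULL RESET_VAL is encountered, I need to use that value as the sum instead. Subsequent rows also need to build on top of that RESET_VAL rather than on the actual sum. Note that each group can have multiple reset values.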

This is the result I expect for the table above:

ID | GROUP_ID | ORDER_VAL | RESET_VAL | VAL | CUMSUM
---+----------+-----------+-----------+-----+-------
1  | 1        | 1         | (null)    | 3   | 0
2  | 1        | 2         | (null)    | 2   | 3
3  | 1        | 3         | (null)    | 1   | 5
4  | 1        | 4         | 4         | 2   | 4
5  | 1        | 5         | (null)    | 1   | 6
6  | 2        | 1         | (null)    | 4   | 0
7  | 2        | 2         | 2         | 3   | 2
8  | 2        | 3         | (null)    | 4   | 5
9  | 2        | 4         | (null)    | 2   | 9
10 | 2        | 5         | (null)    | 2   | 11
11 | 2        | 6         | (null)    | 4   | 13
12 | 2        | 7         | 14        | 2   | 14
13 | 2        | 8         | (null)    | 2   | 16
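The rule can be pinned down with a plain sequential pass, which is handy as a reference when checking any SQL version against the sample data. A minimal Python sketch, assuming rows are tuples of (id, group_id, order_val, reset_val, val) with None standing in for NULL; the function name `cumsum_with_reset` is hypothetical:

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (id, group_id, order_val, reset_val, val);
# None plays the role of SQL NULL.
ROWS = [
    (1, 1, 1, None, 3), (2, 1, 2, None, 2), (3, 1, 3, None, 1),
    (4, 1, 4, 4, 2), (5, 1, 5, None, 1),
    (6, 2, 1, None, 4), (7, 2, 2, 2, 3), (8, 2, 3, None, 4),
    (9, 2, 4, None, 2), (10, 2, 5, None, 2), (11, 2, 6, None, 4),
    (12, 2, 7, 14, 2), (13, 2, 8, None, 2),
]


def cumsum_with_reset(rows):
    """Return {id: cumsum}: the sum of val over earlier rows of the same group,
    except that a non-null reset_val replaces the running total."""
    result = {}
    ordered = sorted(rows, key=itemgetter(1, 2))  # group_id, then order_val
    for _, group in groupby(ordered, key=itemgetter(1)):
        running = 0  # total carried into the current row
        for row_id, _, _, reset_val, val in group:
            if reset_val is not None:
                running = reset_val  # a reset overrides the carried total
            result[row_id] = running
            running += val  # later rows build on top of this row
    return result


print(cumsum_with_reset(ROWS))
```

The printed mapping reproduces the CUMSUM column above. Note that a reset only changes the starting point: the rows after it keep adding their own val.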

If it weren't for the reset values, I could use a window query:

SELECT temp.*,
       COALESCE(SUM(val) OVER (PARTITION BY group_id ORDER BY order_val ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
                0) AS cumsum
FROM temp;

SQLFiddle of the above
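To see where that query stops being enough, here is a rough Python equivalent run on the sample data (the function name is hypothetical). It matches the expected CUMSUM only for rows before the first reset in each group; for example, it gives 6 instead of 4 for id 4, and 4 instead of 2 for id 7:

```python
from collections import defaultdict

# Sample rows from the question: (id, group_id, order_val, reset_val, val).
ROWS = [
    (1, 1, 1, None, 3), (2, 1, 2, None, 2), (3, 1, 3, None, 1),
    (4, 1, 4, 4, 2), (5, 1, 5, None, 1),
    (6, 2, 1, None, 4), (7, 2, 2, 2, 3), (8, 2, 3, None, 4),
    (9, 2, 4, None, 2), (10, 2, 5, None, 2), (11, 2, 6, None, 4),
    (12, 2, 7, 14, 2), (13, 2, 8, None, 2),
]


def cumsum_ignoring_resets(rows):
    """Mimic COALESCE(SUM(val) OVER (PARTITION BY group_id ORDER BY order_val
    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0)."""
    totals = defaultdict(int)  # running total per group_id
    result = {}
    for row_id, group_id, _, _, val in sorted(rows, key=lambda r: (r[1], r[2])):
        result[row_id] = totals[group_id]  # strictly preceding rows only
        totals[group_id] += val
    return result


print(cumsum_ignoring_resets(ROWS))
```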

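I initially thought, wrongly, that I could put RESET_VAL at the start of the COALESCE, but that doesn't work because it doesn't reset the value for the subsequent rows.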
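The question does not show that attempt, so the following assumes it looked like `COALESCE(reset_val, SUM(val) OVER (...), 0)`. A Python sketch of that expression (hypothetical function name) shows the problem: the reset row itself comes out right, but the very next row falls back to the un-reset running total:

```python
from collections import defaultdict

# Sample rows from the question: (id, group_id, order_val, reset_val, val).
ROWS = [
    (1, 1, 1, None, 3), (2, 1, 2, None, 2), (3, 1, 3, None, 1),
    (4, 1, 4, 4, 2), (5, 1, 5, None, 1),
    (6, 2, 1, None, 4), (7, 2, 2, 2, 3), (8, 2, 3, None, 4),
    (9, 2, 4, None, 2), (10, 2, 5, None, 2), (11, 2, 6, None, 4),
    (12, 2, 7, 14, 2), (13, 2, 8, None, 2),
]


def coalesce_attempt(rows):
    """Mimic COALESCE(reset_val, <sum of preceding val in the group>, 0)."""
    totals = defaultdict(int)  # running total per group_id, never reset
    result = {}
    for row_id, group_id, _, reset_val, val in sorted(rows, key=lambda r: (r[1], r[2])):
        preceding = totals[group_id]
        # The reset value only wins on its own row; it never feeds the total.
        result[row_id] = reset_val if reset_val is not None else preceding
        totals[group_id] += val
    return result


r = coalesce_attempt(ROWS)
print(r[4], r[5])  # the reset row, then the row right after it
```

This prints `4 8`, whereas the expected values for ids 4 and 5 are 4 and 6.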

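I also tried this solution, but it only resets to zero, not to the value in the column. Adapting it to do that turned out to be non-trivial, because the value has to propagate to all subsequent rows.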

A recursive query seems like a natural fit, but I haven't figured out how to write one.

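I should probably mention that the table I actually have to process is much larger than the example above (hundreds of thousands to millions of rows), so if you post an answer, please mention whether it has any performance pitfalls.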

ype*_*eᵀᴹ 4

The following works, but there may be cleverer versions. Explanation of the query logic:

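We first find how many "resets" have happened up to and including the current row, by counting the non-null values of the reset_val column, so that we can split the rows into subgroups.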
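The running count of non-null reset_val values behaves like a subgroup number. A Python sketch of `COUNT(reset_val) OVER (PARTITION BY group_id ORDER BY order_val)` on the sample data (the function name is hypothetical):

```python
from itertools import accumulate, groupby

# Sample rows from the question: (id, group_id, order_val, reset_val, val).
ROWS = [
    (1, 1, 1, None, 3), (2, 1, 2, None, 2), (3, 1, 3, None, 1),
    (4, 1, 4, 4, 2), (5, 1, 5, None, 1),
    (6, 2, 1, None, 4), (7, 2, 2, 2, 3), (8, 2, 3, None, 4),
    (9, 2, 4, None, 2), (10, 2, 5, None, 2), (11, 2, 6, None, 4),
    (12, 2, 7, 14, 2), (13, 2, 8, None, 2),
]


def reset_counts(rows):
    """Running count of non-null reset_val per group, current row included;
    like SQL COUNT(col), None values are not counted."""
    result = {}
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    for _, group in groupby(ordered, key=lambda r: r[1]):
        group = list(group)
        flags = [int(r[3] is not None) for r in group]  # 1 where a reset occurs
        for r, count in zip(group, accumulate(flags)):
            result[r[0]] = count
    return result


print(reset_counts(ROWS))
```

Each (group_id, reset_count) pair is one subgroup: in group 2, ids 7 to 11 share reset_count 1 and ids 12 and 13 share reset_count 2.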

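We also use another window function, LAST_VALUE() with IGNORE NULLS, so that we can find the last non-null reset_val up to the current row (the reset_value column; COALESCE turns "no reset yet" into 0).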
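In procedural terms, LAST_VALUE(... IGNORE NULLS) over a growing window just carries the most recent non-null value forward. A Python sketch of the reset_value column (hypothetical function name, None for NULL):

```python
# Sample rows from the question: (id, group_id, order_val, reset_val, val).
ROWS = [
    (1, 1, 1, None, 3), (2, 1, 2, None, 2), (3, 1, 3, None, 1),
    (4, 1, 4, 4, 2), (5, 1, 5, None, 1),
    (6, 2, 1, None, 4), (7, 2, 2, 2, 3), (8, 2, 3, None, 4),
    (9, 2, 4, None, 2), (10, 2, 5, None, 2), (11, 2, 6, None, 4),
    (12, 2, 7, 14, 2), (13, 2, 8, None, 2),
]


def last_reset_values(rows):
    """Mimic COALESCE(LAST_VALUE(reset_val IGNORE NULLS) OVER
    (PARTITION BY group_id ORDER BY order_val), 0)."""
    last = {}  # group_id -> latest non-null reset_val seen so far
    result = {}
    for row_id, group_id, _, reset_val, _ in sorted(rows, key=lambda r: (r[1], r[2])):
        if reset_val is not None:
            last[group_id] = reset_val
        result[row_id] = last.get(group_id, 0)  # 0 when no reset seen yet
    return result


print(last_reset_values(ROWS))
```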

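Note that both window functions, COUNT() and LAST_VALUE(), have an ORDER BY, so they use the default window, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. As long as order_val is unique within each group_id (as in the sample data), that is the same as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The frame clause is omitted in the query to keep the code clearer.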
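The RANGE/ROWS distinction only shows up when the ORDER BY key has ties. Since there is no Oracle at hand here, this sketch uses SQLite through Python's sqlite3 module as a stand-in (it needs SQLite 3.25+ for window functions, and SQLite also defaults to RANGE when ORDER BY is present); the table `t` and its columns are made up for the illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER, val INTEGER)")
# k = 2 appears twice, so the ORDER BY key has a tie.
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (2, 30), (3, 40)])

rows = conn.execute("""
    SELECT k, val,
           SUM(val) OVER (ORDER BY k) AS default_frame,
           SUM(val) OVER (ORDER BY k ROWS BETWEEN UNBOUNDED PRECEDING
                                          AND CURRENT ROW) AS rows_frame
    FROM t
    ORDER BY k, val
""").fetchall()
for row in rows:
    print(row)
```

With the default (RANGE) frame, both k = 2 rows include each other as peers and get 60. With ROWS, they get different totals, and which of the two comes first is not defined, so that result is not deterministic.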

Assuming val is not nullable, the other window function can also be shortened, from:

       COALESCE(SUM(val) OVER 
         (PARTITION BY group_id, reset_count 
          ORDER BY order_val 
          ROWS BETWEEN UNBOUNDED PRECEDING 
                   AND 1 PRECEDING), 0)   

to (which also avoids the COALESCE()):

       SUM(val) OVER 
         (PARTITION BY group_id, reset_count 
          ORDER BY order_val)
       - val
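The two forms agree row by row only because val has no NULLs: SUM() skips NULL inputs, but subtracting a NULL val yields NULL. A small Python model of those SQL NULL rules for a single partition (None for NULL; the helper names are hypothetical) makes the difference visible:

```python
def sql_minus(a, b):
    """SQL subtraction: NULL if either operand is NULL."""
    return None if a is None or b is None else a - b


def compare_forms(vals):
    """For one ordered partition, return per-row values of
    (1) COALESCE(SUM(val) over the preceding rows, 0) and
    (2) SUM(val) over the rows up to the current one, minus val."""
    preceding, inclusive = [], []
    running = None  # SUM over zero non-null values is NULL
    for v in vals:
        preceding.append(0 if running is None else running)
        if v is not None:  # SUM ignores NULL inputs
            running = v if running is None else running + v
        inclusive.append(sql_minus(running, v))
    return preceding, inclusive


print(compare_forms([4, 3, 4, 2]))  # ([0, 4, 7, 11], [0, 4, 7, 11])
print(compare_forms([4, None, 3]))  # ([0, 4, 4], [0, None, 4])
```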

Finally, in the second CTE, we use the subgroups found above (with PARTITION BY group_id, reset_count) to compute the cumulative sum, and then add reset_value to it.

WITH x AS
  ( SELECT temp.*, 
           COUNT(reset_val) OVER 
               (PARTITION BY group_id 
                ORDER BY order_val)
             AS reset_count,
           COALESCE(LAST_VALUE(reset_val IGNORE NULLS) OVER 
               (PARTITION BY group_id 
                ORDER BY order_val), 0)
             AS reset_value
    FROM temp
  ) ,
y AS 
  ( SELECT x.*,
           COALESCE(SUM(val) OVER 
             (PARTITION BY group_id, reset_count 
              ORDER BY order_val 
              ROWS BETWEEN UNBOUNDED PRECEDING 
                       AND 1 PRECEDING), 0)            
           + reset_value AS cumsum      
    FROM x
  )
SELECT *
FROM y ;

Tested on SQLfiddle.
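To try the window-function approach without an Oracle instance, here is a rough port to SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later). Two assumptions: the table is renamed `temp_tbl` because TEMP is a keyword in SQLite, and since SQLite does not support LAST_VALUE(... IGNORE NULLS), reset_value is swapped for MAX(reset_val) over each (group_id, reset_count) subgroup. That substitute works here because only the first row of a subgroup can carry a non-null reset_val:

```python
import sqlite3

# Sample rows from the question: (id, group_id, order_val, reset_val, val).
ROWS = [
    (1, 1, 1, None, 3), (2, 1, 2, None, 2), (3, 1, 3, None, 1),
    (4, 1, 4, 4, 2), (5, 1, 5, None, 1),
    (6, 2, 1, None, 4), (7, 2, 2, 2, 3), (8, 2, 3, None, 4),
    (9, 2, 4, None, 2), (10, 2, 5, None, 2), (11, 2, 6, None, 4),
    (12, 2, 7, 14, 2), (13, 2, 8, None, 2),
]

conn = sqlite3.connect(":memory:")
# Renamed from "temp" because TEMP is a keyword in SQLite.
conn.execute("""CREATE TABLE temp_tbl (id INTEGER PRIMARY KEY, group_id INTEGER,
                order_val INTEGER, reset_val INTEGER, val INTEGER)""")
conn.executemany("INSERT INTO temp_tbl VALUES (?, ?, ?, ?, ?)", ROWS)

QUERY = """
WITH x AS (
    SELECT temp_tbl.*,
           COUNT(reset_val) OVER (PARTITION BY group_id ORDER BY order_val)
             AS reset_count
    FROM temp_tbl
),
y AS (
    SELECT x.*,
           COALESCE(SUM(val) OVER (PARTITION BY group_id, reset_count
                                   ORDER BY order_val
                                   ROWS BETWEEN UNBOUNDED PRECEDING
                                            AND 1 PRECEDING), 0)
           -- stand-in for LAST_VALUE(reset_val IGNORE NULLS)
           + COALESCE(MAX(reset_val) OVER (PARTITION BY group_id, reset_count), 0)
             AS cumsum
    FROM x
)
SELECT id, cumsum FROM y ORDER BY id
"""
result = dict(conn.execute(QUERY).fetchall())
print(result)
```

The resulting mapping should match the expected CUMSUM column from the question.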


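Another variation, based on @Chris's recursive answer (slightly improved: it works with non-consecutive order_val values and avoids the final GROUP BY).
It also works if the first row of a group has a reset_val.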

WITH row_nums AS
  ( SELECT id, group_id, order_val, reset_val, val, 
           ROW_NUMBER() OVER (PARTITION BY group_id
                              ORDER BY order_val)
             AS rn
    FROM temp
  ) ,
updated_temp (id, group_id, order_val, reset_val, val, rn, cumsum) AS
  ( SELECT id, group_id, order_val, reset_val, val, rn, 
           COALESCE(reset_val, 0)
    FROM row_nums
    WHERE rn = 1
  UNION ALL
    SELECT curr.id, curr.group_id, curr.order_val, curr.reset_val, curr.val, curr.rn, 
           COALESCE(curr.reset_val, prev.val + prev.cumsum) 
    FROM row_nums  curr 
      JOIN updated_temp  prev 
        ON  curr.rn-1 = prev.rn 
        AND curr.group_id = prev.group_id
  )
SELECT id, group_id, order_val, reset_val, val, cumsum
FROM updated_temp
ORDER BY group_id, order_val ;

Tested on SQLfiddle-2.
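The recursive CTE can likewise be tried in SQLite through Python's sqlite3 module (3.25+ for ROW_NUMBER()). Assumptions: the table is renamed `temp_tbl` because TEMP is a keyword in SQLite, and SQLite wants WITH RECURSIVE where Oracle accepts plain WITH; otherwise the query is carried over as-is:

```python
import sqlite3

# Sample rows from the question: (id, group_id, order_val, reset_val, val).
ROWS = [
    (1, 1, 1, None, 3), (2, 1, 2, None, 2), (3, 1, 3, None, 1),
    (4, 1, 4, 4, 2), (5, 1, 5, None, 1),
    (6, 2, 1, None, 4), (7, 2, 2, 2, 3), (8, 2, 3, None, 4),
    (9, 2, 4, None, 2), (10, 2, 5, None, 2), (11, 2, 6, None, 4),
    (12, 2, 7, 14, 2), (13, 2, 8, None, 2),
]

conn = sqlite3.connect(":memory:")
# Renamed from "temp" because TEMP is a keyword in SQLite.
conn.execute("""CREATE TABLE temp_tbl (id INTEGER PRIMARY KEY, group_id INTEGER,
                order_val INTEGER, reset_val INTEGER, val INTEGER)""")
conn.executemany("INSERT INTO temp_tbl VALUES (?, ?, ?, ?, ?)", ROWS)

QUERY = """
WITH RECURSIVE
row_nums AS (
    SELECT id, group_id, order_val, reset_val, val,
           ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY order_val) AS rn
    FROM temp_tbl
),
updated_temp (id, group_id, order_val, reset_val, val, rn, cumsum) AS (
    -- anchor: the first row of each group starts at its reset_val, or 0
    SELECT id, group_id, order_val, reset_val, val, rn, COALESCE(reset_val, 0)
    FROM row_nums
    WHERE rn = 1
    UNION ALL
    -- step: the next row either resets or adds the previous row's val
    SELECT curr.id, curr.group_id, curr.order_val, curr.reset_val, curr.val,
           curr.rn, COALESCE(curr.reset_val, prev.val + prev.cumsum)
    FROM row_nums AS curr
    JOIN updated_temp AS prev
      ON curr.rn - 1 = prev.rn AND curr.group_id = prev.group_id
)
SELECT id, cumsum FROM updated_temp ORDER BY id
"""
result = dict(conn.execute(QUERY).fetchall())
print(result)
```

Each recursion step advances every group by one row position, so the number of iterations equals the row count of the largest group, which is worth keeping in mind for the table sizes mentioned in the question.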


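Another variation uses the older (proprietary) CONNECT BY syntax for the recursive query. It is more compact, but I find it harder to write and read than the CTE version: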

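WITH row_nums AS
  ( SELECT id, group_id, order_val, reset_val, val, 
           ROW_NUMBER() OVER (PARTITION BY group_id
                              ORDER BY order_val)
             AS rn
    FROM temp
  ) ,
chains AS
  ( -- PRIOR only sees the parent row's columns from row_nums, not values
    -- computed in this SELECT list, so a PRIOR-based running total does not
    -- accumulate. CONNECT BY is used here only to split each group into
    -- chains that start at rn = 1 or at a row with a reset_val.
    SELECT id, group_id, order_val, reset_val, val, rn,
           CONNECT_BY_ROOT rn        AS root_rn,
           CONNECT_BY_ROOT reset_val AS root_reset
    FROM row_nums
    START WITH rn = 1 OR reset_val IS NOT NULL
    CONNECT BY  rn-1 = PRIOR rn 
            AND group_id = PRIOR group_id
            AND reset_val IS NULL 
  )
SELECT id, group_id, order_val, reset_val, val, rn,
       -- running total within each chain, on top of the chain's reset value
       COALESCE(root_reset, 0)
       + COALESCE(SUM(val) OVER (PARTITION BY group_id, root_rn ORDER BY rn
                                 ROWS BETWEEN UNBOUNDED PRECEDING
                                          AND 1 PRECEDING), 0) AS cumsum
FROM chains
ORDER BY group_id, order_val ; 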

Tested on SQLfiddle-3.
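The START WITH / CONNECT BY clauses above split each group into chains: a chain starts at the group's first row or at any row with a reset_val, and a row is attached to the previous row only when its own reset_val is NULL. A Python sketch of that decomposition (the helper names are hypothetical), followed by the running total each chain should produce, starting from the root's reset_val or 0:

```python
from itertools import groupby

# Sample rows from the question: (id, group_id, order_val, reset_val, val).
ROWS = [
    (1, 1, 1, None, 3), (2, 1, 2, None, 2), (3, 1, 3, None, 1),
    (4, 1, 4, 4, 2), (5, 1, 5, None, 1),
    (6, 2, 1, None, 4), (7, 2, 2, 2, 3), (8, 2, 3, None, 4),
    (9, 2, 4, None, 2), (10, 2, 5, None, 2), (11, 2, 6, None, 4),
    (12, 2, 7, 14, 2), (13, 2, 8, None, 2),
]


def chains(rows):
    """Split rows the way START WITH rn = 1 OR reset_val IS NOT NULL and
    CONNECT BY rn - 1 = PRIOR rn ... AND reset_val IS NULL would."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    for _, group in groupby(ordered, key=lambda r: r[1]):
        for i, row in enumerate(group):
            if i == 0 or row[3] is not None:
                result.append([])  # a root: first row of the group, or a reset
            result[-1].append(row)  # otherwise a child of the previous row
    return result


def cumsum_from_chains(rows):
    """Each chain starts from its root's reset_val (or 0) and adds up val."""
    result = {}
    for chain in chains(rows):
        running = chain[0][3] if chain[0][3] is not None else 0
        for row in chain:
            result[row[0]] = running
            running += row[4]
    return result


print([[row[0] for row in chain] for chain in chains(ROWS)])
print(cumsum_from_chains(ROWS))
```

These chains are the same partitions as the (group_id, reset_count) subgroups of the window-function answer; only the labels differ.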