根据另一列重置运行总计

Pரத*_*ீப் 10 sql-server t-sql sql-server-2012 running-totals

我正在尝试计算运行总数。但是当累计总和大于另一列值时它应该重置

create table #reset_runn_total
(
id int identity(1,1),
val int, 
reset_val int,
grp int
)

insert into #reset_runn_total
values 
(1,10,1),
(8,12,1),(6,14,1),(5,10,1),(6,13,1),(3,11,1),(9,8,1),(10,12,1)


SELECT Row_number()OVER(partition BY grp ORDER BY id)AS rn,*
INTO   #test
FROM   #reset_runn_total
Run Code Online (Sandbox Code Playgroud)

指数详情:

CREATE UNIQUE CLUSTERED INDEX ix_load_reset_runn_total
  ON #test(rn, grp) 
Run Code Online (Sandbox Code Playgroud)

样本数据

+----+-----+-----------+-----+
| id | val | reset_val | Grp |
+----+-----+-----------+-----+
|  1 |   1 |        10 | 1   |
|  2 |   8 |        12 | 1   |
|  3 |   6 |        14 | 1   |
|  4 |   5 |        10 | 1   |
|  5 |   6 |        13 | 1   |
|  6 |   3 |        11 | 1   |
|  7 |   9 |         8 | 1   |
|  8 |  10 |        12 | 1   |
+----+-----+-----------+-----+ 
Run Code Online (Sandbox Code Playgroud)

预期结果

+----+-----+-----------------+-------------+
| id | val |    reset_val    | Running_tot |
+----+-----+-----------------+-------------+
|  1 |   1 | 10              |       1     |  
|  2 |   8 | 12              |       9     |  --1+8
|  3 |   6 | 14              |       15    |  --1+8+6 -- greater than reset val
|  4 |   5 | 10              |       5     |  --reset 
|  5 |   6 | 13              |       11    |  --5+6
|  6 |   3 | 11              |       14    |  --5+6+3 -- greater than reset val
|  7 |   9 | 8               |       9     |  --reset -- greater than reset val 
|  8 |  10 | 12              |      10     |  --reset
+----+-----+-----------------+-------------+
Run Code Online (Sandbox Code Playgroud)

询问:

我使用Recursive CTE. 原始问题在这里/sf/ask/2945978311/

;WITH cte
     AS (SELECT rn,id,
                val,
                reset_val,
                grp,
                val                   AS running_total,
                Iif (val > reset_val, 1, 0) AS flag
         FROM   #test
         WHERE  rn = 1
         UNION ALL
         SELECT r.*,
                Iif(c.flag = 1, r.val, c.running_total + r.val),
                Iif(Iif(c.flag = 1, r.val, c.running_total + r.val) > r.reset_val, 1, 0)
         FROM   cte c
                JOIN #test r
                  ON r.grp = c.grp
                     AND r.rn = c.rn + 1)
SELECT *
FROM   cte 
Run Code Online (Sandbox Code Playgroud)

T-SQL不使用CLR. 的情况下有没有更好的选择?

Joe*_*ish 6

我看过类似的问题,但从来没有找到一个窗口函数解决方案,它可以对数据进行一次传递。我不认为这是可能的。窗口函数需要能够应用于列中的所有值。这使得像这样的重置计算非常困难,因为一次重置会更改以下所有值的值。

考虑这个问题的一种方法是,如果您计算基本运行总计,只要您可以从正确的前一行中减去运行总计,就可以获得您想要的最终结果。例如,在您的示例数据中,id4的值是running total of row 4 - the running total of row 3. id6的值是running total of row 6 - the running total of row 3因为尚未发生重置。id7的值是running total of row 7 - the running total of row 6等等。

我会在循环中使用 T-SQL 来解决这个问题。我有点不知所措,认为我有一个完整的解决方案。对于 300 万行和 500 个组,代码在我的桌面上在 24 秒内完成。我正在使用具有 6 个 vCPU 的 SQL Server 2016 开发人员版进行测试。我通常利用并行插入和并行执行,因此如果您使用的是旧版本或有 DOP 限制,您可能需要更改代码。

在我用来生成数据的代码下方。在范围VALRESET_VAL应类似于您的样本数据。

drop table if exists reset_runn_total;

create table reset_runn_total
(
id int identity(1,1),
val int, 
reset_val int,
grp int
);

DECLARE 
@group_num INT,
@row_num INT;
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;

    SET @group_num = 1;
    WHILE @group_num <= 50000 
    BEGIN
        SET @row_num = 1;
        WHILE @row_num <= 60
        BEGIN
            INSERT INTO reset_runn_total WITH (TABLOCK)
            SELECT 1 + ABS(CHECKSUM(NewId())) % 10, 8 + ABS(CHECKSUM(NewId())) % 8, @group_num;

            SET @row_num = @row_num + 1;
        END;
        SET @group_num = @group_num + 1;
    END;
    COMMIT TRANSACTION;
END;
Run Code Online (Sandbox Code Playgroud)

算法如下:

1)首先将具有标准运行总计的所有行插入到临时表中。

2)在一个循环中:

2a) 对于每组,计算第一行的运行总计高于表中剩余的 reset_value,并将 id、过大的运行总计和过大的前一次运行总计存储在临时表中。

2b) 将第一个临时表中的行删除到结果临时表中,这些行ID小于或等于ID第二个临时表中的 。根据需要使用其他列调整运行总计。

3) 删除后不再处理的行将额外运行DELETE OUTPUT到结果表中。这是针对永远不会超过重置值的组末尾的行。

我将逐步在 T-SQL 中完成上述算法的一种实现。

首先创建几个临时表。#initial_results使用标准运行总计保存原始数据,在#group_bookkeeping每个循环中更新以找出可以移动的行,并#final_results包含针对重置调整运行总计的结果。

CREATE TABLE #initial_results (
id int,
val int, 
reset_val int,
grp int,
initial_running_total int
);

CREATE TABLE #group_bookkeeping (
grp int,
max_id_to_move int,
running_total_to_subtract_this_loop int,
running_total_to_subtract_next_loop int,
grp_done bit, 
PRIMARY KEY (grp)
);

CREATE TABLE #final_results (
id int,
val int, 
reset_val int,
grp int,
running_total int
);

INSERT INTO #initial_results WITH (TABLOCK)
SELECT ID, VAL, RESET_VAL, GRP, SUM(VAL) OVER (PARTITION BY GRP ORDER BY ID) RUNNING_TOTAL
FROM reset_runn_total;

CREATE CLUSTERED INDEX i1 ON #initial_results (grp, id);

INSERT INTO #group_bookkeeping WITH (TABLOCK)
SELECT DISTINCT GRP, 0, 0, 0, 0
FROM reset_runn_total;
Run Code Online (Sandbox Code Playgroud)

我在临时表上创建聚集索引之后,插入和索引构建可以并行完成。在我的机器上有很大的不同,但在你的机器上可能没有。在源表上创建索引似乎没有帮助,但这可能对您的机器有所帮助。

下面的代码在循环中运行并更新簿记表。对于每个组,我们需要找到ID应该移动到结果表中的最大值。我们需要该行的运行总计,以便我们可以从初始运行总计中减去它。grp_done当没有更多工作要做时,列设置为 1 grp

WITH UPD_CTE AS (
        SELECT 
        #grp_bookkeeping.GRP
        , MIN(CASE WHEN initial_running_total - #group_bookkeeping.running_total_to_subtract_next_loop > RESET_VAL THEN ID ELSE NULL END) max_id_to_update
        , MIN(#group_bookkeeping.running_total_to_subtract_next_loop) running_total_to_subtract_this_loop
        , MIN(CASE WHEN initial_running_total - #group_bookkeeping.running_total_to_subtract_next_loop > RESET_VAL THEN initial_running_total ELSE NULL END) additional_value_next_loop
        , CASE WHEN MIN(CASE WHEN initial_running_total - #group_bookkeeping.running_total_to_subtract_next_loop > RESET_VAL THEN ID ELSE NULL END) IS NULL THEN 1 ELSE 0 END grp_done
        FROM #group_bookkeeping 
        INNER JOIN #initial_results IR ON #group_bookkeeping.grp = ir.grp
        WHERE #group_bookkeeping.grp_done = 0
        GROUP BY #group_bookkeeping.GRP
    )
    UPDATE #group_bookkeeping
    SET #group_bookkeeping.max_id_to_move = uv.max_id_to_update
    , #group_bookkeeping.running_total_to_subtract_this_loop = uv.running_total_to_subtract_this_loop
    , #group_bookkeeping.running_total_to_subtract_next_loop = uv.additional_value_next_loop
    , #group_bookkeeping.grp_done = uv.grp_done
    FROM UPD_CTE uv
    WHERE uv.GRP = #group_bookkeeping.grp
OPTION (LOOP JOIN);
Run Code Online (Sandbox Code Playgroud)

真的不是LOOP JOIN一般的提示的粉丝,但这是一个简单的查询,它是获得我想要的东西的最快方法。为了真正优化响应时间,我想要并行嵌套循环连接而不是 DOP 1 合并连接。

下面的代码在循环中运行并将数据从初始表移动到最终结果表中。请注意对初始运行总计的调整。

DELETE ir
OUTPUT DELETED.id,  
    DELETED.VAL,  
    DELETED.RESET_VAL,  
    DELETED.GRP ,
    DELETED.initial_running_total - tb.running_total_to_subtract_this_loop
INTO #final_results
FROM #initial_results ir
INNER JOIN #group_bookkeeping tb ON ir.GRP = tb.GRP AND ir.ID <= tb.max_id_to_move
WHERE tb.grp_done = 0;
Run Code Online (Sandbox Code Playgroud)

为方便起见,以下是完整代码:

DECLARE @RC INT;
BEGIN
SET NOCOUNT ON;

CREATE TABLE #initial_results (
id int,
val int, 
reset_val int,
grp int,
initial_running_total int
);

CREATE TABLE #group_bookkeeping (
grp int,
max_id_to_move int,
running_total_to_subtract_this_loop int,
running_total_to_subtract_next_loop int,
grp_done bit, 
PRIMARY KEY (grp)
);

CREATE TABLE #final_results (
id int,
val int, 
reset_val int,
grp int,
running_total int
);

INSERT INTO #initial_results WITH (TABLOCK)
SELECT ID, VAL, RESET_VAL, GRP, SUM(VAL) OVER (PARTITION BY GRP ORDER BY ID) RUNNING_TOTAL
FROM reset_runn_total;

CREATE CLUSTERED INDEX i1 ON #initial_results (grp, id);

INSERT INTO #group_bookkeeping WITH (TABLOCK)
SELECT DISTINCT GRP, 0, 0, 0, 0
FROM reset_runn_total;

SET @RC = 1;
WHILE @RC > 0 
BEGIN
    WITH UPD_CTE AS (
        SELECT 
        #group_bookkeeping.GRP
        , MIN(CASE WHEN initial_running_total - #group_bookkeeping.running_total_to_subtract_next_loop > RESET_VAL THEN ID ELSE NULL END) max_id_to_move
        , MIN(#group_bookkeeping.running_total_to_subtract_next_loop) running_total_to_subtract_this_loop
        , MIN(CASE WHEN initial_running_total - #group_bookkeeping.running_total_to_subtract_next_loop > RESET_VAL THEN initial_running_total ELSE NULL END) additional_value_next_loop
        , CASE WHEN MIN(CASE WHEN initial_running_total - #group_bookkeeping.running_total_to_subtract_next_loop > RESET_VAL THEN ID ELSE NULL END) IS NULL THEN 1 ELSE 0 END grp_done
        FROM #group_bookkeeping 
        CROSS APPLY (SELECT ID, RESET_VAL, initial_running_total FROM #initial_results ir WHERE #group_bookkeeping.grp = ir.grp ) ir
        WHERE #group_bookkeeping.grp_done = 0
        GROUP BY #group_bookkeeping.GRP
    )
    UPDATE #group_bookkeeping
    SET #group_bookkeeping.max_id_to_move = uv.max_id_to_move
    , #group_bookkeeping.running_total_to_subtract_this_loop = uv.running_total_to_subtract_this_loop
    , #group_bookkeeping.running_total_to_subtract_next_loop = uv.additional_value_next_loop
    , #group_bookkeeping.grp_done = uv.grp_done
    FROM UPD_CTE uv
    WHERE uv.GRP = #group_bookkeeping.grp
    OPTION (LOOP JOIN);

    DELETE ir
    OUTPUT DELETED.id,  
        DELETED.VAL,  
        DELETED.RESET_VAL,  
        DELETED.GRP ,
        DELETED.initial_running_total - tb.running_total_to_subtract_this_loop
    INTO #final_results
    FROM #initial_results ir
    INNER JOIN #group_bookkeeping tb ON ir.GRP = tb.GRP AND ir.ID <= tb.max_id_to_move
    WHERE tb.grp_done = 0;

    SET @RC = @@ROWCOUNT;
END;

DELETE ir 
OUTPUT DELETED.id,  
    DELETED.VAL,  
    DELETED.RESET_VAL,  
    DELETED.GRP ,
    DELETED.initial_running_total - tb.running_total_to_subtract_this_loop
    INTO #final_results
FROM #initial_results ir
INNER JOIN #group_bookkeeping tb ON ir.GRP = tb.GRP;

CREATE CLUSTERED INDEX f1 ON #final_results (grp, id);

/* -- do something with the data
SELECT *
FROM #final_results
ORDER BY grp, id;
*/

DROP TABLE #final_results;
DROP TABLE #initial_results;
DROP TABLE #group_bookkeeping;

END;
Run Code Online (Sandbox Code Playgroud)