优化SQL查询以删除游标

Sco*_*ain 5 sql optimization sql-server-2005

我正在尝试编写一个查询,该查询将通过表格并将帐户中的任何信用额应用于最旧的余额.我不知道如何在不使用游标的情况下做到这一点,我知道如果可能的话应该不惜一切代价避免使用游标,所以我来这里寻求帮助.

select * into #balances from [IDAT_AR_BALANCES] where amount > 0
select * into #credits from [IDAT_AR_BALANCES] where amount < 0

create index ba_ID on #balances (CLIENT_ID)
create index cr_ID on #credits (CLIENT_ID)

declare credit_cursor cursor for
select [CLIENT_ID], amount, cvtGUID from #credits

open credit_cursor
declare @client_id varchar(11)
declare @credit money
declare @balance money
declare @cvtGuidBalance uniqueidentifier
declare @cvtGuidCredit uniqueidentifier
fetch next from credit_cursor into @client_id, @credit, @cvtGuidCredit
while @@fetch_status = 0
begin
      while(@credit < 0 and (select count(*) from #balances where @client_id = CLIENT_ID and amount <> 0) > 0)
      begin
            select top 1  @balance = amount, @cvtGuidBalance = cvtGuid from #balances where @client_id = CLIENT_ID and amount <> 0 order by AGING_DATE
            set @credit = @balance + @credit
            if(@credit > 0)
            begin
                  update #balances set amount = @credit where cvtGuid = @cvtGuidBalance
                  set @credit = 0
            end
            else
            begin
                  update #balances set amount = 0 where cvtGuid = @cvtGuidBalance
            end
      end
      update #credits set amount = @credit where cvtGuid = @cvtGuidCredit
      fetch next from credit_cursor into @client_id, @credit, @cvtGuidCredit
end

close credit_cursor
deallocate credit_cursor

delete #balances where AMOUNT = 0
delete #credits where AMOUNT = 0

truncate table [IDAT_AR_BALANCES]

insert [IDAT_AR_BALANCES] select * from #balances
insert [IDAT_AR_BALANCES] select * from #credits

drop table #balances
drop table #credits
Run Code Online (Sandbox Code Playgroud)

在10000个记录和1000个客户端的测试用例中,运行需要26秒,通过在CLIENT_ID上添加两个索引,我可以将数字降低到14秒.然而,对于我需要的东西,这仍然太慢,最终结果可能有多达10000个客户端和超过4,000,000条记录,因此运行时间很容易变成两位数分钟.

任何有关如何重新构造以移除光标的建议都将非常感激.

示例(更新以显示您在运行后可以获得多个积分):

before
cvtGuid      client_id      ammount     AGING_DATE
xxxxxx       1              20.00       1/1/2011
xxxxxx       1              30.00       1/2/2011
xxxxxx       1              -10.00      1/3/2011
xxxxxx       1              5.00        1/4/2011
xxxxxx       2              20.00       1/1/2011
xxxxxx       2              15.00       1/2/2011
xxxxxx       2              -40.00      1/3/2011
xxxxxx       2              5.00        1/4/2011
xxxxxx       3              10.00       1/1/2011
xxxxxx       3              -20.00      1/2/2011
xxxxxx       3              5.00        1/3/2011
xxxxxx       3              -8.00       1/4/2011

after
cvtGuid      client_id      ammount     AGING_DATE
xxxxxx       1              10.00       1/1/2011
xxxxxx       1              30.00       1/2/2011
xxxxxx       1              5.00        1/4/2011
xxxxxx       3              -5.00       1/2/2011
xxxxxx       3              -8.00       1/4/2011
Run Code Online (Sandbox Code Playgroud)

因此,它会将负面信用额应用于最早的正余额(示例中为客户1),如果在完成后没有剩余的正余额,则会留下剩余的负数(客户3),如果它们完全取消(这是在90%的时间内使用真实数据)它将完全删除记录(客户端2).

And*_*y M 4

可以借助递归 CTE 来解决这个问题。

基本思想是这样的:

  1. 分别获取每个帐户的正值和负值总计 ( client_id)。

  2. 迭代每个账户并根据 的符号和绝对值“截取”两个总计之一的金额amount(即,绝不会“截取”相应总计超过其当前值)。应添加/减去相同的值amount

  3. 更新后,删除那些amount变成0的行。

对于我的解决方案,我借用了 Lieven 的表变量定义(谢谢!),添加一列(cvtGuidint出于演示目的而声明)和一行(原始示例中的最后一行,Lieven 的脚本中缺少) 。

/* preparing the demonstration data */
DECLARE @IDAT_AR_BALANCES TABLE (
  cvtGuid int IDENTITY,
  client_id INTEGER
  , amount FLOAT
  , date DATE
);
INSERT INTO @IDAT_AR_BALANCES
  SELECT 1, 20.00, '1/1/2011'
  UNION ALL SELECT 1, 30.00, '1/2/2011'
  UNION ALL SELECT 1, -10.00, '1/3/2011'
  UNION ALL SELECT 1, 5.00, '1/4/2011'
  UNION ALL SELECT 2, 20.00, '1/1/2011'
  UNION ALL SELECT 2, 15.00, '1/2/2011'
  UNION ALL SELECT 2, -40.00, '1/3/2011'
  UNION ALL SELECT 2, 5.00, '1/4/2011'
  UNION ALL SELECT 3, 10.00, '1/1/2011'
  UNION ALL SELECT 3, -20.00, '1/2/2011'
  UNION ALL SELECT 3, 5.00, '1/3/2011'
  UNION ALL SELECT 3, -8.00, '1/4/2011';

/* checking the original contents */
SELECT * FROM @IDAT_AR_BALANCES;

/* getting on with the job: */
WITH totals AS (
  SELECT
    /* 1) preparing the totals */
    client_id,
    total_pos = SUM(CASE WHEN amount > 0 THEN amount END),
    total_neg = SUM(CASE WHEN amount < 0 THEN amount END)
  FROM @IDAT_AR_BALANCES
  GROUP BY client_id
),
refined AS (
  /* 2) refining the original data with auxiliary columns:
     * rownum - row numbers (unique within accounts);
     * amount_to_discard_pos - the amount to discard `amount` completely if it's negative;
     * amount_to_discard_neg - the amount to discard `amount` completely if it's positive
  */
  SELECT
    *,
    rownum = ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY date),
    amount_to_discard_pos = CAST(CASE WHEN amount < 0 THEN -amount ELSE 0 END AS float),
    amount_to_discard_neg = CAST(CASE WHEN amount > 0 THEN -amount ELSE 0 END AS float)
  FROM @IDAT_AR_BALANCES
),
prepared AS (
  /* 3) preparing the final table (using a recursive CTE) */
  SELECT
    cvtGuid = CAST(NULL AS int),
    client_id,
    amount = CAST(NULL AS float),
    date = CAST(NULL AS date),
    amount_update = CAST(NULL AS float),
    running_balance_pos = total_pos,
    running_balance_neg = total_neg,
    rownum = CAST(0 AS bigint)
  FROM totals
  UNION ALL
  SELECT
    n.cvtGuid,
    n.client_id,
    n.amount,
    n.date,
    amount_update = CAST(
      CASE
        WHEN n.amount_to_discard_pos < p.running_balance_pos
        THEN n.amount_to_discard_pos
        ELSE p.running_balance_pos
      END
      +
      CASE
        WHEN n.amount_to_discard_neg > p.running_balance_neg
        THEN n.amount_to_discard_neg
        ELSE p.running_balance_neg
      END
    AS float),
    running_balance_pos = CAST(p.running_balance_pos -
      CASE
        WHEN n.amount_to_discard_pos < p.running_balance_pos
        THEN n.amount_to_discard_pos
        ELSE p.running_balance_pos
      END
    AS float),
    running_balance_neg = CAST(p.running_balance_neg -
      CASE
        WHEN n.amount_to_discard_neg > p.running_balance_neg
        THEN n.amount_to_discard_neg
        ELSE p.running_balance_neg
      END
    AS float),
    n.rownum
  FROM refined n
    INNER JOIN prepared p ON n.client_id = p.client_id AND n.rownum = p.rownum + 1
)
/*                  -- some junk that I've forgotten to clean up,
SELECT *            -- which you might actually want to use
FROM prepared       -- to view the final prepared result set
WHERE rownum > 0    -- before actually running the update
ORDER BY client_id, rownum
*/
/* performing the update */
UPDATE t
SET amount = t.amount + u.amount_update
FROM @IDAT_AR_BALANCES t INNER JOIN prepared u ON t.cvtGuid = u.cvtGuid
OPTION (MAXRECURSION 0);

/* checking the contents after UPDATE */
SELECT * FROM @IDAT_AR_BALANCES;

/* deleting the eliminated amounts */
DELETE FROM @IDAT_AR_BALANCES WHERE amount = 0;

/* checking the contents after DELETE */
SELECT * FROM @IDAT_AR_BALANCES;
Run Code Online (Sandbox Code Playgroud)

更新

amount正如 Lieven 所正确建议的那样(再次感谢您!),您可以先删除帐户中加起来为 0 的所有行,然后更新其他行。这将提高整体性能,因为正如您所说,大多数数据的数量加起来为 0。

以下是 Lieven 删除“零帐户”解决方案的变体:

DELETE FROM @IDAT_AR_BALANCES
WHERE client_id IN (
  SELECT client_id
  FROM @IDAT_AR_BALANCES
  GROUP BY client_id
  HAVING SUM(amount) = 0
)
Run Code Online (Sandbox Code Playgroud)

但请记住,DELETE更新后仍然需要,因为更新可能会将某些值重置amount为 0。如果我是你,我可能会考虑创建一个 FOR UPDATE 触发器,它会自动删除其中的行amount = 0。这样的解决方案并不总是可以接受,但有时是可以的。这取决于您还可以用您的数据做什么。它还可能取决于它是否只是您的项目或还有其他维护者(他们不喜欢行“神奇地”并意外消失)。