Avn*_*rSo 8 performance sql-server query scalability
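I have a base table of transactions, and I need to create a table with running totals. I need them per account; within each account there are several running totals (depending on the transaction type), and within those there are several running totals per sub-account.
My base table has these fields (more or less):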
AccountID | SubAccountID | TransactionType | TransactionAmount
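Given that I have about 4 kinds of running totals per Account/TransactionType and 2 more running totals per Account/SubAccount/TransactionType, about 2 million accounts with about 10 sub-accounts each, and about 10K transactions arriving per minute (at peak load), how would you do it?
It is also a must that this runs asynchronously via a SQL job, creating the aggregations without being part of the transactions themselves.
I'm struggling with a cursor here: it takes too long. I would really appreciate any advice or articles that cover more or less the same thing.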
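Asynchronous implies that the running totals do not need to be completely accurate at all times, or that your data change patterns are such that a one-off running total build will remain valid and accurate until the next load. Either way, I'm sure you have thought that part through, so I won't labour the point.
Your main options for a high-performance, supported method are a SQLCLR function/procedure, or an UPDATE method based on Hugo Kornelis' set-based iteration. The SQLCLR method (implemented in a procedure, but reasonably easy to translate) can be found here.
I haven't been able to find Hugo's method online, but it is detailed in the excellent MVP Deep Dives (Volume 1). Sample code illustrating Hugo's method (copied from one of my posts on another site, for which you may not have a login) is shown below: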
-- A work table to hold the reformatted data, and
-- ultimately, the results
CREATE TABLE #Work
(
Acct_No VARCHAR(20) NOT NULL,
MonthDate DATETIME NOT NULL,
MonthRate DECIMAL(19,12) NOT NULL,
Amount DECIMAL(19,12) NOT NULL,
InterestAmount DECIMAL(19,12) NOT NULL,
RunningTotal DECIMAL(19,12) NOT NULL,
RowRank BIGINT NOT NULL
);
-- Prepare the set-based iteration method
WITH Accounts
AS (
-- Get a list of the account numbers
SELECT DISTINCT Acct_No
FROM #Refunds
),
Rates
AS (
-- Apply all the accounts to all the rates
SELECT A.Acct_No,
R.[Year],
R.[Month],
MonthRate = R.InterestRate / 12
FROM #InterestRates R
CROSS
JOIN Accounts A
),
BaseData
AS (
-- The basic data we need to work with
SELECT Acct_No = ISNULL(R.Acct_No,''),
MonthDate = ISNULL(DATEADD(MONTH, R.[Month], DATEADD(YEAR, R.[year] - 1900, 0)), 0),
R.MonthRate,
Amount = ISNULL(RF.Amount,0),
InterestAmount = ISNULL(RF.Amount,0) * R.MonthRate,
RunningTotal = ISNULL(RF.Amount,0)
FROM Rates R
LEFT
JOIN #Refunds RF
ON RF.Acct_No = R.Acct_No
AND RF.[Year] = R.[Year]
AND RF.[Month] = R.[Month]
)
-- Basic data plus a rank id, numbering the rows by MonthDate, and resetting to 1 for each new Account
INSERT #Work
(Acct_No, MonthDate, MonthRate, Amount, InterestAmount, RunningTotal, RowRank)
SELECT BD.Acct_No, BD.MonthDate, BD.MonthRate, BD.Amount, BD.InterestAmount, BD.RunningTotal,
RowRank = RANK() OVER (PARTITION BY BD.Acct_No ORDER BY MonthDate)
FROM BaseData BD;
-- An index to speed the next stage (different from that used with the Quirky Update method)
CREATE UNIQUE CLUSTERED INDEX nc1 ON #Work (RowRank, Acct_No);
-- Iteration variable: the rank id currently being processed
DECLARE @Rank BIGINT;
-- Initialize
SET @Rank = 1;
-- This is the iteration bit, processes a rank id per iteration
-- The number of rows processed with each iteration is equal to the number of groups in the data
-- More groups --> greater efficiency
WHILE (1 = 1)
BEGIN
SET @Rank = @Rank + 1;
-- Set-based update with running totals for the current rank id
UPDATE This
SET InterestAmount = (Previous.RunningTotal + This.Amount) * This.MonthRate,
RunningTotal = Previous.RunningTotal + This.Amount + (Previous.RunningTotal + This.Amount) * This.MonthRate
FROM #Work This
JOIN #Work Previous
ON Previous.Acct_No = This.Acct_No
AND Previous.RowRank = @Rank - 1
WHERE This.RowRank = @Rank;
IF (@@ROWCOUNT = 0) BREAK;
END;
-- Show the results in natural order
SELECT *
FROM #Work
ORDER BY
Acct_No, RowRank;
In SQL Server 2012 you can use the windowing function extensions, such as SUM(...) OVER (ORDER BY ...).
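As a minimal sketch of that approach, assuming SQL Server 2012 or later: the four column names come from the question, while the temp table name #Transactions, the TransactionID ordering column and the sample rows are hypothetical, added only so that the running totals have a defined order.

```sql
-- Hypothetical work table; only AccountID, SubAccountID, TransactionType
-- and TransactionAmount appear in the question.
CREATE TABLE #Transactions
(
    TransactionID     INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- assumed ordering column
    AccountID         INT           NOT NULL,
    SubAccountID      INT           NOT NULL,
    TransactionType   TINYINT       NOT NULL,
    TransactionAmount DECIMAL(19,4) NOT NULL
);

-- Made-up sample rows, for illustration only
INSERT #Transactions (AccountID, SubAccountID, TransactionType, TransactionAmount)
VALUES (1, 10, 1, 100.00),
       (1, 10, 1,  50.00),
       (1, 11, 1,  25.00),
       (1, 10, 2,  -5.00);

-- One windowed SUM per level of running total
SELECT  T.TransactionID,
        T.AccountID,
        T.SubAccountID,
        T.TransactionType,
        T.TransactionAmount,
        -- running total per Account/TransactionType
        AccountTypeTotal = SUM(T.TransactionAmount) OVER (
            PARTITION BY T.AccountID, T.TransactionType
            ORDER BY T.TransactionID
            ROWS UNBOUNDED PRECEDING),
        -- running total per Account/SubAccount/TransactionType
        SubAccountTypeTotal = SUM(T.TransactionAmount) OVER (
            PARTITION BY T.AccountID, T.SubAccountID, T.TransactionType
            ORDER BY T.TransactionID
            ROWS UNBOUNDED PRECEDING)
FROM    #Transactions T
ORDER BY T.AccountID, T.TransactionType, T.TransactionID;

DROP TABLE #Transactions;
```

The explicit ROWS UNBOUNDED PRECEDING frame is a deliberate choice in this sketch: when ORDER BY is given without a frame, the default is RANGE, which treats ties in the ordering column as one group and is generally slower.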
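I'm not sure why you want this to be asynchronous, but a couple of indexed views sound like just the ticket here. That assumes you want a simple SUM per group, so please define what you mean by a running total.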
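A minimal sketch of one such indexed view, under stated assumptions: the table name dbo.Transactions, the TransactionID key, and the view and index names are hypothetical, and only the four column names come from the question. Note that this maintains a single total per group rather than a row-by-row running total, and that it is maintained synchronously as part of each insert.

```sql
-- Hypothetical base table; the names are assumptions for illustration.
CREATE TABLE dbo.Transactions
(
    TransactionID     INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    AccountID         INT           NOT NULL,
    SubAccountID      INT           NOT NULL,
    TransactionType   TINYINT       NOT NULL,
    TransactionAmount DECIMAL(19,4) NOT NULL  -- must not be nullable for SUM in an indexed view
);
GO
-- An indexed view needs SCHEMABINDING and two-part object names,
-- and COUNT_BIG(*) whenever it uses GROUP BY.
CREATE VIEW dbo.AccountTypeTotals
WITH SCHEMABINDING
AS
SELECT  AccountID,
        TransactionType,
        TotalAmount = SUM(TransactionAmount),
        RowCnt      = COUNT_BIG(*)
FROM    dbo.Transactions
GROUP BY AccountID, TransactionType;
GO
-- The unique clustered index is what materializes the view.
CREATE UNIQUE CLUSTERED INDEX CUQ_AccountTypeTotals
    ON dbo.AccountTypeTotals (AccountID, TransactionType);
GO
-- NOEXPAND makes the query read the materialized rows directly.
SELECT  AccountID, TransactionType, TotalAmount
FROM    dbo.AccountTypeTotals WITH (NOEXPAND)
WHERE   AccountID = 1;
```

A second view grouped by AccountID, SubAccountID and TransactionType would cover the sub-account level in the same way. The sketch also assumes the session SET options required for indexed views (such as ANSI_NULLS and QUOTED_IDENTIFIER ON) are in effect.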
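If you really do want it asynchronous, then at about 160 new rows per second your running totals will always be out of date. Asynchronous would mean no triggers and no indexed views.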
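Calculating running totals is notoriously slow, whether you do it with a cursor or with a triangular join. It is very tempting to denormalize and store the running totals in a column, especially if you select them frequently. However, as usual when you denormalize, you need to guarantee the integrity of your denormalized data. Fortunately, you can guarantee the integrity of running totals with constraints: as long as all your constraints are trusted, all your running totals are correct.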
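Whether the constraints are in fact trusted can be checked from the catalog views. The query below is only a sketch, and it assumes the Data.Inventory table that the script in this answer creates; substitute your own table name.

```sql
-- List CHECK and FOREIGN KEY constraints on the table together with
-- their trusted/disabled flags (is_not_trusted = 1 means untrusted).
SELECT  ConstraintName = name,
        ConstraintType = 'CHECK',
        is_not_trusted,
        is_disabled
FROM    sys.check_constraints
WHERE   parent_object_id = OBJECT_ID(N'Data.Inventory')
UNION ALL
SELECT  name,
        'FOREIGN KEY',
        is_not_trusted,
        is_disabled
FROM    sys.foreign_keys
WHERE   parent_object_id = OBJECT_ID(N'Data.Inventory');

-- If any constraint was disabled or added WITH NOCHECK, this re-validates
-- the existing rows and marks the constraints as trusted again.
ALTER TABLE Data.Inventory WITH CHECK CHECK CONSTRAINT ALL;
```

UNIQUE and PRIMARY KEY constraints are enforced by their indexes, so they do not appear in this check.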
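Also, this way you can easily ensure that the current balance (the running total) is never negative; enforcing that by other methods can also be very slow. The following script demonstrates the technique.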
CREATE TABLE Data.Inventory(InventoryID INT NOT NULL IDENTITY,
ItemID INT NOT NULL,
ChangeDate DATETIME NOT NULL,
ChangeQty INT NOT NULL,
TotalQty INT NOT NULL,
PreviousChangeDate DATETIME NULL,
PreviousTotalQty INT NULL,
CONSTRAINT PK_Inventory PRIMARY KEY(ItemID, ChangeDate),
CONSTRAINT UNQ_Inventory UNIQUE(ItemID, ChangeDate, TotalQty),
CONSTRAINT UNQ_Inventory_Previous_Columns UNIQUE(ItemID, PreviousChangeDate, PreviousTotalQty),
CONSTRAINT FK_Inventory_Self FOREIGN KEY(ItemID, PreviousChangeDate, PreviousTotalQty)
REFERENCES Data.Inventory(ItemID, ChangeDate, TotalQty),
CONSTRAINT CHK_Inventory_Valid_TotalQty CHECK(TotalQty >= 0 AND (TotalQty = COALESCE(PreviousTotalQty, 0) + ChangeQty)),
CONSTRAINT CHK_Inventory_Valid_Dates_Sequence CHECK(PreviousChangeDate < ChangeDate),
CONSTRAINT CHK_Inventory_Valid_Previous_Columns CHECK((PreviousChangeDate IS NULL AND PreviousTotalQty IS NULL)
OR (PreviousChangeDate IS NOT NULL AND PreviousTotalQty IS NOT NULL))
);
GO
-- beginning of inventory for item 1
INSERT INTO Data.Inventory(ItemID,
ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty)
VALUES(1, '20090101', 10, 10, NULL, NULL);
-- cannot begin the inventory for the second time for the same item 1
INSERT INTO Data.Inventory(ItemID,
ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty)
VALUES(1, '20090102', 10, 10, NULL, NULL);
Msg 2627, Level 14, State 1, Line 10
Violation of UNIQUE KEY constraint 'UNQ_Inventory_Previous_Columns'. Cannot insert duplicate key in object 'Data.Inventory'.
The statement has been terminated.
-- add more
DECLARE @ChangeQty INT;
SET @ChangeQty = 5;
INSERT INTO Data.Inventory(ItemID,
ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty)
SELECT TOP 1 ItemID, '20090103', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
FROM Data.Inventory
WHERE ItemID = 1
ORDER BY ChangeDate DESC;
SET @ChangeQty = 3;
INSERT INTO Data.Inventory(ItemID,
ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty)
SELECT TOP 1 ItemID, '20090104', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
FROM Data.Inventory
WHERE ItemID = 1
ORDER BY ChangeDate DESC;
SET @ChangeQty = -4;
INSERT INTO Data.Inventory(ItemID,
ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty)
SELECT TOP 1 ItemID, '20090105', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
FROM Data.Inventory
WHERE ItemID = 1
ORDER BY ChangeDate DESC;
-- try to violate chronological order
SET @ChangeQty = 5;
INSERT INTO Data.Inventory(ItemID,
ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty)
SELECT TOP 1 ItemID, '20081231', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
FROM Data.Inventory
WHERE ItemID = 1
ORDER BY ChangeDate DESC;
Msg 547, Level 16, State 0, Line 4
The INSERT statement conflicted with the CHECK constraint "CHK_Inventory_Valid_Dates_Sequence". The conflict occurred in database "Test", table "Data.Inventory".
The statement has been terminated.
SELECT ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty
FROM Data.Inventory ORDER BY ChangeDate;
ChangeDate ChangeQty TotalQty PreviousChangeDate PreviousTotalQty
----------------------- ----------- ----------- ----------------------- -----
2009-01-01 00:00:00.000 10 10 NULL NULL
2009-01-03 00:00:00.000 5 15 2009-01-01 00:00:00.000 10
2009-01-04 00:00:00.000 3 18 2009-01-03 00:00:00.000 15
2009-01-05 00:00:00.000 -4 14 2009-01-04 00:00:00.000 18
-- try to change a single row, all updates must fail
UPDATE Data.Inventory SET ChangeQty = ChangeQty + 2 WHERE InventoryID = 3;
UPDATE Data.Inventory SET TotalQty = TotalQty + 2 WHERE InventoryID = 3;
-- try to delete not the last row, all deletes must fail
DELETE FROM Data.Inventory WHERE InventoryID = 1;
DELETE FROM Data.Inventory WHERE InventoryID = 3;
-- the right way to update
DECLARE @IncreaseQty INT;
SET @IncreaseQty = 2;
UPDATE Data.Inventory SET ChangeQty = ChangeQty + CASE WHEN ItemID = 1 AND ChangeDate = '20090103' THEN @IncreaseQty ELSE 0 END,
TotalQty = TotalQty + @IncreaseQty,
PreviousTotalQty = PreviousTotalQty + CASE WHEN ItemID = 1 AND ChangeDate = '20090103' THEN 0 ELSE @IncreaseQty END
WHERE ItemID = 1 AND ChangeDate >= '20090103';
SELECT ChangeDate,
ChangeQty,
TotalQty,
PreviousChangeDate,
PreviousTotalQty
FROM Data.Inventory ORDER BY ChangeDate;
ChangeDate ChangeQty TotalQty PreviousChangeDate PreviousTotalQty
----------------------- ----------- ----------- ----------------------- ----------------
2009-01-01 00:00:00.000 10 10 NULL NULL
2009-01-03 00:00:00.000 7 17 2009-01-01 00:00:00.000 10
2009-01-04 00:00:00.000 3 20 2009-01-03 00:00:00.000 17
2009-01-05 00:00:00.000 -4 16 2009-01-04 00:00:00.000 20
Copied from my blog.