use*_*064 4 audit slowly-changing-dimension dimension sql-server-2016
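I am trying to build a history table from an audit log (and ultimately a Type 2 dimension table). Unfortunately, the audit log only records the specific fields that were changed. Here is a rough example of what I mean: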
CREATE TABLE Staff(
[ID] int,
[Surname] varchar(5),
[FirstName] varchar(4),
[Office] varchar(9),
[Date] varchar(10)
);
INSERT INTO Staff ([ID], [Surname], [FirstName], [Office], [Date])
VALUES
(001, 'Smith', 'Bill', 'Melbourne', '2015-01-01'),
(001, NULL, NULL, 'Sydney', '2015-03-01'),
(002, 'Brown', 'Mary', 'Melbourne', '2014-04-01'),
(002, 'Jones', NULL, 'Adelaide', '2014-05-01'),
(002, NULL, NULL, 'Sydney', '2015-01-01'),
(002, NULL, NULL, 'Perth', '2015-03-01');
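The first entry for a given staff member is the creation of their record, and each subsequent entry is an update, but it shows only the fields that were updated*. I want to "fill in" the update rows with the rest of the staff member's record as it stood at that time, i.e. a result like this: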
001, Smith, Bill, Melbourne, 2015-01-01
001, Smith, Bill, Sydney, 2015-03-01
002, Brown, Mary, Melbourne, 2014-04-01
002, Jones, Mary, Adelaide, 2014-05-01
002, Jones, Mary, Sydney, 2015-01-01
002, Jones, Mary, Perth, 2015-03-01
I know I could do this with a WHILE loop or a cursor, but I suspect there is a more performant option.
* NULL always means "the value did not change", never "the value changed to NULL".
Sample data, with the Date column typed as date:
CREATE TABLE dbo.Staff
(
[ID] integer NOT NULL,
[Surname] varchar(5) NULL,
[FirstName] varchar(4) NULL,
[Office] varchar(9) NULL,
[Date] date NOT NULL,
PRIMARY KEY (ID, [Date])
);
INSERT INTO Staff ([ID], [Surname], [FirstName], [Office], [Date])
VALUES
(001, 'Smith', 'Bill', 'Melbourne', '2015-01-01'),
(001, NULL, NULL, 'Sydney', '2015-03-01'),
(002, 'Brown', 'Mary', 'Melbourne', '2014-04-01'),
(002, 'Jones', NULL, 'Adelaide', '2014-05-01'),
(002, NULL, NULL, 'Sydney', '2015-01-01'),
(002, NULL, NULL, 'Perth', '2015-03-01');
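The idea of the following solution is to step back from the current row by as many rows as there are preceding NULLs in that column: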
SELECT
G.ID,
Surname = LAG(G.Surname, G.SurnameLag) OVER (
PARTITION BY G.ID
ORDER BY G.[Date]),
FirstName = LAG(G.FirstName, G.FirstNameLag) OVER (
PARTITION BY G.ID
ORDER BY G.[Date]),
Office = LAG(G.Office, G.OfficeLag) OVER (
PARTITION BY G.ID
ORDER BY G.[Date]),
G.[Date]
FROM
(
-- Find the LAG offset per column
SELECT
S.ID,
S.Surname,
SurnameLag = SUM(IIF(S.Surname IS NULL, 1, 0)) OVER (
PARTITION BY S.ID
ORDER BY S.[Date]
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
S.FirstName,
FirstNameLag = SUM(IIF(S.FirstName IS NULL, 1, 0)) OVER (
PARTITION BY S.ID
ORDER BY S.[Date]
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
S.Office,
OfficeLag = SUM(IIF(S.Office IS NULL, 1, 0)) OVER (
PARTITION BY S.ID
ORDER BY S.[Date]
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
S.[Date]
FROM dbo.Staff AS S
) AS G
ORDER BY
G.ID, G.[Date];
For SQL Server versions before 2012, the IIF expression can be written as CASE WHEN <column> IS NULL THEN 1 ELSE 0 END.
Output: for the sample data, the result matches the desired rows shown in the question.
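One caveat worth checking against real audit data: the offset above is a running count of all NULLs seen so far for that ID, not just the NULLs since the last populated value. With a Surname history of 'Brown', NULL, 'Jones', the 'Jones' row gets an offset of 1 and LAG returns the NULL from the row before it. The sample data never has a value arriving after a NULL in the same column, so the problem does not show up there. A minimal sketch of an alternative that does not depend on that ordering is below. It swaps in a different technique (grouping each value with the NULL rows that follow it), and the *Grp column names are made up for illustration:

```sql
-- Sketch only: fill-down by grouping, not the LAG-offset method above.
-- COUNT(column) skips NULLs, so the running count increases only on rows
-- that carry a value; the NULL rows that follow share the same count.
SELECT
    G.ID,
    -- Each group holds at most one non-NULL value, so MAX simply picks it.
    Surname   = MAX(G.Surname)   OVER (PARTITION BY G.ID, G.SurnameGrp),
    FirstName = MAX(G.FirstName) OVER (PARTITION BY G.ID, G.FirstNameGrp),
    Office    = MAX(G.Office)    OVER (PARTITION BY G.ID, G.OfficeGrp),
    G.[Date]
FROM
(
    SELECT
        S.ID,
        S.Surname,
        S.FirstName,
        S.Office,
        S.[Date],
        -- Hypothetical helper columns: one group number per populated value
        SurnameGrp = COUNT(S.Surname) OVER (
            PARTITION BY S.ID
            ORDER BY S.[Date]
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
        FirstNameGrp = COUNT(S.FirstName) OVER (
            PARTITION BY S.ID
            ORDER BY S.[Date]
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
        OfficeGrp = COUNT(S.Office) OVER (
            PARTITION BY S.ID
            ORDER BY S.[Date]
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM dbo.Staff AS S
) AS G
ORDER BY
    G.ID, G.[Date];
```

Like the original query, this needs SQL Server 2012 or later for the ROWS frame. Rows that come before the first populated value of a column (group 0) stay NULL, which should not happen here if the creation row is always complete.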
This will all become much easier once Microsoft implements LAG with the IGNORE NULLS option.
For more options, see Itzik Ben-Gan's The Last non NULL Puzzle.
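For reference, IGNORE NULLS did arrive in SQL Server 2022 for LAG, LEAD, FIRST_VALUE and LAST_VALUE. A minimal sketch of the same fill-down on such a version follows; it assumes SQL Server 2022 or later (or Azure SQL Database) and will not run on 2016:

```sql
-- Sketch only: requires SQL Server 2022+ for IGNORE NULLS.
-- LAST_VALUE ... IGNORE NULLS over a frame ending at the current row
-- returns the most recent non-NULL value, including the current row's own.
SELECT
    S.ID,
    Surname = LAST_VALUE(S.Surname) IGNORE NULLS OVER (
        PARTITION BY S.ID
        ORDER BY S.[Date]
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
    FirstName = LAST_VALUE(S.FirstName) IGNORE NULLS OVER (
        PARTITION BY S.ID
        ORDER BY S.[Date]
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
    Office = LAST_VALUE(S.Office) IGNORE NULLS OVER (
        PARTITION BY S.ID
        ORDER BY S.[Date]
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
    S.[Date]
FROM dbo.Staff AS S
ORDER BY
    S.ID, S.[Date];
```

LAST_VALUE is used here rather than LAG because the frame includes the current row, so a row that already carries a value keeps it without a separate COALESCE.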
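I can do this with a recursive CTE, so it is not very different from a cursor, and these techniques tend not to scale well at high volume. Take a look at the code and see what you think. Note that the query relies on a Change column that is not part of the sample table above; judging from the code, it appears to number each ID's rows from 0 (the creation row) upward.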
;WITH cte AS (
SELECT 0 x, Change, ID, Surname, FirstName, Office, [Date]
FROM dbo.Staff
WHERE Change = 0
UNION ALL
SELECT x + 1, s.Change, c.ID, ISNULL( s.Surname, c.Surname ) , ISNULL( s.FirstName, c.FirstName ), ISNULL( s.Office, c.Office ), s.[Date]
FROM cte c
INNER JOIN dbo.Staff s ON c.ID = s.ID
WHERE s.Change = c.x
)
SELECT Change, ID, Surname, FirstName, Office, [Date]
FROM cte
WHERE x > 0
ORDER BY ID, x
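The sample table has no Change column, so the query above cannot run against it as posted. A minimal sketch of the same recursive idea with the sequence number derived from ROW_NUMBER is shown here; the CTE names Numbered and Filled are made up for illustration, and the numbering (0 for the creation row, then 1, 2, ... by Date) is an assumption about what Change was meant to hold:

```sql
-- Sketch only: recursive fill-down with a derived sequence number.
WITH Numbered AS
(
    -- Assumed stand-in for the Change column: 0 = creation row,
    -- then 1, 2, ... for each later update of the same ID.
    SELECT
        S.ID, S.Surname, S.FirstName, S.Office, S.[Date],
        Change = ROW_NUMBER() OVER (
            PARTITION BY S.ID
            ORDER BY S.[Date]) - 1
    FROM dbo.Staff AS S
),
Filled AS
(
    -- Anchor: the creation row is taken as already complete.
    SELECT N.ID, N.Surname, N.FirstName, N.Office, N.[Date], N.Change
    FROM Numbered AS N
    WHERE N.Change = 0

    UNION ALL

    -- Recursive step: take the next update and inherit any column
    -- it left NULL from the previously filled row.
    SELECT
        N.ID,
        ISNULL(N.Surname, F.Surname),
        ISNULL(N.FirstName, F.FirstName),
        ISNULL(N.Office, F.Office),
        N.[Date],
        N.Change
    FROM Filled AS F
    JOIN Numbered AS N
        ON N.ID = F.ID
        AND N.Change = F.Change + 1
)
SELECT F.ID, F.Surname, F.FirstName, F.Office, F.[Date]
FROM Filled AS F
ORDER BY F.ID, F.Change
OPTION (MAXRECURSION 0); -- default limit is 100 levels, i.e. about 100 updates per ID
```

Each recursion level processes one update per ID, so the depth equals the longest change history. On a large audit table it would probably be worth materializing Numbered into an indexed temporary table first, since the recursive step looks up rows by (ID, Change) at every level.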