Merging a row of data based on the previous row

use*_*064 4 audit slowly-changing-dimension dimension sql-server-2016

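I'm trying to build a history table from an audit log (and ultimately a type 2 dimension table). Unfortunately, the audit log records only the specific fields that were changed. Here is a rough example of what I mean: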

CREATE TABLE Staff(
  [ID] int, 
  [Surname] varchar(5), 
  [FirstName] varchar(4), 
  [Office] varchar(9), 
  [Date] varchar(10)
);

INSERT INTO Staff ([ID], [Surname], [FirstName], [Office], [Date])
VALUES
  (001, 'Smith', 'Bill', 'Melbourne', '2015-01-01'),
  (001, NULL, NULL, 'Sydney', '2015-03-01'),
  (002, 'Brown', 'Mary', 'Melbourne', '2014-04-01'),
  (002, 'Jones', NULL, 'Adelaide', '2014-05-01'),
  (002, NULL, NULL, 'Sydney', '2015-01-01'),
  (002, NULL, NULL, 'Perth', '2015-03-01');

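The first entry for a given staff member is the creation of their record, and every subsequent row is an update, but each update shows only the fields that were changed*. I'd like to "fill in" the update rows with the rest of the staff member's current record, giving a result like this: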

001, Smith, Bill, Melbourne, 2015-01-01
001, Smith, Bill, Sydney, 2015-03-01
002, Brown, Mary, Melbourne, 2014-04-01
002, Jones, Mary, Adelaide, 2014-05-01
002, Jones, Mary, Sydney, 2015-01-01
002, Jones, Mary, Perth, 2015-03-01

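I know I could do this with a while loop or a cursor, but I suspect there is a more performant option.


* NULL always means "the value has not changed", never "the value changed to NULL".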
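The fill rule described above can be sketched outside the database to make the expected result concrete. This is a minimal Python sketch (chosen only so the logic runs without a SQL Server instance); the names `AUDIT_ROWS` and `fill_forward` are made up for illustration, and the rows are the sample data from the question.

```python
# Sample audit rows from the question: (ID, Surname, FirstName, Office, Date).
# None plays the role of NULL, meaning "value has not changed".
AUDIT_ROWS = [
    (1, "Smith", "Bill", "Melbourne", "2015-01-01"),
    (1, None, None, "Sydney", "2015-03-01"),
    (2, "Brown", "Mary", "Melbourne", "2014-04-01"),
    (2, "Jones", None, "Adelaide", "2014-05-01"),
    (2, None, None, "Sydney", "2015-01-01"),
    (2, None, None, "Perth", "2015-03-01"),
]


def fill_forward(rows):
    """Replace each None with the latest earlier non-None value for the same ID."""
    filled = []
    last_seen = {}  # ID -> most recent non-None value per column
    # ISO-formatted date strings sort chronologically, so a plain sort is enough here.
    for staff_id, *fields, date in sorted(rows, key=lambda r: (r[0], r[4])):
        current = last_seen.setdefault(staff_id, [None] * len(fields))
        for i, value in enumerate(fields):
            if value is not None:
                current[i] = value
        filled.append((staff_id, *current, date))
    return filled


for row in fill_forward(AUDIT_ROWS):
    print(row)
```

Each ID keeps its own running state, so one staff member's values never leak into another's rows.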

Pau*_*ite 6

Sample data, with the Date column typed as date:

CREATE TABLE dbo.Staff
(
  [ID] integer NOT NULL, 
  [Surname] varchar(5) NULL, 
  [FirstName] varchar(4) NULL, 
  [Office] varchar(9) NULL, 
  [Date] date NOT NULL,

  PRIMARY KEY (ID, [Date])
);

INSERT INTO Staff ([ID], [Surname], [FirstName], [Office], [Date])
VALUES
  (001, 'Smith', 'Bill', 'Melbourne', '2015-01-01'),
  (001, NULL, NULL, 'Sydney', '2015-03-01'),
  (002, 'Brown', 'Mary', 'Melbourne', '2014-04-01'),
  (002, 'Jones', NULL, 'Adelaide', '2014-05-01'),
  (002, NULL, NULL, 'Sydney', '2015-01-01'),
  (002, NULL, NULL, 'Perth', '2015-03-01');

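The idea of the following solution is to step back from the current row by as many rows as there are NULLs at or before it. Because the offset is a running count of NULLs from the staff member's first row, this returns the right value when, as in the sample data, a column's NULLs come only after its last non-NULL value; a column that is NULL in one row and then set again in a later row would need a different offset.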

SELECT
    G.ID,
    Surname = LAG(G.Surname, G.SurnameLag) OVER (
        PARTITION BY G.ID 
        ORDER BY G.[Date]),
    FirstName = LAG(G.FirstName, G.FirstNameLag) OVER (
        PARTITION BY G.ID 
        ORDER BY G.[Date]),
    Office = LAG(G.Office, G.OfficeLag) OVER (
        PARTITION BY G.ID 
        ORDER BY G.[Date]),
    G.[Date]
FROM 
(
    -- Find the LAG offset per column
    SELECT
        S.ID,
        S.Surname,
        SurnameLag = SUM(IIF(S.Surname IS NULL, 1, 0)) OVER (
            PARTITION BY S.ID
            ORDER BY S.[Date]
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
        S.FirstName,
        FirstNameLag = SUM(IIF(S.FirstName IS NULL, 1, 0)) OVER (
            PARTITION BY S.ID
            ORDER BY S.[Date]
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
        S.Office,
        OfficeLag = SUM(IIF(S.Office IS NULL, 1, 0)) OVER (
            PARTITION BY S.ID
            ORDER BY S.[Date]
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
        S.[Date]
    FROM dbo.Staff AS S
) AS G
ORDER BY
    G.ID, G.[Date];

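For versions of SQL Server before 2012, the IIF expression can be written as CASE WHEN <column> IS NULL THEN 1 ELSE 0 END. Note that LAG and the ROWS frame clause also require SQL Server 2012 or later.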
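To see what the per-column offset does, here is a small Python sketch that mimics `LAG(col, running NULL count)` for a single column of one ID, next to a plain "last non-NULL" scan. The function names are hypothetical, and the second input list is invented to show the limitation; it is not part of the question's data.

```python
def lag_by_null_count(values):
    """Mimic LAG(col, offset) where offset = running count of NULLs so far."""
    result = []
    nulls_so_far = 0
    for position, value in enumerate(values):
        if value is None:
            nulls_so_far += 1
        source = position - nulls_so_far  # the row LAG would read from
        # LAG yields NULL when the offset points before the first row.
        result.append(values[source] if source >= 0 else None)
    return result


def last_non_null(values):
    """Reference behaviour: carry the most recent non-None value forward."""
    result, latest = [], None
    for value in values:
        if value is not None:
            latest = value
        result.append(latest)
    return result


# Surname column for ID 002 in the sample data: NULLs only at the end.
sample = ["Brown", "Jones", None, None]
print(lag_by_null_count(sample))   # ['Brown', 'Jones', 'Jones', 'Jones']
print(last_non_null(sample))       # ['Brown', 'Jones', 'Jones', 'Jones']

# Hypothetical column that is set again after a NULL: the two disagree.
interleaved = ["Brown", None, "Jones", None]
print(lag_by_null_count(interleaved))  # ['Brown', 'Brown', None, None]
print(last_non_null(interleaved))      # ['Brown', 'Brown', 'Jones', 'Jones']
```

On the sample data the two agree. If an audit log can contain a NULL followed by a later change to the same column, an offset that counts only the NULLs since the last non-NULL value, or one of the other "last non-NULL" techniques, would be needed.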

Output:

[Image: Results]

This would all be much easier once Microsoft implements LAG with the IGNORE NULLS option.

For more options, see Itzik Ben-Gan's The Last non NULL Puzzle.


wBo*_*Bob 5

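I was able to do this with a recursive CTE, although that makes it not much different from a cursor, and these techniques tend not to scale well at high volumes. Have a look at the code and see what you think.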

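-- Assumption: dbo.Staff has an extra Change column, not shown in the table
-- definitions above, numbering each ID's rows 0, 1, 2, ... in [Date] order.
;WITH cte AS (
    -- Anchor: the creation row (Change = 0) for each staff member
    SELECT 0 x, Change, ID, Surname, FirstName, Office, [Date]
    FROM dbo.Staff
    WHERE Change = 0

    UNION ALL

    -- Recursive step: apply change number x, keeping the carried-forward
    -- value wherever the audit row holds NULL
    SELECT x + 1, s.Change, c.ID, ISNULL( s.Surname, c.Surname ), ISNULL( s.FirstName, c.FirstName ), ISNULL( s.Office, c.Office ), s.[Date]
    FROM cte c
        INNER JOIN dbo.Staff s ON c.ID = s.ID
    WHERE s.Change = c.x
)
SELECT Change, ID, Surname, FirstName, Office, [Date]
FROM cte
WHERE x > 0    -- drop the anchor rows; the first recursive step re-emits Change = 0
ORDER BY ID, x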
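The way the recursion walks through each staff member's changes can be sketched in Python. This is only an illustration under the assumption that every row carries a 0-based `change` number per ID (the query's `Change` column, which the table definitions shown do not include); the function name `replay_changes` and the dict keys are hypothetical.

```python
FIELDS = ("surname", "first_name", "office")


def replay_changes(staff):
    """Mimic the recursive CTE: start at change 0, then apply changes in order."""
    by_key = {(r["id"], r["change"]): r for r in staff}
    # Anchor member: each creation row, at depth x = 0.
    frontier = [(0, dict(r)) for r in staff if r["change"] == 0]
    output = []
    while frontier:
        next_frontier = []
        for x, carried in frontier:
            s = by_key.get((carried["id"], x))  # join on s.Change = c.x
            if s is None:
                continue  # no further audit rows for this ID
            merged = dict(s)
            for field in FIELDS:
                if merged[field] is None:  # ISNULL(s.col, c.col)
                    merged[field] = carried[field]
            next_frontier.append((x + 1, merged))
        output.extend(next_frontier)  # anchor rows are never added (WHERE x > 0)
        frontier = next_frontier
    ordered = sorted(output, key=lambda pair: (pair[1]["id"], pair[0]))
    return [row for _, row in ordered]


staff = [
    {"id": 2, "change": 0, "surname": "Brown", "first_name": "Mary",
     "office": "Melbourne", "date": "2014-04-01"},
    {"id": 2, "change": 1, "surname": "Jones", "first_name": None,
     "office": "Adelaide", "date": "2014-05-01"},
    {"id": 2, "change": 2, "surname": None, "first_name": None,
     "office": "Sydney", "date": "2015-01-01"},
    {"id": 2, "change": 3, "surname": None, "first_name": None,
     "office": "Perth", "date": "2015-03-01"},
]
for row in replay_changes(staff):
    print(row["id"], row["surname"], row["first_name"], row["office"], row["date"])
```

Each pass handles one change number for every ID, which is why the approach behaves like a row-by-row loop and why its cost grows with the length of the longest change history.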