Most efficient way in SQL Server to condense multiple data changes into before and after values

Dra*_*mmy 7 sql sql-server

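I have a SQL Server database containing audit records that show changes to a third-party database (OpenEdge). I have no control over the structure of the audit data, nor over the way the third-party database audits data changes. So I am left with, for example, the following data...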

[Image: source dataset]

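If you follow the first five rows, you can see that they all belong to TransId 1532102 (which represents a database transaction), where TransSeq identifies each database operation within that single transaction.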

The audited changes are visible in the columns prefixed with New. If a value is NULL, that field was not changed.

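Looking at the data, you can see that in TransId = 1532102, PrimaryIdentifier changes from 2 to -2 (row 1), then from -2 to 3 (row 3), then from 3 to 4 (row 4), and finally from 4 to 5 (row 5). You may also notice that when PrimaryIdentifier changes from 3 to 4, SecondaryIdentifier changes from 'abcd' to 'efgh' (row 4). So these multiple changes actually apply to a single source record. With that in mind, rows 1, 3, 4 and 5 can all be condensed into a single row (see below).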

[Image: rows 1, 3, 4 and 5 condensed into a single row]

In the end, TransId 1532102 has only two record changes.

[Image: the two condensed record changes for TransId 1532102]
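To make the chain-following concrete, here is a rough sketch of the condensing described above. It is a hypothetical in-memory Python model, not T-SQL (AuditRow and condense are illustrative names), that walks one transaction's rows in TransSeq order, remembers which original key each current key came from, and treats None (NULL) in a New... column as "not changed". It assumes every row passed in belongs to the same SyncId and TransId.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple

Key = Tuple[int, str]  # (PrimaryIdentifier, SecondaryIdentifier)

class AuditRow(NamedTuple):
    trans_seq: int
    primary: int
    secondary: str
    new_primary: Optional[int]    # None plays the role of NULL = "not changed"
    new_secondary: Optional[str]
    new_level: Optional[int]
    new_value: Optional[str]

def condense(rows: Iterable[AuditRow]) -> Dict[Key, tuple]:
    """Condense ONE transaction's rows into {key before: (key after, level, value)}."""
    # key a record currently has -> [key it started with, latest level, latest value]
    live: Dict[Key, List] = {}
    for r in sorted(rows, key=lambda row: row.trans_seq):
        key = (r.primary, r.secondary)
        # If an earlier row moved a record to this key, continue that record's chain;
        # otherwise this row is the first change to a record.
        entry = live.pop(key, None) or [key, None, None]
        if r.new_level is not None:
            entry[1] = r.new_level
        if r.new_value is not None:
            entry[2] = r.new_value
        new_key = (r.primary if r.new_primary is None else r.new_primary,
                   r.secondary if r.new_secondary is None else r.new_secondary)
        if new_key in live:
            raise ValueError(f"two records would share key {new_key}")
        live[new_key] = entry
    return {start: (now, level, value) for now, (start, level, value) in live.items()}

# The first five sample rows (TransId 1532102).
rows = [
    AuditRow(0, 2, "abcd", -2, None, None, "test data"),
    AuditRow(1, 3, "abcd", 2, None, None, None),
    AuditRow(2, -2, "abcd", 3, None, None, None),
    AuditRow(3, 3, "abcd", 4, "efgh", None, None),
    AuditRow(4, 4, "efgh", 5, None, 2, None),
]
for before, after in condense(rows).items():
    print(before, "->", after)
```

On those five rows the sketch yields the two record changes: (2, 'abcd') ends as (5, 'efgh') with Level 2 and Value 'test data', and (3, 'abcd') ends as (2, 'abcd').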

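I need to turn these changes into a single UPDATE statement on the target database. To do that, I need one record per changed row that shows both the before and the after values.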

So, given the source data provided here, I need to produce the following dataset.

[Image: required dataset]

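What query structures could I use to achieve this? I was thinking of a recursive CTE, or perhaps some kind of hierarchical structure. Ultimately this needs to perform as well as possible, so I wanted to ask here in case there are approaches I haven't considered.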

Any thoughts are welcome; here is a script with the sample data:

DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT  @TestTable
        SELECT 128, 1532102, 0,  2, 'abcd',   -2,   NULL, NULL, 'test data'
UNION   SELECT 128, 1532102, 1,  3, 'abcd',    2,   NULL, NULL, NULL
UNION   SELECT 128, 1532102, 2, -2, 'abcd',    3,   NULL, NULL, NULL
UNION   SELECT 128, 1532102, 3,  3, 'abcd',    4, 'efgh', NULL, NULL
UNION   SELECT 128, 1532102, 4,  4, 'efgh',    5,   NULL,    2, NULL
UNION   SELECT 128, 1532102, 5,  5, 'efgh', NULL, 'ghfi', NULL, NULL
UNION   SELECT 128, 1532106, 0,  3, 'abcd',   -3,   NULL, NULL, NULL
UNION   SELECT 128, 1532106, 1,  4, 'abcd',    3,   NULL, NULL, NULL
UNION   SELECT 128, 1532106, 2, -3, 'abcd',    4,   NULL, NULL, NULL
UNION   SELECT 128, 1532110, 0,  4, 'abcd',   -4,   NULL, NULL, NULL
UNION   SELECT 128, 1532110, 1,  5, 'abcd',    4,   NULL, NULL, NULL
UNION   SELECT 128, 1532110, 2, -4, 'abcd',    5,   NULL, NULL, NULL
UNION   SELECT 128, 1532114, 0,  5, 'abcd',   -5,   NULL, NULL, NULL
UNION   SELECT 128, 1532114, 1,  4, 'abcd',    5,   NULL,    1, NULL
UNION   SELECT 128, 1532114, 2, -5, 'abcd',    4,   NULL, NULL, 'some more test data'

SELECT  *
FROM    @TestTable

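Edit: I can't actually write any query that successfully tracks the identifier changes. Can anyone help? I need a query that tracks changes to the PrimaryIdentifier value and ultimately produces a single record per chain of changes, containing the start and end values.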

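Edit 2: A now-deleted answer suggested that the key identifiers cannot be updated when the changes are condensed, and that I should instead step through each change. I think it's worth adding my comment from there to the question as further information: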

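I need to condense the dataset because of the volume of audit records generated; most of them are unnecessary, owing to the way the source DBMS makes changes. I need to reduce the dataset, and I need to track key identifier changes. Within a single UPDATE statement, the key updates can be made without stepping through each intermediate ID change - see this example.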

Ser*_*erg 2

I assume that
1) (PrimaryIdentifier, SecondaryIdentifier) is the PK of the target table, and
2) every transaction in the audit table leaves the target table in a consistent state. So updating the PK in a single statement per transaction, using CASE, will run OK:

declare @t table (id int primary key, old int);
insert @t(id, old) values (4,4),(5,5);
update @t set id = case id 
     when 4 then 5 
     when 5 then 4 end;
select * from @t;
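Why the single-statement swap matters can be shown with a toy Python model. This is not a description of SQL Server internals, and the function names are hypothetical: in one UPDATE every new key is computed from the old values, and the example above relies on the key being enforced on the statement's final result, so the swap succeeds. Applying the same mapping one row at a time runs into a transient duplicate key.

```python
from typing import Dict, List

def update_as_one_statement(keys: List[int], new_key_for: Dict[int, int]) -> List[int]:
    """Compute every new key from the old keys, then check uniqueness once."""
    # Keys missing from the mapping stay as they are (like rows outside the WHERE clause).
    result = [new_key_for.get(k, k) for k in keys]
    if len(set(result)) != len(result):
        raise ValueError("duplicate key after the statement")
    return sorted(result)

def update_row_by_row(keys: List[int], new_key_for: Dict[int, int]) -> List[int]:
    """Apply the same mapping one row at a time, checking uniqueness after each row."""
    current = set(keys)
    for old in keys:
        if old in new_key_for:
            current.discard(old)
            new = new_key_for[old]
            if new in current:
                raise ValueError(f"duplicate key {new} after a single row change")
            current.add(new)
    return sorted(current)

# The swap from the example above: 4 -> 5 and 5 -> 4.
print(update_as_one_statement([4, 5], {4: 5, 5: 4}))  # [4, 5]
try:
    update_row_by_row([4, 5], {4: 5, 5: 4})
except ValueError as err:
    print(err)  # duplicate key 5 after a single row change
```

This is also why the condensed changes for a transaction are best applied together in one statement rather than one UPDATE per chain.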

The plan is: 1. condense the transactions; 2. generate the UPDATE SQL into a temp table. Then you can run all, or selected, statements from the temp table. Every statement has the form

UPDATE myTable SET 
         PrimaryIdentifier = CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 5 
                                  WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN 2 END,  
        SecondaryIdentifier = CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 'efgh' 
                                   WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN 'abcd' END , 
        Level= CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 2 
                    WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN  Level  END , 
        Value= CASE WHEN PrimaryIdentifier=2 AND SecondaryIdentifier='abcd' THEN 'test data' 
                    WHEN PrimaryIdentifier=3 AND SecondaryIdentifier='abcd' THEN  Value  END
WHERE 1=2 OR (PrimaryIdentifier=2 AND SecondaryIdentifier='abcd') 
          OR (PrimaryIdentifier=3 AND SecondaryIdentifier='abcd')
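The statement above is plain string assembly: one WHEN per condensed chain in each CASE, plus one OR term per chain in the WHERE clause. Below is a hedged Python sketch of that assembly (Change, quote and build_update are hypothetical names; the answer's query does the same job with FOR XML PATH('') concatenation). It relies on SQL Server evaluating every SET expression against the row's pre-update values, which is why the later CASE expressions can still test the old PrimaryIdentifier. Unlike the answer's query, the sketch doubles embedded single quotes.

```python
from typing import Callable, NamedTuple, Optional, Sequence

class Change(NamedTuple):
    """One condensed chain: key before the transaction and values after it."""
    old_primary: int
    old_secondary: str
    new_primary: int
    new_secondary: str
    new_level: Optional[int]   # None -> keep the row's current Level
    new_value: Optional[str]   # None -> keep the row's current Value

def quote(text: str) -> str:
    """Render a T-SQL string literal, doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"

def build_update(table: str, changes: Sequence[Change]) -> str:
    def when(c: Change) -> str:
        # Each WHEN tests the row's pre-update key.
        return (f" WHEN PrimaryIdentifier={c.old_primary}"
                f" AND SecondaryIdentifier={quote(c.old_secondary)} THEN ")

    def case(column: str, then: Callable[[Change], str]) -> str:
        return f"{column} = CASE" + "".join(when(c) + then(c) for c in changes) + " END"

    assignments = [
        case("PrimaryIdentifier", lambda c: str(c.new_primary)),
        case("SecondaryIdentifier", lambda c: quote(c.new_secondary)),
        case("Level", lambda c: "Level" if c.new_level is None else str(c.new_level)),
        case("Value", lambda c: "Value" if c.new_value is None else quote(c.new_value)),
    ]
    # "1=2" lets every real condition be appended uniformly as " OR (...)".
    where = " WHERE 1=2" + "".join(
        f" OR (PrimaryIdentifier={c.old_primary}"
        f" AND SecondaryIdentifier={quote(c.old_secondary)})"
        for c in changes)
    return f"UPDATE {table} SET " + ", ".join(assignments) + where

print(build_update("myTable", [
    Change(2, "abcd", 5, "efgh", 2, "test data"),
    Change(3, "abcd", 2, "abcd", None, None),
]))
```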

The query

DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT  @TestTable
        SELECT 128, 1532102, 0,  2, 'abcd', -2, NULL,   NULL,   'test data'
UNION   SELECT 128, 1532102, 1,  3, 'abcd',  2, NULL,   NULL,   NULL
UNION   SELECT 128, 1532102, 2, -2, 'abcd',  3, NULL,   NULL,   NULL
UNION   SELECT 128, 1532102, 3,  3, 'abcd',  4, 'efgh', NULL,   NULL
UNION   SELECT 128, 1532102, 4,  4, 'efgh',  5, NULL,   2,      NULL
UNION   SELECT 128, 1532106, 0,  3, 'abcd', -3, NULL,   NULL,   NULL
UNION   SELECT 128, 1532106, 1,  4, 'abcd',  3, NULL,   NULL,   NULL
UNION   SELECT 128, 1532106, 2, -3, 'abcd',  4, NULL,   NULL,   NULL
UNION   SELECT 128, 1532110, 0,  4, 'abcd', -4, NULL,   NULL,   NULL
UNION   SELECT 128, 1532110, 1,  5, 'abcd',  4, NULL,   NULL,   NULL
UNION   SELECT 128, 1532110, 2, -4, 'abcd',  5, NULL,   NULL,   NULL
UNION   SELECT 128, 1532110, 3,  5, 'abcd',  6, NULL,   NULL,   NULL
UNION   SELECT 128, 1532110, 4,  6, 'abcd',  5, NULL,   NULL,   NULL
UNION   SELECT 128, 1532114, 0,  5, 'abcd', -5, NULL,   NULL,   NULL
UNION   SELECT 128, 1532114, 1,  4, 'abcd',  5, NULL,   1,      NULL
UNION   SELECT 128, 1532114, 2, -5, 'abcd',  4, NULL,   NULL,   'some more test data'
;
WITH root AS (
    -- Top parent updates within transactions
    SELECT SyncId, TransId, TransSeq, PrimaryIdentifier AS rPrimaryIdentifier, SecondaryIdentifier AS rSecondaryIdentifier, 
    NewPrimaryIdentifier, 
    coalesce(NewSecondaryIdentifier, SecondaryIdentifier) AS NewSecondaryIdentifier,
    newLevel, NewValue
    FROM  @TestTable t
    WHERE NOT EXISTS (SELECT 1 
                   FROM  @TestTable t2 
                   WHERE t2.SyncId=t.SyncId AND t2.TransId = t.TransId
                       AND t2.TransSeq < t.TransSeq 
                       AND t.PrimaryIdentifier = t2.NewPrimaryIdentifier
                       AND t.SecondaryIdentifier = coalesce(t2.NewSecondaryIdentifier, t2.SecondaryIdentifier) 
                   )
    -- recursion to track the chain of updates
    UNION ALL
    SELECT root.SyncId, root.TransId, t.TransSeq, rPrimaryIdentifier, rSecondaryIdentifier,
         t.NewPrimaryIdentifier,
         coalesce(t.NewSecondaryIdentifier, root.NewSecondaryIdentifier),
         coalesce(t.NewLevel, root.NewLevel), coalesce(t.NewValue, root.NewValue) -- the later change wins
    FROM root 
    JOIN @TestTable t ON root.SyncId=t.SyncId AND root.TransId = t.TransId
                       AND root.TransSeq < t.TransSeq 
                       AND t.PrimaryIdentifier = root.NewPrimaryIdentifier
                       AND t.SecondaryIdentifier = root.NewSecondaryIdentifier

)
,condensed as (
    -- last update in the chain
    SELECT TOP(1) WITH TIES *  
    FROM root
    ORDER BY row_number() over (partition by SyncId, TransId, rPrimaryIdentifier, rSecondaryIdentifier 
                                order by TransSeq desc)
)
-- generate sql
SELECT SyncId, TransId, sql = 'UPDATE myTable SET PrimaryIdentifier = CASE'

    + (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20)) 
             +' AND SecondaryIdentifier=''' + rSecondaryIdentifier 
             +''' THEN ' + CAST(NewPrimaryIdentifier as varchar(20))             
        FROM condensed c2 
        WHERE c1.SyncId = c2.SyncId AND  c1.TransId= c2.TransId
        FOR XML PATH('') ) 
    + ' END,  SecondaryIdentifier = CASE'
    + (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20)) 
             +' AND SecondaryIdentifier=''' + rSecondaryIdentifier
             +''' THEN ''' + NewSecondaryIdentifier + ''''
        FROM condensed c2 
        WHERE c1.SyncId = c2.SyncId AND  c1.TransId= c2.TransId
        FOR XML PATH('') )
    + ' END , Level= CASE'
    + (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20)) 
             +' AND SecondaryIdentifier=''' + rSecondaryIdentifier
             +''' THEN ' 
             + CASE WHEN NewLevel IS NULL THEN ' Level ' ELSE CAST(NewLevel  as varchar(20)) END 
        FROM condensed c2 
        WHERE c1.SyncId = c2.SyncId AND  c1.TransId= c2.TransId
        FOR XML PATH('') )
    + ' END , Value= CASE'
    + (SELECT ' WHEN PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20)) 
             +' AND SecondaryIdentifier=''' + rSecondaryIdentifier
             +''' THEN ' 
             + CASE WHEN NewValue IS NULL THEN ' Value ' ELSE '''' + NewValue + '''' END 
        FROM condensed c2 
        WHERE c1.SyncId = c2.SyncId AND  c1.TransId= c2.TransId
        FOR XML PATH('') )
     + ' END'
     + ' WHERE 1=2'
     + (SELECT ' OR (PrimaryIdentifier='+ CAST(rPrimaryIdentifier as varchar(20)) 
         +' AND SecondaryIdentifier=''' + rSecondaryIdentifier +''')'
    FROM condensed c2 
    WHERE c1.SyncId = c2.SyncId AND  c1.TransId= c2.TransId
    FOR XML PATH('') )
INTO #UpdSql    
FROM condensed c1 
GROUP BY SyncId, TransId


SELECT * 
FROM #UpdSql
ORDER BY SyncId, TransId

EDIT

This version takes into account that NewPrimaryIdentifier can be NULL too (see the row marked -- added in @TestTable). The SQL generation step is omitted here.

DECLARE @TestTable TABLE (SyncId INT, TransId INT, TransSeq INT, PrimaryIdentifier INT, SecondaryIdentifier NCHAR(4), NewPrimaryIdentifier INT, NewSecondaryIdentifier NCHAR(4), NewLevel INT, NewValue NVARCHAR(20))
INSERT  @TestTable
        SELECT 128, 1532102, 0,  2, 'abcd', -2, NULL,   NULL,   'test data'
UNION   SELECT 128, 1532102, 1,  3, 'abcd',  2, NULL,   NULL,   NULL
UNION   SELECT 128, 1532102, 2, -2, 'abcd',  3, NULL,   NULL,   NULL
UNION   SELECT 128, 1532102, 3,  3, 'abcd',  4, 'efgh', NULL,   NULL
UNION   SELECT 128, 1532102, 4,  4, 'efgh',  5, NULL,   2,      NULL
UNION   SELECT 128, 1532102, 5,  5, 'efgh', null, 'ghfi', null, NULL -- added
UNION   SELECT 128, 1532106, 0,  3, 'abcd', -3, NULL,   NULL,   NULL
UNION   SELECT 128, 1532106, 1,  4, 'abcd',  3, NULL,   NULL,   NULL
UNION   SELECT 128, 1532106, 2, -3, 'abcd',  4, NULL,   NULL,   NULL
UNION   SELECT 128, 1532110, 0,  4, 'abcd', -4, NULL,   NULL,   NULL
UNION   SELECT 128, 1532110, 1,  5, 'abcd',  4, NULL,   NULL,   NULL
UNION   SELECT 128, 1532110, 2, -4, 'abcd',  5, NULL,   NULL,   NULL
UNION   SELECT 128, 1532110, 3,  5, 'abcd',  6, NULL,   NULL,   NULL
UNION   SELECT 128, 1532110, 4,  6, 'abcd',  5, NULL,   NULL,   NULL
UNION   SELECT 128, 1532114, 0,  5, 'abcd', -5, NULL,   NULL,   NULL
UNION   SELECT 128, 1532114, 1,  4, 'abcd',  5, NULL,   1,      NULL
UNION   SELECT 128, 1532114, 2, -5, 'abcd',  4, NULL,   NULL,   'some more test data'
;
WITH root AS (
    -- Top parent updates within transactions
    SELECT SyncId, TransId, TransSeq, PrimaryIdentifier AS rPrimaryIdentifier, SecondaryIdentifier AS rSecondaryIdentifier, 
    coalesce(NewPrimaryIdentifier, PrimaryIdentifier) AS NewPrimaryIdentifier,
    coalesce(NewSecondaryIdentifier, SecondaryIdentifier) AS NewSecondaryIdentifier,
    newLevel, NewValue
    FROM  @TestTable t
    WHERE NOT EXISTS (SELECT 1 
                   FROM  @TestTable t2 
                   WHERE t2.SyncId=t.SyncId AND t2.TransId = t.TransId
                       AND t2.TransSeq < t.TransSeq 
                       AND t.PrimaryIdentifier = coalesce(t2.NewPrimaryIdentifier, t2.PrimaryIdentifier)
                       AND t.SecondaryIdentifier = coalesce(t2.NewSecondaryIdentifier, t2.SecondaryIdentifier) 
                   )
    -- recursion to track the chain of updates
    UNION ALL
    SELECT root.SyncId, root.TransId, t.TransSeq, rPrimaryIdentifier, rSecondaryIdentifier,
         coalesce(t.NewPrimaryIdentifier, root.NewPrimaryIdentifier),
         coalesce(t.NewSecondaryIdentifier, root.NewSecondaryIdentifier),
         coalesce(t.NewLevel, root.NewLevel), coalesce(t.NewValue, root.NewValue)
    FROM root 
    JOIN @TestTable t ON root.SyncId=t.SyncId AND root.TransId = t.TransId
                       AND root.TransSeq < t.TransSeq 
                       AND t.PrimaryIdentifier = root.NewPrimaryIdentifier
                       AND t.SecondaryIdentifier = root.NewSecondaryIdentifier

)
,condensed as (
    -- last update in the chain
    SELECT TOP(1) WITH TIES *  
    FROM root
    ORDER BY row_number() over (partition by SyncId, TransId, rPrimaryIdentifier, rSecondaryIdentifier 
                                order by TransSeq desc)
)
SELECT * 
FROM condensed 
ORDER BY SyncId, TransId, rPrimaryIdentifier, rSecondaryIdentifier
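For readers tracing the recursive CTE, here is a rough Python model of its three parts under the EDIT version's rules. The names are hypothetical, it handles a single SyncId/TransId, and it is meant for understanding rather than as a replacement for the T-SQL. The anchor keeps rows whose key no earlier row produced; the recursive member follows later rows whose key matches the chain's current key; the condensed step keeps the highest TransSeq per root. COALESCE is mimicked so a NULL New... column keeps the previous value.

```python
from typing import NamedTuple, Optional

class AuditRow(NamedTuple):
    trans_seq: int
    primary: int
    secondary: str
    new_primary: Optional[int]
    new_secondary: Optional[str]
    new_level: Optional[int]
    new_value: Optional[str]

def coalesce(*values):
    """Python stand-in for SQL COALESCE: the first non-None argument."""
    return next((v for v in values if v is not None), None)

def new_key(r: AuditRow):
    return (coalesce(r.new_primary, r.primary), coalesce(r.new_secondary, r.secondary))

def condense_like_cte(rows):
    """Model of root / recursion / condensed for the rows of ONE transaction."""
    # Anchor: no earlier row in the transaction produced this row's key.
    roots = [r for r in rows
             if not any(e.trans_seq < r.trans_seq and new_key(e) == (r.primary, r.secondary)
                        for e in rows)]
    # Each step: (TransSeq, root key, current key, level, value)
    steps = [(r.trans_seq, (r.primary, r.secondary), new_key(r), r.new_level, r.new_value)
             for r in roots]
    frontier = steps
    while frontier:  # terminates because TransSeq strictly increases
        # Recursive member: later rows whose key equals the chain's current key.
        frontier = [(t.trans_seq, root, new_key(t),
                     coalesce(t.new_level, level), coalesce(t.new_value, value))
                    for seq, root, cur, level, value in frontier
                    for t in rows
                    if t.trans_seq > seq and (t.primary, t.secondary) == cur]
        steps = steps + frontier
    # "condensed": keep the step with the highest TransSeq for each root.
    last = {}
    for step in steps:
        if step[1] not in last or step[0] > last[step[1]][0]:
            last[step[1]] = step
    return {root: (cur, level, value) for _, root, cur, level, value in last.values()}

rows_1532110 = [
    AuditRow(0, 4, "abcd", -4, None, None, None),
    AuditRow(1, 5, "abcd", 4, None, None, None),
    AuditRow(2, -4, "abcd", 5, None, None, None),
    AuditRow(3, 5, "abcd", 6, None, None, None),
    AuditRow(4, 6, "abcd", 5, None, None, None),
]
print(condense_like_cte(rows_1532110))
```

On the TransId 1532110 rows the model reports that 4 ends up as 5 and 5 ends up as 4, so the five audit rows reduce to a single swap. One caveat visible in the model: because the recursive member matches any later row with the chain's current key, a key that a different record takes over later in the same transaction could extend the wrong chain. The sample data does not exercise this case.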