Query to normalize a table / combine row text

Ada*_*dam 5 performance normalization join sql-server query

I have a table (call it oldTable) with the following columns:

ID (int), Rank (int), TextLineNumber (int), SomeText (varchar)

The primary key is composite: ID + Rank + TextLineNumber.

I am trying to transform/join it into another table (call it newTable) with the following columns:

ID (int), Rank (int), CombinedText (varchar)

The primary key is ID + Rank.

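ID and Rank in the new table are already populated, but I need a query that updates newTable's CombinedText column, subject to the following considerations:

  1. A given Rank in the new table may not exist in the old table. In that case, the query needs to pick the highest available Rank in the old table that is not greater than the Rank in the new table.
  2. The CombinedText column is the string concatenation of the old table's SomeText column, concatenated in TextLineNumber order, using the Rank found in the first consideration.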
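
The first consideration can be sketched on its own. The snippet below is a minimal illustration, not the asker's code: the table variables `@Old` and `@New` are hypothetical stand-ins for oldTable and newTable, the rows are a subset of the sample data from the question, and the syntax is kept to what SQL Server 2005 accepts (no multi-row `VALUES`).

```sql
-- Hypothetical stand-ins for oldTable and newTable (names are assumptions).
DECLARE @Old TABLE (ID INT, [Rank] INT, TextLineNumber INT, SomeText VARCHAR(1000),
                    PRIMARY KEY (ID, [Rank], TextLineNumber));
DECLARE @New TABLE (ID INT, [Rank] INT, CombinedText VARCHAR(8000),
                    PRIMARY KEY (ID, [Rank]));

-- A subset of the sample rows from the question.
INSERT INTO @Old (ID, [Rank], TextLineNumber, SomeText)
SELECT 2,  7, 1, 'jumped ov' UNION ALL
SELECT 2, 13, 1, 'samp'      UNION ALL
SELECT 2, 13, 2, 'le text'   UNION ALL
SELECT 3,  1, 1, 'ABC'       UNION ALL
SELECT 3,  1, 2, 'DEF'       UNION ALL
SELECT 3, 50, 1, 'XYZ';

INSERT INTO @New (ID, [Rank])
SELECT 2, 13 UNION ALL
SELECT 2, 14 UNION ALL
SELECT 3,  4 UNION ALL
SELECT 3,  5 UNION ALL
SELECT 3, 50 UNION ALL
SELECT 3, 55;

-- For each new row, find the highest old Rank that does not exceed the new Rank.
-- LEFT JOIN keeps new rows with no qualifying old Rank (EffectiveRank is NULL then).
SELECT n.ID,
       n.[Rank]      AS NewRank,
       MAX(o.[Rank]) AS EffectiveRank
FROM   @New AS n
       LEFT JOIN @Old AS o
         ON o.ID = n.ID
            AND o.[Rank] <= n.[Rank]
GROUP  BY n.ID, n.[Rank]
ORDER  BY n.ID, n.[Rank];

-- ID  NewRank  EffectiveRank
-- 2   13       13
-- 2   14       13
-- 3   4        1
-- 3   5        1
-- 3   50       50
-- 3   55       50
```

Resolving the effective Rank once per new row, rather than once per text line, keeps the lookup separate from the concatenation step in the second consideration.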

Here is some sample data:

Old - http://i54.tinypic.com/jq0vmx.png

New - http://i53.tinypic.com/dhfyn8.png

If it matters, I am using MSSQL 2005. I currently do this in T-SQL with a while loop, but it has become a serious performance bottleneck (about 1 minute for 10,000 rows).

Edit: expanded sample data as CSV:
Old:

ID,Rank,LineNumber,SomeText
1,1,1,the qu  
1,1,2,ick br  
1,1,3,own  
1,2,1,some te  
1,2,2,xt  
1,3,1,sample  
2,7,1,jumped ov  
2,7,2,er the  
2,7,3,lazy  
2,13,1,samp  
2,13,2,le text  
3,1,1,ABC  
3,1,2,DEF  
3,1,3,GHI  
3,1,4,JKL  
3,50,1,XYZ

New:

ID,Rank,CombinedText
1,2,some text
2,13,sample text
2,14,sample text
3,4,ABCDEFGHIJKL
3,5,ABCDEFGHIJKL
3,50,XYZ
3,55,XYZ

Edit 2:
Here is a sample query I came up with that does work, but it is not fast enough (it relies on multiple subqueries):

update newtable set combinedtext = 
coalesce ((select top 1 sometext from OldTable where OldTable.id=newtable.id and oldtable.rank=(select top 1 rank from oldtable where oldtable.id=newtable.id and oldtable.rank<=newtable.rank order by rank desc) and oldtable.linenumber=1),'') +
coalesce ((select top 1 sometext from OldTable where OldTable.id=newtable.id and oldtable.rank=(select top 1 rank from oldtable where oldtable.id=newtable.id and oldtable.rank<=newtable.rank order by rank desc) and oldtable.linenumber=2),'') +
coalesce ((select top 1 sometext from OldTable where OldTable.id=newtable.id and oldtable.rank=(select top 1 rank from oldtable where oldtable.id=newtable.id and oldtable.rank<=newtable.rank order by rank desc) and oldtable.linenumber=3),'') +
coalesce ((select top 1 sometext from OldTable where OldTable.id=newtable.id and oldtable.rank=(select top 1 rank from oldtable where oldtable.id=newtable.id and oldtable.rank<=newtable.rank order by rank desc) and oldtable.linenumber=4),'') +
coalesce ((select top 1 sometext from OldTable where OldTable.id=newtable.id and oldtable.rank=(select top 1 rank from oldtable where oldtable.id=newtable.id and oldtable.rank<=newtable.rank order by rank desc) and oldtable.linenumber=5),'')

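It also assumes a maximum of 5 lines, which may not be the case. I don't mind hard-coding the line numbers all the way up to 20 if necessary, but ideally the query would handle any number of lines. The goal is to get execution time under 20 seconds (on the real data)...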

Tre*_*Dev 2

This should work. I'll clean it up later to make it more efficient.

DECLARE @Old TABLE ( 
  id         INT, 
  rank       INT, 
  linenumber INT, 
  sometext   VARCHAR(1000)) 
DECLARE @New TABLE ( 
  id           INT, 
  rank         INT, 
  combinedtext VARCHAR(1000)) 


;WITH combinedresults(ctid, id, rank, linenumber, combinedtext) 
     AS (SELECT 0, 
                id, 
                rank, 
                linenumber, 
                CAST (sometext AS VARCHAR(8000)) 
         FROM   @Old o 
         WHERE  NOT EXISTS (SELECT TOP 1 1 
                            FROM   @Old 
                            WHERE  id = o.id 
                                   AND rank = o.rank 
                                   AND linenumber < o.linenumber) 
         UNION ALL 
         SELECT ctid + 1, 
                o.id, 
                o.rank, 
                o.linenumber, 
                ct.combinedtext + o.sometext 
         FROM   @Old o 
                INNER JOIN combinedresults ct 
                  ON ct.id = o.id 
                     AND ct.rank = o.rank 
         WHERE  o.linenumber > ct.linenumber) 

UPDATE n 
SET    combinedtext = ct.combinedtext 
FROM   @New n 
       INNER JOIN (SELECT n.id, 
                          n.rank, 
                          MAX(o.rank) orank 
                   FROM   @new n 
                          INNER JOIN @Old o 
                            ON n.id = o.id 
                               AND o.rank <= n.rank 
                   GROUP  BY n.id, 
                             n.rank) r 
         ON n.id = r.id 
            AND n.rank = r.rank 
       INNER JOIN (SELECT id, 
                          ct.rank, 
                          MAX(ctid) ctid 
                   FROM   combinedresults ct 
                   GROUP  BY ct.id, 
                             ct.rank) r2 
         ON r2.id = r.id 
            AND r2.rank = r.orank 
       INNER JOIN combinedresults ct 
         ON r.id = ct.id 
            AND ct.rank = r.orank 
            AND ct.ctid = r2.ctid 

SELECT * 
FROM   @New 
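
The recursive step above joins on `o.linenumber > ct.linenumber`, so it also builds intermediate chains that skip lines before `MAX(ctid)` selects the complete one; with many lines per Rank that intermediate row count grows quickly. As a possible alternative, and not part of the answer above, here is a minimal sketch that concatenates with `FOR XML PATH('')`, which is available in SQL Server 2005. It assumes the same hypothetical `@Old`/`@New` table variables, uses a few rows of the question's sample data, and resolves the effective Rank with `OUTER APPLY`. It has not been timed against the real data.

```sql
-- Hypothetical stand-ins for oldTable and newTable (names are assumptions).
DECLARE @Old TABLE (id INT, [rank] INT, linenumber INT, sometext VARCHAR(1000),
                    PRIMARY KEY (id, [rank], linenumber));
DECLARE @New TABLE (id INT, [rank] INT, combinedtext VARCHAR(8000),
                    PRIMARY KEY (id, [rank]));

-- A few rows of the sample data from the question.
INSERT INTO @Old (id, [rank], linenumber, sometext)
SELECT 1,  2, 1, 'some te' UNION ALL
SELECT 1,  2, 2, 'xt'      UNION ALL
SELECT 2, 13, 1, 'samp'    UNION ALL
SELECT 2, 13, 2, 'le text' UNION ALL
SELECT 3,  1, 1, 'ABC'     UNION ALL
SELECT 3,  1, 2, 'DEF'     UNION ALL
SELECT 3, 50, 1, 'XYZ';

INSERT INTO @New (id, [rank])
SELECT 1,  2 UNION ALL
SELECT 2, 14 UNION ALL
SELECT 3,  5 UNION ALL
SELECT 3, 55;

UPDATE n
SET    combinedtext = ISNULL(c.combinedtext, '')
FROM   @New AS n
       -- Step 1: highest old rank that does not exceed the new rank.
       OUTER APPLY (SELECT MAX(o.[rank]) AS effectiverank
                    FROM   @Old AS o
                    WHERE  o.id = n.id
                           AND o.[rank] <= n.[rank]) AS r
       -- Step 2: concatenate that rank's lines in linenumber order.
       -- TYPE plus .value() undoes XML entity escaping (e.g. '&' as '&amp;').
       OUTER APPLY (SELECT (SELECT o.sometext AS [text()]
                            FROM   @Old AS o
                            WHERE  o.id = n.id
                                   AND o.[rank] = r.effectiverank
                            ORDER  BY o.linenumber
                            FOR XML PATH(''), TYPE
                           ).value('.', 'VARCHAR(8000)') AS combinedtext) AS c;

SELECT id, [rank], combinedtext
FROM   @New
ORDER  BY id, [rank];
```

For these rows the update should produce the CombinedText values listed for the matching (ID, Rank) pairs in the question's expected output. The sketch makes no assumption about how many lines a Rank has, and both lookups can seek on the (id, rank, linenumber) primary key.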