Ada*_*dam 5 performance normalization join sql-server query
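I have a table (call it oldTable) with the following columns:
ID (int), Rank (int), TextLineNumber (int), SomeText (varchar)
The primary key is composite: ID + Rank + TextLineNumber.
I'm trying to convert/join it into another table (call it newTable) with the following columns:
ID (int), Rank (int), CombinedText (varchar)
The primary key is ID + Rank.
ID and Rank are already populated in the new table, but I need a query that updates newTable's CombinedText column, taking the following caveats into account: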
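Here is some sample data:
Old - http://i54.tinypic.com/jq0vmx.png
New - http://i53.tinypic.com/dhfyn8.png
In case it matters, I'm using MS SQL Server 2005. I currently do this in T-SQL with a WHILE loop, but it has become a serious performance bottleneck (roughly 1 minute for 10,000 rows).
Edit: extended sample data as CSV:
Old: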
ID,Rank,LineNumber,SomeText
1,1,1,the qu
1,1,2,ick br
1,1,3,own
1,2,1,some te
1,2,2,xt
1,3,1,sample
2,7,1,jumped ov
2,7,2,er the
2,7,3,lazy
2,13,1,samp
2,13,2,le text
3,1,1,ABC
3,1,2,DEF
3,1,3,GHI
3,1,4,JKL
3,50,1,XYZ
New:
ID,Rank,CombinedText
1,2,some text
2,13,sample text
2,14,sample text
3,4,ABCDEFGHIJKL
3,5,ABCDEFGHIJKL
3,50,XYZ
3,55,XYZ
Edit 2:
Here is a sample query I came up with that does work, but it isn't fast enough (it relies on multiple subqueries):
update newtable set combinedtext =
coalesce ((select top 1 sometext from OldTable where OldTable.id=newtable.id and oldtable.rank=(select top 1 rank from oldtable where oldtable.id=newtable.id and oldtable.rank<=newtable.rank order by rank desc) and oldtable.linenumber=1),'') +
coalesce ((select top 1 sometext from OldTable where OldTable.id=newtable.id and oldtable.rank=(select top 1 rank from oldtable where oldtable.id=newtable.id and oldtable.rank<=newtable.rank order by rank desc) and oldtable.linenumber=2),'') +
coalesce ((select top 1 sometext from OldTable where OldTable.id=newtable.id and oldtable.rank=(select top 1 rank from oldtable where oldtable.id=newtable.id and oldtable.rank<=newtable.rank order by rank desc) and oldtable.linenumber=3),'') +
coalesce ((select top 1 sometext from OldTable where OldTable.id=newtable.id and oldtable.rank=(select top 1 rank from oldtable where oldtable.id=newtable.id and oldtable.rank<=newtable.rank order by rank desc) and oldtable.linenumber=4),'') +
coalesce ((select top 1 sometext from OldTable where OldTable.id=newtable.id and oldtable.rank=(select top 1 rank from oldtable where oldtable.id=newtable.id and oldtable.rank<=newtable.rank order by rank desc) and oldtable.linenumber=5),'')
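It also assumes a maximum of 5 lines, which may not be the case. I don't mind hard-coding the line numbers up to 20 if I have to, but ideally it would handle them some other way. The goal is to get the execution time under 20 seconds (on the real data)...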
This should work. I'll clean it up later so that it's more efficient.
DECLARE @Old TABLE (
    id         INT,
    rank       INT,
    linenumber INT,
    sometext   VARCHAR(1000))

DECLARE @New TABLE (
    id           INT,
    rank         INT,
    combinedtext VARCHAR(1000))
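
-- Illustrative addition (not part of the original answer): load the sample
-- data from the question so the script can be run end to end.
-- INSERT ... SELECT ... UNION ALL is used because SQL Server 2005 has no
-- multi-row VALUES clause.
INSERT INTO @Old (id, rank, linenumber, sometext)
SELECT 1, 1, 1, 'the qu'    UNION ALL
SELECT 1, 1, 2, 'ick br'    UNION ALL
SELECT 1, 1, 3, 'own'       UNION ALL
SELECT 1, 2, 1, 'some te'   UNION ALL
SELECT 1, 2, 2, 'xt'        UNION ALL
SELECT 1, 3, 1, 'sample'    UNION ALL
SELECT 2, 7, 1, 'jumped ov' UNION ALL
SELECT 2, 7, 2, 'er the'    UNION ALL
SELECT 2, 7, 3, 'lazy'      UNION ALL
SELECT 2, 13, 1, 'samp'     UNION ALL
SELECT 2, 13, 2, 'le text'  UNION ALL
SELECT 3, 1, 1, 'ABC'       UNION ALL
SELECT 3, 1, 2, 'DEF'       UNION ALL
SELECT 3, 1, 3, 'GHI'       UNION ALL
SELECT 3, 1, 4, 'JKL'       UNION ALL
SELECT 3, 50, 1, 'XYZ'

-- Only id and rank are filled in; combinedtext is what the UPDATE computes.
INSERT INTO @New (id, rank)
SELECT 1, 2  UNION ALL
SELECT 2, 13 UNION ALL
SELECT 2, 14 UNION ALL
SELECT 3, 4  UNION ALL
SELECT 3, 5  UNION ALL
SELECT 3, 50 UNION ALL
SELECT 3, 55
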
-- Recursive CTE: start from the first line of each (id, rank) group and
-- append later lines; ctid counts how many lines have been appended so far.
;WITH combinedresults(ctid, id, rank, linenumber, combinedtext)
     AS (SELECT 0,
                id,
                rank,
                linenumber,
                CAST (sometext AS VARCHAR(8000))
         FROM   @Old o
         WHERE  NOT EXISTS (SELECT TOP 1 1
                            FROM   @Old
                            WHERE  id = o.id
                                   AND rank = o.rank
                                   AND linenumber < o.linenumber)
         UNION ALL
         SELECT ctid + 1,
                o.id,
                o.rank,
                o.linenumber,
                ct.combinedtext + o.sometext
         FROM   @Old o
                INNER JOIN combinedresults ct
                  ON ct.id = o.id
                     AND ct.rank = o.rank
         WHERE  o.linenumber > ct.linenumber)
UPDATE n
SET    combinedtext = ct.combinedtext
FROM   @New n
       -- r: for each new row, the highest old rank that does not exceed it
       INNER JOIN (SELECT n.id,
                          n.rank,
                          MAX(o.rank) orank
                   FROM   @New n
                          INNER JOIN @Old o
                            ON n.id = o.id
                               AND o.rank <= n.rank
                   GROUP  BY n.id,
                             n.rank) r
         ON n.id = r.id
            AND n.rank = r.rank
       -- r2: the longest chain (highest ctid) built for each old (id, rank)
       INNER JOIN (SELECT id,
                          ct.rank,
                          MAX(ctid) ctid
                   FROM   combinedresults ct
                   GROUP  BY ct.id,
                             ct.rank) r2
         ON r2.id = r.id
            AND r2.rank = r.orank
       INNER JOIN combinedresults ct
         ON r.id = ct.id
            AND ct.rank = r.orank
            AND ct.ctid = r2.ctid

SELECT *
FROM   @New
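
One thing to watch in the recursive member above: it joins on o.linenumber > ct.linenumber, so each partial string is extended by every later line, not just the next one. For a group of n lines that builds 2^(n-1) intermediate rows, which gets expensive near the 20-line case mentioned in the question. A minimal sketch of a narrower recursion follows, assuming line numbers start at 1 and are consecutive within each (id, rank), as they are in the sample data. The CTE name chained and the alias lastline are made up for this illustration.

```sql
-- Sketch only: assumes linenumber runs 1, 2, 3, ... with no gaps per (id, rank).
DECLARE @Old TABLE (id INT, [rank] INT, linenumber INT, sometext VARCHAR(1000));

INSERT INTO @Old (id, [rank], linenumber, sometext)
SELECT 3, 1, 1, 'ABC' UNION ALL
SELECT 3, 1, 2, 'DEF' UNION ALL
SELECT 3, 1, 3, 'GHI' UNION ALL
SELECT 3, 1, 4, 'JKL' UNION ALL
SELECT 3, 50, 1, 'XYZ';

WITH chained (id, [rank], linenumber, combinedtext)
     AS (-- Anchor: the first line of each group.
         SELECT id,
                [rank],
                linenumber,
                CAST(sometext AS VARCHAR(8000))
         FROM   @Old
         WHERE  linenumber = 1
         UNION ALL
         -- Recursive step: append only the immediately following line,
         -- so each group yields one row per line.
         SELECT o.id,
                o.[rank],
                o.linenumber,
                CAST(c.combinedtext + o.sometext AS VARCHAR(8000))
         FROM   @Old o
                INNER JOIN chained c
                  ON c.id = o.id
                     AND c.[rank] = o.[rank]
                     AND o.linenumber = c.linenumber + 1)
-- Keep only the row that reached the last line of its group.
SELECT c.id,
       c.[rank],
       c.combinedtext
FROM   chained c
       INNER JOIN (SELECT id,
                          [rank],
                          MAX(linenumber) AS lastline
                   FROM   @Old
                   GROUP  BY id,
                             [rank]) m
         ON m.id = c.id
            AND m.[rank] = c.[rank]
            AND m.lastline = c.linenumber;
```

With these five rows it should return ABCDEFGHIJKL for (3, 1) and XYZ for (3, 50). If line numbers can have gaps, this join silently stops at the first gap, so the original > condition (or a ROW_NUMBER() renumbering step) would be needed instead.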
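
As a different technique from the recursive CTE above, string concatenation on SQL Server 2005 is often done with FOR XML PATH(''), which avoids recursion and does not care how many lines a group has. The following is a sketch under assumptions, not a tuned solution: the table variables stand in for oldTable and newTable with the column names from the question, SourceRank is a made-up alias, and only part of the sample data is loaded.

```sql
-- Hypothetical stand-ins for oldTable / newTable from the question.
DECLARE @Old TABLE (ID INT, [Rank] INT, TextLineNumber INT, SomeText VARCHAR(1000));
DECLARE @New TABLE (ID INT, [Rank] INT, CombinedText VARCHAR(8000));

INSERT INTO @Old (ID, [Rank], TextLineNumber, SomeText)
SELECT 1, 2, 1, 'some te' UNION ALL
SELECT 1, 2, 2, 'xt'      UNION ALL
SELECT 3, 1, 1, 'ABC'     UNION ALL
SELECT 3, 1, 2, 'DEF'     UNION ALL
SELECT 3, 1, 3, 'GHI'     UNION ALL
SELECT 3, 1, 4, 'JKL'     UNION ALL
SELECT 3, 50, 1, 'XYZ';

INSERT INTO @New (ID, [Rank])
SELECT 1, 2  UNION ALL
SELECT 3, 4  UNION ALL
SELECT 3, 50 UNION ALL
SELECT 3, 55;

UPDATE n
SET    CombinedText =
       -- Concatenate the matching old lines in line order. The [text()] alias
       -- keeps FOR XML from wrapping each value in an element, and
       -- TYPE + .value() avoids XML escaping of characters such as & and <.
       (SELECT o.SomeText AS [text()]
        FROM   @Old o
        WHERE  o.ID = n.ID
               AND o.[Rank] = r.SourceRank
        ORDER  BY o.TextLineNumber
        FOR XML PATH(''), TYPE).value('.', 'VARCHAR(8000)')
FROM   @New n
       -- r: for each new row, the highest old rank that does not exceed it.
       -- New rows with no such old rank are not joined and stay unchanged.
       INNER JOIN (SELECT n2.ID,
                          n2.[Rank],
                          MAX(o2.[Rank]) AS SourceRank
                   FROM   @New n2
                          INNER JOIN @Old o2
                            ON o2.ID = n2.ID
                               AND o2.[Rank] <= n2.[Rank]
                   GROUP  BY n2.ID,
                             n2.[Rank]) r
         ON r.ID = n.ID
            AND r.[Rank] = n.[Rank];

SELECT ID, [Rank], CombinedText
FROM   @New
ORDER  BY ID, [Rank];
```

With this subset the final SELECT should show some text for (1, 2), ABCDEFGHIJKL for (3, 4), and XYZ for both (3, 50) and (3, 55), matching the expected rows in the question. Whether it gets under the 20-second target on the real data would need to be measured; the existing primary key on ID + Rank + TextLineNumber should at least support the correlated lookup.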