Mac*_*ver 0 t-sql sql-server performance duplicates i2b2
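I am trying to find a better way to find duplicates in SQL Server. The table has just over 300 million records, and the query below took more than 20 minutes before any results appeared in the SSMS results window. It then ran for another 22 minutes before crashing.
SSMS threw this error after displaying 16,777,216 records: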
An error occurred while executing batch. Error message is: Exception of type 'System.OutOfMemoryException' was thrown.
Schema:
ENCOUNTER_NUM - numeric(22,0)
CONCEPT_CD - varchar(50)
PROVIDER_ID - varchar(50)
START_DATE - datetime
MODIFIER_CD - varchar(100)
INSTANCE_NUM - numeric(18,0)
Query:
SELECT
ROW_NUMBER() OVER (ORDER BY f1.[ENCOUNTER_NUM],f1.[CONCEPT_CD],f1.[PROVIDER_ID],f1.[START_DATE],f1.[MODIFIER_CD],f1.[INSTANCE_NUM]),
f1.[ENCOUNTER_NUM],
f1.[CONCEPT_CD],
f1.[PROVIDER_ID],
f1.[START_DATE],
f1.[MODIFIER_CD],
f1.[INSTANCE_NUM]
FROM
[dbo].[I2B2_OBSERVATION_FACT] f1
INNER JOIN [dbo].[I2B2_OBSERVATION_FACT] f2 ON
f1.[ENCOUNTER_NUM] = f2.[ENCOUNTER_NUM]
AND f1.[CONCEPT_CD] = f2.[CONCEPT_CD]
AND f1.[PROVIDER_ID] = f2.[PROVIDER_ID]
AND f1.[START_DATE] = f2.[START_DATE]
AND f1.[MODIFIER_CD] = f2.[MODIFIER_CD]
AND f1.[INSTANCE_NUM] = f2.[INSTANCE_NUM]
I don't know how much faster this will be, but it's worth a try.
SELECT
COUNT(*) AS Dupes,
f1.[ENCOUNTER_NUM],
f1.[CONCEPT_CD],
f1.[PROVIDER_ID],
f1.[START_DATE],
f1.[MODIFIER_CD],
f1.[INSTANCE_NUM]
FROM
[dbo].[I2B2_OBSERVATION_FACT] f1
GROUP BY
f1.[ENCOUNTER_NUM],
f1.[CONCEPT_CD],
f1.[PROVIDER_ID],
f1.[START_DATE],
f1.[MODIFIER_CD],
f1.[INSTANCE_NUM]
HAVING
COUNT(*) > 1
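One thing worth noting: the self-join in the question pairs every row with itself, so it returns at least every row whose key columns are all non-NULL, not only the duplicates. That is likely why SSMS ran out of memory while filling the grid. Even the grouped query can return a large result set, so a possible variation is to store the duplicate groups in a temp table and pull back only a summary and a sample. This is a minimal sketch, not a tuned solution; the temp table name `#dupes` and the sample size of 100 are assumptions chosen for illustration.

```sql
-- Sketch: materialize duplicate groups server-side instead of streaming
-- them all to the SSMS results grid. #dupes is a hypothetical name.
SELECT
    [ENCOUNTER_NUM],
    [CONCEPT_CD],
    [PROVIDER_ID],
    [START_DATE],
    [MODIFIER_CD],
    [INSTANCE_NUM],
    COUNT_BIG(*) AS Dupes          -- bigint count, safe for very large tables
INTO #dupes
FROM [dbo].[I2B2_OBSERVATION_FACT]
GROUP BY
    [ENCOUNTER_NUM],
    [CONCEPT_CD],
    [PROVIDER_ID],
    [START_DATE],
    [MODIFIER_CD],
    [INSTANCE_NUM]
HAVING COUNT_BIG(*) > 1;

-- Summary: number of duplicated key combinations and number of surplus rows.
-- ExtraRows is NULL when no duplicates were found.
SELECT
    COUNT_BIG(*)   AS DuplicateGroups,
    SUM(Dupes - 1) AS ExtraRows
FROM #dupes;

-- Inspect a small sample rather than the full set (100 is arbitrary).
SELECT TOP (100) *
FROM #dupes
ORDER BY Dupes DESC;
```

Unlike the join, `GROUP BY` treats NULLs in a column as equal to each other, so rows that differ only by having NULL in the same columns are counted as duplicates here.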