MSSQL: Procedure to remove duplicates

see*_*ker -2 sql t-sql sql-server stored-procedures distinct

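Consider a table without any primary or foreign keys. I would like to write a procedure that, given a table name, deletes all duplicate rows from that table.

A row should be considered a duplicate of another row if all of its fields are identical.

Can you suggest whether this is possible? One thing I tried was grouping by every field, but that approach is not generic.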

Luk*_*zda 5

You can achieve this with dynamic SQL.

A quick solution (with plenty of room for improvement):

CREATE TABLE tab1(a INT, b INT);
INSERT INTO tab1(a,b) VALUES (1,1),(1,1),(1,1),(2,3);
GO

Procedure:

CREATE PROCEDURE dbo.remove_duplicates
    @tab_name SYSNAME
    ,@debug BIT = 0
AS
BEGIN
    SET NOCOUNT ON;
    -- TODO: validation if table does not exist, raise error
    -- TODO: Add @schema parameter
    -- TODO: Wrap with BEGIN TRY, omit calculated columns, CAST `TEXT/IMAGE/BINARY`....

    DECLARE @sql NVARCHAR(MAX) = 
       'WITH cte AS
        (
            SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY <cols> ORDER BY (SELECT 1))
            FROM <tab_placeholder>
        )
        DELETE FROM cte
        WHERE rn <> 1;';

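    -- Build a comma-separated column list; QUOTENAME guards against names
    -- that contain spaces or clash with reserved words.
    DECLARE @cols NVARCHAR(MAX) = STUFF((SELECT ',' + QUOTENAME(COLUMN_NAME)
                                         FROM INFORMATION_SCHEMA.COLUMNS
                                         WHERE TABLE_NAME = @tab_name
                                           AND TABLE_SCHEMA = 'dbo'
                                         FOR XML PATH('')), 1, 1, '');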

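    -- Schema-qualify the table so it matches the 'dbo' filter used for the column list.
    SET @sql = REPLACE(@sql, '<tab_placeholder>', QUOTENAME(N'dbo') + N'.' + QUOTENAME(@tab_name));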
    SET @sql = REPLACE(@sql, '<cols>', @cols);

    IF @debug = 1 SELECT @sql;

    EXEC dbo.sp_executesql @sql;    

END
GO

Execution:

EXEC [dbo].[remove_duplicates] @tab_name = 'tab1', @debug = 1;
SELECT * FROM tab1;
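
To make the dynamic part easier to follow, here is a sketch of what the generated statement should look like for the sample table tab1 (columns a and b). This is roughly what @debug = 1 would display, modulo identifier quoting and whitespace; it assumes tab1 exists as created above.

```sql
-- Expanded form of the dynamic statement for tab1(a, b).
-- Rows are numbered within each group of identical (a, b) values;
-- ORDER BY (SELECT 1) means the numbering order inside a group is arbitrary,
-- which is fine because the rows in a group are indistinguishable.
WITH cte AS
(
    SELECT *, rn = ROW_NUMBER() OVER(PARTITION BY a, b ORDER BY (SELECT 1))
    FROM [tab1]
)
DELETE FROM cte      -- deleting through the CTE removes rows from tab1 itself
WHERE rn <> 1;       -- keep the first row of each group, drop the rest

-- With the sample data (1,1),(1,1),(1,1),(2,3), one (1,1) row and the (2,3) row remain.
SELECT a, b FROM tab1;
```

Running this after the procedure has already executed is harmless: no group has more than one row left, so nothing is deleted.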
