SQL: Efficiently appending an incrementing number to a string to avoid duplicates

Sta*_*opo 3 sql sql-server performance

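I have a set of records (table [#tmp_origin]) that contains duplicate entries in a string field ([Names]). I want to insert the entire contents of [#tmp_origin] into a destination table, [#tmp_destination], which does not allow duplicates and may already contain items.

If a string from the origin table does not exist in the destination table, it simply has to be inserted into the destination table. If an entry with the same value already exists in the destination table, an incrementing number must be appended to the string before it is inserted into the destination table.

In this sample script, the process of moving the data in this way is implemented with a cursor: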


-- create initial situation (origin and destination table, both containing items) - Begin

    CREATE TABLE [#tmp_origin] ([Names] VARCHAR(10))
    CREATE TABLE [#tmp_destination] ([Names] VARCHAR(10))
    CREATE UNIQUE INDEX [IX_UniqueName] ON [#tmp_destination]([Names] ASC)



    INSERT INTO [#tmp_origin]([Names]) VALUES ('a')
    INSERT INTO [#tmp_origin]([Names]) VALUES ('a')
    INSERT INTO [#tmp_origin]([Names]) VALUES ('b')
    INSERT INTO [#tmp_origin]([Names]) VALUES ('c')


    INSERT INTO [#tmp_destination]([Names]) VALUES ('a')
    INSERT INTO [#tmp_destination]([Names]) VALUES ('a_1')
    INSERT INTO [#tmp_destination]([Names]) VALUES ('b')

-- create initial situation - End

    DECLARE @Name VARCHAR(10)

    DECLARE NamesCursor CURSOR LOCAL FORWARD_ONLY FAST_FORWARD READ_ONLY FOR
        SELECT [Names]
        FROM [#tmp_origin];
    OPEN NamesCursor;
    FETCH NEXT FROM NamesCursor INTO @Name;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        DECLARE @finalName VARCHAR(10)
        SET @finalName = @Name
        DECLARE @counter INT
        SET @counter = 1

        WHILE(1=1)
        BEGIN
            IF NOT EXISTS(SELECT * FROM [#tmp_destination] WHERE [Names] = @finalName)
                BREAK;

            SET @finalName = @Name + '_' + CAST(@counter AS VARCHAR)
            SET @counter = @counter + 1
        END
        INSERT INTO [#tmp_destination] ([Names]) (
            SELECT @finalName
        )

        FETCH NEXT FROM NamesCursor INTO @Name;
    END

    CLOSE NamesCursor;
    DEALLOCATE NamesCursor;




    SELECT *
    FROM [#tmp_destination]

    /*
    Expected result:
    a
    a_1
    a_2
    a_3
    b
    b_1
    c
    */

    DROP TABLE [#tmp_origin]
    DROP TABLE [#tmp_destination]


This works correctly, but its performance degrades sharply as the number of items to insert grows.

Any ideas on how to speed it up?

Thanks

Ric*_*ard 5

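Window functions can be used to number the duplicates. You can also get a count from the destination table (this needs a condition that strips off the suffixes you have added):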

select orig.names,
       row_number() over (partition by orig.names order by orig.names) as rowNo,
       dest.[count]
from #tmp_origin orig
  -- the count column needs an alias so it can be referenced as dest.[count]
  cross apply (select count(1) as [count] from #tmp_destination where names = orig.names) as dest

An insert can be built from the above (the new suffix is rowNo + dest.count - 1, added only when it is greater than zero).
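As a rough sketch of how such an insert might be assembled against the question's [#tmp_origin] and [#tmp_destination] tables: the suffix-stripping condition below is my own assumption about what "a name with a suffix" looks like (base name, an underscore, then digits only). It also assumes that existing suffixes are contiguous (name, name_1, name_2, ...) and that the base name itself is present whenever suffixed names are. If either assumption fails, the unique index will reject the insert.

```sql
-- Run while [#tmp_origin] and [#tmp_destination] still exist
-- (i.e. before the DROP TABLE statements in the question's script).
INSERT INTO [#tmp_destination] ([Names])
SELECT CASE WHEN n.suffix = 0
            THEN n.[Names]                                   -- name is new: no suffix
            ELSE n.[Names] + '_' + CAST(n.suffix AS VARCHAR(10))
       END
FROM (
    SELECT o.[Names],
           -- rowNo + dest count - 1, as described above
           ROW_NUMBER() OVER (PARTITION BY o.[Names] ORDER BY o.[Names])
             + d.cnt - 1 AS suffix
    FROM [#tmp_origin] AS o
    CROSS APPLY (
        -- Count destination rows that are the base name itself or
        -- base name + '_' + digits (assumed suffix format).
        SELECT COUNT(*) AS cnt
        FROM [#tmp_destination] AS t
        WHERE t.[Names] = o.[Names]
           OR (    LEFT(t.[Names], LEN(o.[Names]) + 1) = o.[Names] + '_'
               AND LEN(t.[Names]) > LEN(o.[Names]) + 1
               AND SUBSTRING(t.[Names], LEN(o.[Names]) + 2, 10) NOT LIKE '%[^0-9]%')
    ) AS d
) AS n;
```

With the sample data from the question, the counts are 2 for 'a', 1 for 'b' and 0 for 'c', so the statement should add a_2, a_3, b_1 and c, which matches the expected result listed there. Note that a base name which itself ends in an underscore followed by digits would be miscounted by this condition.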

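I would suggest restructuring the destination temp table so that the name and the suffix are held in separate columns (this may mean adding a new intermediate stage), because that would make the matching logic much simpler.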
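To illustrate the separate-column idea, here is a minimal sketch. The table [#tmp_destination_split] and its columns [BaseName] and [Suffix] are hypothetical names chosen for this example, with a suffix of 0 standing for "no suffix". It is seeded by hand with the question's destination rows; in practice an intermediate step would have to split the existing names.

```sql
-- Hypothetical intermediate table: name and suffix stored separately.
CREATE TABLE [#tmp_destination_split] (
    [BaseName] VARCHAR(10) NOT NULL,
    [Suffix]   INT         NOT NULL,      -- 0 means "no suffix"
    UNIQUE ([BaseName], [Suffix])
);

-- Same starting state as the question: a, a_1, b
INSERT INTO [#tmp_destination_split] ([BaseName], [Suffix]) VALUES ('a', 0);
INSERT INTO [#tmp_destination_split] ([BaseName], [Suffix]) VALUES ('a', 1);
INSERT INTO [#tmp_destination_split] ([BaseName], [Suffix]) VALUES ('b', 0);

-- Each origin row gets the next free suffix after the current maximum.
-- Names not yet in the destination start from -1, so their first row gets 0.
INSERT INTO [#tmp_destination_split] ([BaseName], [Suffix])
SELECT o.[Names],
       ROW_NUMBER() OVER (PARTITION BY o.[Names] ORDER BY o.[Names])
         + ISNULL(m.maxSuffix, -1)
FROM [#tmp_origin] AS o
LEFT JOIN (
    SELECT [BaseName], MAX([Suffix]) AS maxSuffix
    FROM [#tmp_destination_split]
    GROUP BY [BaseName]
) AS m ON m.[BaseName] = o.[Names];

-- Rebuild the display form only when reading the data back.
SELECT [BaseName]
         + CASE WHEN [Suffix] = 0 THEN ''
                ELSE '_' + CAST([Suffix] AS VARCHAR(10))
           END AS [Names]
FROM [#tmp_destination_split]
ORDER BY [BaseName], [Suffix];
```

Because the match is a plain equality on [BaseName] and the next number comes from MAX([Suffix]), no string parsing is needed and gaps in the existing suffixes do not cause collisions. The sketch assumes [#tmp_origin] from the question is still populated when it runs.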