Merge/intersect between a table and a dataset - how to do it?

Jam*_*mes 2 sql-server insert merge

Consider the following table:

Id          Hash
----------- ----------------------------------
1           0x31F777F0804D301936411E3ECD760859
2           0xD64A593F3E9ACC972158D522A4289EA0

(Id is an identity column)

Into this table, I want to merge the following dataset:

Hash
----------------------------------
0x31F777F0804D301936411E3ECD760859
0x31F777F0804D301936411E3ECD760859
0x0C5A65264F92A543E7AAA06375349C06

(Id is NOT present in the dataset)

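The merge rules are as follows:

  • If a hash does not exist in the table, insert it into the table;
  • If a hash does not exist in the dataset, delete it from the table;
  • If a hash exists on both sides, with X instances in the table and Y instances in the source, then (Y - X) instances should be inserted into the table.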
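Stated per hash, the three rules are multiset arithmetic. The following is a minimal Python sketch of that arithmetic only (not a query), using shortened hash strings as stand-ins; the function name `plan_merge` is hypothetical. The rules do not say what should happen when X > Y, so the sketch assumes nothing is inserted or deleted in that case.

```python
from collections import Counter


def plan_merge(table_hashes, source_hashes):
    """Return (hashes to delete, Counter of copies to insert) per the rules above."""
    x = Counter(table_hashes)   # X: instances of each hash in the table
    y = Counter(source_hashes)  # Y: instances of each hash in the source
    # Rule 2: hashes present in the table but absent from the source are deleted.
    to_delete = {h for h in x if h not in y}
    # Rules 1 and 3: insert Y - X copies (X is 0 for hashes new to the table).
    # Counter subtraction drops non-positive counts, so X > Y inserts nothing
    # (an assumption; the rules leave that case open).
    to_insert = y - x
    return to_delete, to_insert


to_delete, to_insert = plan_merge(["31F7", "D64A"], ["31F7", "31F7", "0C5A"])
print(sorted(to_delete))          # ['D64A']
print(sorted(to_insert.items()))  # [('0C5A', 1), ('31F7', 1)]
```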

After the merge, the table should look like this:

Id          Hash
----------- ----------------------------------
1           0x31F777F0804D301936411E3ECD760859
3           0x31F777F0804D301936411E3ECD760859
4           0x0C5A65264F92A543E7AAA06375349C06

What is the most efficient way to write a query to do this? FYI, other columns have been omitted for brevity.

Pau*_*ite 5

Using the sample data:

DECLARE @T table
(
    Id integer IDENTITY NOT NULL PRIMARY KEY, 
    [Hash] binary(16) NOT NULL INDEX h
);

INSERT @T ([Hash]) VALUES (0x31F777F0804D301936411E3ECD760859);
INSERT @T ([Hash]) VALUES (0xD64A593F3E9ACC972158D522A4289EA0);

DECLARE @S table 
(
    [Hash] binary(16) NOT NULL
);

INSERT @S
    ([Hash])
VALUES
    (0x31F777F0804D301936411E3ECD760859),
    (0x31F777F0804D301936411E3ECD760859),
    (0x0C5A65264F92A543E7AAA06375349C06);

You could write it as a MERGE:

WITH
    T AS
    (
        SELECT
            T.[Hash], 
            rn = ROW_NUMBER() OVER (
                PARTITION BY T.[Hash] 
                ORDER BY T.[Hash], T.Id)
        FROM @T AS T
    ),
    S AS
    (
        SELECT DISTINCT
            S.[Hash],
            rn = ROW_NUMBER() OVER (
                PARTITION BY S.[Hash] 
                ORDER BY S.[Hash])
        FROM @S AS S
    )
MERGE T
USING S
    ON S.[Hash] = T.[Hash]
    AND S.rn = T.rn
WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
WHEN NOT MATCHED BY SOURCE THEN DELETE;

db<>fiddle
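The ON clause pairs the n-th copy of a hash in the source with the n-th copy of the same hash in the target, via the ROW_NUMBER() values. The following is a small Python model of that pairing, not the MERGE itself, with shortened hashes; `numbered` is a hypothetical helper that mimics `ROW_NUMBER() OVER (PARTITION BY [Hash] ...)`. Set difference on `(hash, rn)` tuples shows which rows each WHEN branch would touch.

```python
from collections import defaultdict


def numbered(hashes):
    """Pair each hash with its occurrence number within its own partition."""
    seen = defaultdict(int)
    pairs = set()
    for h in hashes:
        seen[h] += 1
        pairs.add((h, seen[h]))
    return pairs


target = numbered(["31F7", "D64A"])          # rows of @T, in Id order
source = numbered(["31F7", "31F7", "0C5A"])  # rows of @S

# WHEN NOT MATCHED BY TARGET THEN INSERT: source pairs with no target partner.
insert_pairs = source - target
# WHEN NOT MATCHED BY SOURCE THEN DELETE: target pairs with no source partner.
delete_pairs = target - source

print(sorted(insert_pairs))  # [('0C5A', 1), ('31F7', 2)]
print(sorted(delete_pairs))  # [('D64A', 1)]
```

One consequence the model makes visible: if the table held more copies of a hash than the source, the surplus `(hash, rn)` pairs would also fall into the delete set, so the MERGE as written trims the table down to Y copies.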

But for performance reasons (and because MERGE has a number of known bugs), I would usually write it as two separate statements:

WITH ToDelete AS
(
    SELECT
        T.*
    FROM @T AS T
    WHERE 
        NOT EXISTS 
        (
            SELECT
                S.* 
            FROM @S AS S 
            WHERE 
                S.[Hash] = T.[Hash]
        )
)
DELETE ToDelete;

-- Second statement: insert the missing (Hash, rn) pairs
WITH ToInsert AS
(
    SELECT
        S.[Hash], 
        rn = ROW_NUMBER() OVER (
            PARTITION BY S.[Hash] 
            ORDER BY S.[Hash])
    FROM @S AS S
    EXCEPT
    SELECT
        T.[Hash], 
        rn = ROW_NUMBER() OVER (
            PARTITION BY T.[Hash] 
            ORDER BY T.[Hash], T.Id)
    FROM @T AS T
)
INSERT @T
    ([Hash])
SELECT
    ToInsert.[Hash]
FROM ToInsert;

db<>fiddle
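To experiment with the two-statement approach outside SQL Server, here is a rough port using Python's standard-library `sqlite3` module. It is not the T-SQL above. SQLite has no table variables, IDENTITY, or MERGE, so the sketch makes these assumptions: ordinary tables `t` and `s` in an in-memory database, `INTEGER PRIMARY KEY AUTOINCREMENT` standing in for IDENTITY (so deleted Ids are not reused), and an SQLite build of 3.25 or later for `ROW_NUMBER()`. The function name `run_two_statement_merge` is hypothetical.

```python
import sqlite3


def run_two_statement_merge(table_hashes, source_hashes):
    """Apply the DELETE-then-INSERT logic in SQLite and return the final (id, hash) rows."""
    con = sqlite3.connect(":memory:")
    # AUTOINCREMENT approximates IDENTITY: Ids of deleted rows are never handed out again.
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, hash BLOB NOT NULL)")
    con.execute("CREATE TABLE s (hash BLOB NOT NULL)")
    con.executemany("INSERT INTO t (hash) VALUES (?)", [(h,) for h in table_hashes])
    con.executemany("INSERT INTO s (hash) VALUES (?)", [(h,) for h in source_hashes])

    # Statement 1: delete table rows whose hash does not appear in the source at all.
    con.execute("""
        DELETE FROM t
        WHERE NOT EXISTS (SELECT 1 FROM s WHERE s.hash = t.hash)
    """)
    # Statement 2: number the copies of each hash on both sides, then insert
    # the (hash, rn) pairs that exist in the source but not in the table.
    con.execute("""
        INSERT INTO t (hash)
        SELECT hash FROM (
            SELECT hash, ROW_NUMBER() OVER (PARTITION BY hash ORDER BY hash) AS rn FROM s
            EXCEPT
            SELECT hash, ROW_NUMBER() OVER (PARTITION BY hash ORDER BY id) AS rn FROM t
        )
    """)
    rows = con.execute("SELECT id, hash FROM t ORDER BY id").fetchall()
    con.close()
    return rows


# Sample data from the question.
A = bytes.fromhex("31F777F0804D301936411E3ECD760859")
B = bytes.fromhex("D64A593F3E9ACC972158D522A4289EA0")
C = bytes.fromhex("0C5A65264F92A543E7AAA06375349C06")

for row_id, h in run_two_statement_merge([A, B], [A, A, C]):
    # Ids 1, 3 and 4 remain; which new Id each inserted hash receives is not guaranteed.
    print(row_id, "0x" + h.hex().upper())
```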

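You should have a unique index on the target on ([Hash], [Id]). It is quite likely you already have this, or the equivalent: an index on [Hash] plus a unique (probably clustered) index on [Id].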

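There may be other issues, depending on what was omitted from the question for brevity. In any case, this should give you a couple of possible starting points for your own solution.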