How to find duplicate values in SQL Server

hgu*_*yan 9 sql-server duplicates sql-server-2008

I'm using SQL Server 2008. I have a table:

Customers

customer_number int
field1 varchar
field2 varchar
field3 varchar
field4 varchar

...and more columns that don't matter for my query.

customer_number is the PK. I'm trying to find duplicate values and some of the differences between them.

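Please help me find all rows where:

1) field1, field2, field3 and field4 are all equal
2) only 3 of the columns are equal and one differs (excluding the rows from list 1)
3) only 2 of the columns are equal and two differ (excluding the rows from lists 1 and 2)

In the end I'll have 3 tables holding these results, plus a groupId that is the same for every row in a group of similar rows (for example, in the "3 columns equal" table, rows sharing the same 3 columns form one separate group).

Thanks.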

Bal*_*dar 56

Here's a handy query for finding duplicates in a table. Suppose you want to find every email address that appears more than once in a table:

SELECT email, COUNT(email) AS NumOccurrences
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )

You can also use this technique to find rows that occur exactly once:

SELECT email
FROM users
GROUP BY email
HAVING ( COUNT(email) = 1 )

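  • Simple, beautiful answer. I could have thought of this myself, but I asked Google because I didn't want to think. I wasn't disappointed. This is the real answer.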
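Both queries can be tried without a SQL Server instance. The sketch below is an illustration only: it uses Python's built-in `sqlite3` module with an in-memory database standing in for SQL Server (the `GROUP BY ... HAVING` syntax is the same), and the `users` rows and the helper name `email_counts` are made up.

```python
import sqlite3


def email_counts(conn):
    """Run the two GROUP BY / HAVING queries; return (duplicates, singletons)."""
    dupes = conn.execute(
        """SELECT email, COUNT(email) AS NumOccurrences
           FROM users
           GROUP BY email
           HAVING COUNT(email) > 1
           ORDER BY email"""
    ).fetchall()
    singles = conn.execute(
        """SELECT email
           FROM users
           GROUP BY email
           HAVING COUNT(email) = 1
           ORDER BY email"""
    ).fetchall()
    return dupes, [row[0] for row in singles]


# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("a@example.com",),
     ("c@example.com",), ("a@example.com",), ("b@example.com",)],
)
dupes, singles = email_counts(conn)
print(dupes)    # [('a@example.com', 3), ('b@example.com', 2)]
print(singles)  # ['c@example.com']
```

One detail worth knowing: `COUNT(email)` skips NULLs, so rows whose email is NULL land in a group with a count of 0 and show up in neither result; `COUNT(*)` would count them.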

lc.*_*lc. 4

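The simplest approach is probably to write a stored procedure that iterates over each set of customers that have duplicates and inserts the matching customers, with a separate group number for each set.

However, having thought about it, you can probably do this with subqueries. Hopefully I haven't made it more complicated than it needs to be, but this should get you what you need for the first duplicates table (all four fields). Note that this is untested, so it may need some tweaking.

Basically, it finds every combination of the fields that has duplicates, gives each combination a group number, then fetches all customers with those field values and assigns them the same group number.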
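The iterate-over-groups idea can be sketched procedurally. This is plain Python rather than a stored procedure, so treat it as an illustration of the loop only: it buckets customers by their four field values and hands out one group number per bucket that has more than one member. The function name `number_duplicate_groups` and the sample rows are hypothetical.

```python
from collections import defaultdict


def number_duplicate_groups(customers):
    """customers: iterable of (customer_number, field1, field2, field3, field4).

    Returns (group_no, customer_number) pairs for every customer whose four
    fields are shared with at least one other customer.
    """
    buckets = defaultdict(list)
    for customer_number, *fields in customers:
        buckets[tuple(fields)].append(customer_number)

    result = []
    group_no = 0
    # Visit the buckets in a stable order (sorted by field values).
    for fields in sorted(buckets):
        members = buckets[fields]
        if len(members) > 1:  # same role as HAVING COUNT(*) > 1
            group_no += 1
            result.extend((group_no, c) for c in sorted(members))
    return result


rows = [(1, "a", "b", "c", "d"), (2, "a", "b", "c", "d"),
        (3, "a", "b", "c", "x"), (4, "p", "q", "r", "s"),
        (5, "p", "q", "r", "s"), (6, "z", "z", "z", "z")]
print(number_duplicate_groups(rows))  # [(1, 1), (1, 2), (2, 4), (2, 5)]
```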

-- Assumes the target table FourFieldsDuplicates(group_no, customer_no) already exists.
-- Rows with a NULL in any of the four fields never satisfy the equality join below.
INSERT INTO FourFieldsDuplicates(group_no, customer_no)
SELECT Groups.group_no, custs.customer_number
FROM (SELECT ROW_NUMBER() OVER(ORDER BY c.field1, c.field2, c.field3, c.field4) AS group_no,
             c.field1, c.field2, c.field3, c.field4
      FROM Customers c
      GROUP BY c.field1, c.field2, c.field3, c.field4
      HAVING COUNT(*) > 1) Groups
INNER JOIN Customers custs ON custs.field1 = Groups.field1
                           AND custs.field2 = Groups.field2
                           AND custs.field3 = Groups.field3
                           AND custs.field4 = Groups.field4
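The four-field query can be sanity-checked on a toy table. This sketch uses Python's `sqlite3` with an in-memory database as a stand-in for SQL Server. Instead of `ROW_NUMBER()` it numbers the groups through an auto-assigned `INTEGER PRIMARY KEY` in a helper table; the helper name `Groups4` and all sample rows are hypothetical. The join back to `Customers` mirrors the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (
        customer_number INTEGER PRIMARY KEY,
        field1 TEXT, field2 TEXT, field3 TEXT, field4 TEXT);
    -- Hypothetical helper table: SQLite fills group_no in insert order.
    CREATE TABLE Groups4 (
        group_no INTEGER PRIMARY KEY,
        field1 TEXT, field2 TEXT, field3 TEXT, field4 TEXT);
    CREATE TABLE FourFieldsDuplicates (group_no INTEGER, customer_no INTEGER);
""")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?, ?)", [
    (1, "a", "b", "c", "d"),
    (2, "a", "b", "c", "d"),  # same four fields as customer 1
    (3, "a", "b", "c", "x"),  # differs from 1 and 2 in field4 only
    (4, "p", "q", "r", "s"),
    (5, "p", "q", "r", "s"),  # same four fields as customer 4
    (6, "z", "z", "z", "z"),
])

# Step 1: one row (and one group number) per 4-field combination seen more than once.
conn.execute("""
    INSERT INTO Groups4 (field1, field2, field3, field4)
    SELECT field1, field2, field3, field4
    FROM Customers
    GROUP BY field1, field2, field3, field4
    HAVING COUNT(*) > 1""")

# Step 2: join back so every customer in such a combination gets its group number.
conn.execute("""
    INSERT INTO FourFieldsDuplicates (group_no, customer_no)
    SELECT g.group_no, c.customer_number
    FROM Groups4 g
    INNER JOIN Customers c ON c.field1 = g.field1 AND c.field2 = g.field2
                          AND c.field3 = g.field3 AND c.field4 = g.field4""")

members = {}
for group_no, customer_no in conn.execute(
        "SELECT group_no, customer_no FROM FourFieldsDuplicates "
        "ORDER BY group_no, customer_no"):
    members.setdefault(group_no, []).append(customer_no)
print(sorted(members.values()))  # [[1, 2], [4, 5]]
```

Customer 3 matches customers 1 and 2 on three fields only, so it stays out of this table and is a candidate for the three-field groups instead.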

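The others are a bit more complicated, because you have to spell out every combination of columns. For the three-field groups, each branch of the UNION ALL leaves out one column, written as NULL, which the final join treats as "any value":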

-- Run after FourFieldsDuplicates has been filled. Each branch keeps the 3-column
-- combinations shared by 2+ customers not already in list 1.
-- Assumes field1..field4 contain no real NULLs (NULL here means "column left out").
INSERT INTO ThreeFieldsDuplicates(group_no, customer_no)
SELECT Groups.group_no, custs.customer_number
FROM (SELECT ROW_NUMBER() OVER(ORDER BY GroupsInner.field1, GroupsInner.field2,
                                        GroupsInner.field3, GroupsInner.field4) AS group_no,
             GroupsInner.field1, GroupsInner.field2,
             GroupsInner.field3, GroupsInner.field4
      FROM (SELECT c.field1, c.field2, c.field3, NULL AS field4
            FROM Customers c
            WHERE NOT EXISTS(SELECT d.customer_no
                             FROM FourFieldsDuplicates d
                             WHERE d.customer_no = c.customer_number)
            GROUP BY c.field1, c.field2, c.field3
            HAVING COUNT(*) > 1
            UNION ALL
            SELECT c.field1, c.field2, NULL AS field3, c.field4
            FROM Customers c
            WHERE NOT EXISTS(SELECT d.customer_no
                             FROM FourFieldsDuplicates d
                             WHERE d.customer_no = c.customer_number)
            GROUP BY c.field1, c.field2, c.field4
            HAVING COUNT(*) > 1
            UNION ALL
            SELECT c.field1, NULL AS field2, c.field3, c.field4
            FROM Customers c
            WHERE NOT EXISTS(SELECT d.customer_no
                             FROM FourFieldsDuplicates d
                             WHERE d.customer_no = c.customer_number)
            GROUP BY c.field1, c.field3, c.field4
            HAVING COUNT(*) > 1
            UNION ALL
            SELECT NULL AS field1, c.field2, c.field3, c.field4
            FROM Customers c
            WHERE NOT EXISTS(SELECT d.customer_no
                             FROM FourFieldsDuplicates d
                             WHERE d.customer_no = c.customer_number)
            GROUP BY c.field2, c.field3, c.field4
            HAVING COUNT(*) > 1) GroupsInner) Groups
INNER JOIN Customers custs ON (Groups.field1 IS NULL OR custs.field1 = Groups.field1)
                           AND (Groups.field2 IS NULL OR custs.field2 = Groups.field2)
                           AND (Groups.field3 IS NULL OR custs.field3 = Groups.field3)
                           AND (Groups.field4 IS NULL OR custs.field4 = Groups.field4)
-- Customers already in list 1 also match their 3-column subsets, so skip them here too.
WHERE NOT EXISTS(SELECT d.customer_no
                 FROM FourFieldsDuplicates d
                 WHERE d.customer_no = custs.customer_number)
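The three-field logic can be checked the same way on a toy table. In this sketch, Python's `sqlite3` with an in-memory database stands in for SQL Server, group numbers come from an auto-assigned `INTEGER PRIMARY KEY` in a hypothetical helper table `Groups3` rather than `ROW_NUMBER()`, and list 1 is pre-filled by hand. Each branch groups on three columns with `HAVING COUNT(*) > 1`, and the final join skips customers already in list 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (
        customer_number INTEGER PRIMARY KEY,
        field1 TEXT NOT NULL, field2 TEXT NOT NULL,
        field3 TEXT NOT NULL, field4 TEXT NOT NULL);
    CREATE TABLE FourFieldsDuplicates (group_no INTEGER, customer_no INTEGER);
    CREATE TABLE ThreeFieldsDuplicates (group_no INTEGER, customer_no INTEGER);
    -- Hypothetical helper table: SQLite fills group_no in insert order.
    CREATE TABLE Groups3 (group_no INTEGER PRIMARY KEY,
        field1 TEXT, field2 TEXT, field3 TEXT, field4 TEXT);
""")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?, ?)", [
    (1, "a", "b", "c", "d"), (2, "a", "b", "c", "d"),  # list 1: all four equal
    (3, "a", "b", "c", "x"), (4, "a", "b", "c", "y"),  # field1..field3 equal
    (5, "p", "q", "r", "s"), (6, "p", "q", "w", "s"),  # field1, field2, field4 equal
    (7, "m", "n", "o", "p"),                           # no three-field match
])
# Pretend list 1 has already been filled in.
conn.executemany("INSERT INTO FourFieldsDuplicates VALUES (?, ?)", [(1, 1), (1, 2)])

not_in_list1 = """NOT EXISTS (SELECT 1 FROM FourFieldsDuplicates d
                              WHERE d.customer_no = c.customer_number)"""

# One branch per left-out column; NULL marks the column that may differ.
conn.execute(f"""
    INSERT INTO Groups3 (field1, field2, field3, field4)
    SELECT field1, field2, field3, NULL FROM Customers c WHERE {not_in_list1}
    GROUP BY field1, field2, field3 HAVING COUNT(*) > 1
    UNION ALL
    SELECT field1, field2, NULL, field4 FROM Customers c WHERE {not_in_list1}
    GROUP BY field1, field2, field4 HAVING COUNT(*) > 1
    UNION ALL
    SELECT field1, NULL, field3, field4 FROM Customers c WHERE {not_in_list1}
    GROUP BY field1, field3, field4 HAVING COUNT(*) > 1
    UNION ALL
    SELECT NULL, field2, field3, field4 FROM Customers c WHERE {not_in_list1}
    GROUP BY field2, field3, field4 HAVING COUNT(*) > 1""")

# Expand each group back to customers, treating a NULL group column as "any value".
conn.execute(f"""
    INSERT INTO ThreeFieldsDuplicates (group_no, customer_no)
    SELECT g.group_no, c.customer_number
    FROM Groups3 g
    INNER JOIN Customers c
        ON  (g.field1 IS NULL OR c.field1 = g.field1)
        AND (g.field2 IS NULL OR c.field2 = g.field2)
        AND (g.field3 IS NULL OR c.field3 = g.field3)
        AND (g.field4 IS NULL OR c.field4 = g.field4)
    WHERE {not_in_list1}""")

members = {}
for group_no, customer_no in conn.execute(
        "SELECT group_no, customer_no FROM ThreeFieldsDuplicates "
        "ORDER BY group_no, customer_no"):
    members.setdefault(group_no, []).append(customer_no)
print(sorted(members.values()))  # [[3, 4], [5, 6]]
```

Customers 1 and 2 also share field1..field3 with customers 3 and 4, but the final `WHERE` keeps them out because they are already in list 1.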

Hopefully this produces the right results. I'll leave the last one (two equal fields) as an exercise. :-D
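Because each level only differs in which columns must match, the UNION ALL branches can be generated rather than written by hand, which also covers the two-field exercise. The following is one possible sketch, not the answerer's solution. It assumes SQLite (via Python's `sqlite3`) in place of SQL Server and NOT NULL fields. The names `fill_level`, `members`, `GroupKeys` and `TwoFieldsDuplicates` are all hypothetical. It builds one branch per column combination with `itertools.combinations` and excludes customers already stored at a higher level.

```python
import itertools
import sqlite3

FIELDS = ["field1", "field2", "field3", "field4"]
# Result table per level (k = number of matching fields); names are hypothetical.
TABLES = {4: "FourFieldsDuplicates", 3: "ThreeFieldsDuplicates", 2: "TwoFieldsDuplicates"}


def fill_level(conn, k):
    """Group customers that agree on some k of the four fields, skipping
    customers already stored at a higher level (k+1 .. 4)."""
    higher = [TABLES[j] for j in range(k + 1, 5)]
    not_assigned = " AND ".join(
        f"NOT EXISTS (SELECT 1 FROM {t} d WHERE d.customer_no = c.customer_number)"
        for t in higher) or "1 = 1"
    branches = []
    for cols in itertools.combinations(FIELDS, k):
        # Columns outside this combination become NULL, meaning "may differ".
        select_list = ", ".join(f if f in cols else "NULL" for f in FIELDS)
        branches.append(
            f"SELECT {select_list} FROM Customers c WHERE {not_assigned} "
            f"GROUP BY {', '.join(cols)} HAVING COUNT(*) > 1")
    conn.execute("DELETE FROM GroupKeys")  # group numbers restart at 1 per level
    conn.execute("INSERT INTO GroupKeys (field1, field2, field3, field4) "
                 + " UNION ALL ".join(branches))
    match = " AND ".join(f"(g.{f} IS NULL OR c.{f} = g.{f})" for f in FIELDS)
    conn.execute(
        f"INSERT INTO {TABLES[k]} (group_no, customer_no) "
        f"SELECT g.group_no, c.customer_number FROM GroupKeys g "
        f"JOIN Customers c ON {match} WHERE {not_assigned}")


def members(conn, table):
    """Return each group's customer numbers as sorted lists, in a stable order."""
    groups = {}
    for group_no, customer_no in conn.execute(f"SELECT group_no, customer_no FROM {table}"):
        groups.setdefault(group_no, set()).add(customer_no)
    return sorted(sorted(s) for s in groups.values())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (customer_number INTEGER PRIMARY KEY, "
             "field1 TEXT NOT NULL, field2 TEXT NOT NULL, "
             "field3 TEXT NOT NULL, field4 TEXT NOT NULL)")
conn.execute("CREATE TABLE GroupKeys (group_no INTEGER PRIMARY KEY, "
             "field1 TEXT, field2 TEXT, field3 TEXT, field4 TEXT)")
for table in TABLES.values():
    conn.execute(f"CREATE TABLE {table} (group_no INTEGER, customer_no INTEGER)")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?, ?)", [
    (1, "a", "b", "c", "d"), (2, "a", "b", "c", "d"),  # all four fields equal
    (3, "a", "b", "c", "x"), (4, "a", "b", "c", "y"),  # field1..field3 equal
    (5, "p", "q", "r", "s"), (6, "p", "q", "w", "s"),  # field1, field2, field4 equal
    (7, "m", "n", "o", "p"), (8, "m", "n", "u", "v"),  # only field1, field2 equal
    (9, "z", "z", "z", "z"),                           # no match
])
for k in (4, 3, 2):  # higher levels first, so lower levels can exclude them
    fill_level(conn, k)
result = {k: members(conn, TABLES[k]) for k in (4, 3, 2)}
print(result)  # {4: [[1, 2]], 3: [[3, 4], [5, 6]], 2: [[7, 8]]}
```

Note that a customer can still land in more than one group at the same level (for example, sharing field1 and field2 with one customer and field3 and field4 with another); how to treat such overlaps is left to the requirements.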