hgu*_*yan 9 sql-server duplicates sql-server-2008
I am using SQL Server 2008. I have a table
Customers
customer_number int
field1 varchar
field2 varchar
field3 varchar
field4 varchar
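...and more columns that don't matter for my query.
The column customer_number is the primary key. I'm trying to find duplicate values and some of the differences between them.
Please help me find all rows that have the same
1) field1, field2, field3, field4
2) only 3 columns equal and one of them not (excluding rows from list 1)
3) only 2 columns equal and two of them not (excluding rows from lists 1 and 2)
In the end I'll have 3 tables with these results, plus an additional groupId that will be the same for each group of similar rows (for example, in the 3-columns-equal case, rows that share the same 3 equal columns will form a separate group).
Thanks.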
Bal*_*dar 56
Here's a handy query for finding duplicates in a table. Suppose you want to find all email addresses that appear more than once in a table:
SELECT email, COUNT(email) AS NumOccurrences
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )
You can also use this technique to find rows that occur exactly once:
SELECT email
FROM users
GROUP BY email
HAVING ( COUNT(email) = 1 )
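The same GROUP BY idea carries over to the multi-column case in the question. As a rough sketch (untested, and assuming the Customers table and column names from the question), a windowed COUNT(*) lists the duplicated rows themselves rather than just the duplicated values:

```sql
-- Sketch: list every Customers row whose four fields also occur on another row.
-- Table and column names are taken from the question.
SELECT customer_number, field1, field2, field3, field4
FROM (SELECT c.customer_number, c.field1, c.field2, c.field3, c.field4,
             -- Number of rows sharing this exact combination of the four fields.
             COUNT(*) OVER (PARTITION BY c.field1, c.field2, c.field3, c.field4) AS occurrences
      FROM Customers c) AS counted
WHERE occurrences > 1
ORDER BY field1, field2, field3, field4, customer_number;
```

Unlike the plain GROUP BY form, this keeps customer_number in the output, so each duplicate can be traced back to its row.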
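The easiest approach is probably to write a stored procedure that iterates over each group of customers with duplicates and inserts the matching customers, one group number at a time.
However, I've thought about it and you can probably do this with a subquery. Hopefully I haven't made it more complicated than it needs to be, but this should get you what you need for the first table of duplicates (all four fields). Note that this is untested, so it may need a little tweaking.
Basically, it gets each set of fields that has duplicates along with a group number for each set, then gets all customers with those fields and assigns them the same group number.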
INSERT INTO FourFieldsDuplicates(group_no, customer_no)
SELECT Groups.group_no, custs.customer_number  -- the key column in Customers is customer_number
FROM (-- One row, with its own group number, per field combination that occurs more than once.
      SELECT ROW_NUMBER() OVER(ORDER BY c.field1) AS group_no,
             c.field1, c.field2, c.field3, c.field4
      FROM Customers c
      GROUP BY c.field1, c.field2, c.field3, c.field4
      HAVING COUNT(*) > 1) Groups
INNER JOIN Customers custs ON custs.field1 = Groups.field1
                          AND custs.field2 = Groups.field2
                          AND custs.field3 = Groups.field3
                          AND custs.field4 = Groups.field4
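The INSERT above assumes the result table already exists. A minimal sketch of what the result tables might look like follows; the column types are assumptions, since the answer only names the columns:

```sql
-- Hypothetical result tables; only the column names come from the answer,
-- the types are assumptions.
CREATE TABLE FourFieldsDuplicates (
    group_no    bigint NOT NULL,  -- ROW_NUMBER() returns bigint
    customer_no int    NOT NULL   -- holds Customers.customer_number
);

CREATE TABLE ThreeFieldsDuplicates (
    group_no    bigint NOT NULL,
    customer_no int    NOT NULL
);

-- Quick check after the insert: how many customers landed in each group.
SELECT group_no, COUNT(*) AS customers_in_group
FROM FourFieldsDuplicates
GROUP BY group_no
ORDER BY group_no;
```

Every group reported by the check query should contain at least two customers, since only field combinations with COUNT(*) > 1 receive a group number.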
The others are a bit more complicated, because you need to expand the possibilities. The three-field groups would then be:
INSERT INTO ThreeFieldsDuplicates(group_no, customer_no)
SELECT Groups.group_no, custs.customer_number
FROM (SELECT ROW_NUMBER() OVER(ORDER BY GroupsInner.field1) AS group_no,
             GroupsInner.field1, GroupsInner.field2,
             GroupsInner.field3, GroupsInner.field4
      FROM (-- One row per remaining customer for each choice of three columns;
            -- NULL marks the column that is ignored for that choice.
            SELECT c.field1, c.field2, c.field3, NULL AS field4
            FROM Customers c
            WHERE NOT EXISTS(SELECT d.customer_no
                             FROM FourFieldsDuplicates d
                             WHERE d.customer_no = c.customer_number)
            UNION ALL
            SELECT c.field1, c.field2, NULL AS field3, c.field4
            FROM Customers c
            WHERE NOT EXISTS(SELECT d.customer_no
                             FROM FourFieldsDuplicates d
                             WHERE d.customer_no = c.customer_number)
            UNION ALL
            SELECT c.field1, NULL AS field2, c.field3, c.field4
            FROM Customers c
            WHERE NOT EXISTS(SELECT d.customer_no
                             FROM FourFieldsDuplicates d
                             WHERE d.customer_no = c.customer_number)
            UNION ALL
            SELECT NULL AS field1, c.field2, c.field3, c.field4
            FROM Customers c
            WHERE NOT EXISTS(SELECT d.customer_no
                             FROM FourFieldsDuplicates d
                             WHERE d.customer_no = c.customer_number)) GroupsInner
      -- Grouping the per-customer rows here makes COUNT(*) the number of customers per combination.
      GROUP BY GroupsInner.field1, GroupsInner.field2,
               GroupsInner.field3, GroupsInner.field4
      HAVING COUNT(*) > 1) Groups
INNER JOIN Customers custs ON (Groups.field1 IS NULL OR custs.field1 = Groups.field1)
                          AND (Groups.field2 IS NULL OR custs.field2 = Groups.field2)
                          AND (Groups.field3 IS NULL OR custs.field3 = Groups.field3)
                          AND (Groups.field4 IS NULL OR custs.field4 = Groups.field4)
-- Keep customers already listed as four-field duplicates out of the result.
WHERE NOT EXISTS(SELECT d.customer_no
                 FROM FourFieldsDuplicates d
                 WHERE d.customer_no = custs.customer_number)
Hopefully this produces the right results. I'll leave the last one as an exercise. :-D
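For anyone attempting that exercise, here is one possible sketch of the two-field case. It is untested and is not the answerer's query. It assumes a hypothetical TwoFieldsDuplicates table shaped like the other two result tables, assumes the four fields never contain NULL in the data (NULL is used as the "ignored column" marker), and swaps the UNION ALL branches for CROSS APPLY over a VALUES list, which SQL Server 2008 supports:

```sql
-- Sketch only. TwoFieldsDuplicates(group_no, customer_no) is a hypothetical table.
-- Assumes field1..field4 are never NULL in Customers.
WITH Remaining AS (
    -- Customers not already placed in either earlier result table.
    SELECT c.customer_number, c.field1, c.field2, c.field3, c.field4
    FROM Customers c
    WHERE NOT EXISTS (SELECT 1 FROM FourFieldsDuplicates d
                      WHERE d.customer_no = c.customer_number)
      AND NOT EXISTS (SELECT 1 FROM ThreeFieldsDuplicates d
                      WHERE d.customer_no = c.customer_number)
),
Patterns AS (
    -- Six rows per customer, one per pair of compared columns;
    -- NULL marks a column that is ignored for that pair.
    SELECT r.customer_number, p.field1, p.field2, p.field3, p.field4
    FROM Remaining r
    CROSS APPLY (VALUES
        (r.field1, r.field2, NULL,     NULL),
        (r.field1, NULL,     r.field3, NULL),
        (r.field1, NULL,     NULL,     r.field4),
        (NULL,     r.field2, r.field3, NULL),
        (NULL,     r.field2, NULL,     r.field4),
        (NULL,     NULL,     r.field3, r.field4)
    ) AS p(field1, field2, field3, field4)
),
Numbered AS (
    SELECT customer_number,
           -- Same number for every row that shares a pattern.
           DENSE_RANK() OVER (ORDER BY field1, field2, field3, field4) AS group_no,
           -- How many remaining customers share that pattern.
           COUNT(*) OVER (PARTITION BY field1, field2, field3, field4) AS members
    FROM Patterns
)
INSERT INTO TwoFieldsDuplicates(group_no, customer_no)
SELECT group_no, customer_number
FROM Numbered
WHERE members > 1;
```

Because single-member patterns are filtered out after numbering, the group numbers will have gaps, but each one still identifies exactly one group. A customer can also appear in more than one group if it matches different customers on different column pairs.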