xti*_*ine 217 sql sql-server duplicates
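I have a SQL Server database of organizations, and it contains many duplicate rows. I want to run a select statement that grabs all of these along with the number of dupes, but also returns the ids associated with each organization.
A statement like: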
SELECT orgName, COUNT(*) AS dupes
FROM organizations
GROUP BY orgName
HAVING (COUNT(*) > 1)
will return something like:
orgName | dupes
ABC Corp | 7
Foo Federation | 5
Widget Company | 2
But I'd also like to grab their ids. Is there any way to do this? Maybe something like:
orgName | dupeCount | id
ABC Corp | 1 | 34
ABC Corp | 2 | 5
...
Widget Company | 1 | 10
Widget Company | 2 | 2
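The reason is that there is also a separate table of users that link to these organizations, and I would like to unify them (that is, remove the dupes so the users link to the same organization instead of to dupe orgs). I would like to reassign them manually so I don't screw anything up, but I still need a statement that returns the ids of all the dupe orgs so I can go through the list of users.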
Red*_*ter 308
select o.orgName, oc.dupeCount, o.id
from organizations o
inner join (
SELECT orgName, COUNT(*) AS dupeCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName
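If the goal is the exact shape sketched in the question (a running number per organization next to each id), window functions can produce it in one pass. This is only a minimal sketch, assuming SQL Server 2005 or later and that `id` is unique; the alias `total` is illustrative and does not come from the original answer.

```sql
-- Number the rows inside each orgName group and keep only groups with more than one row.
SELECT orgName, dupeCount, id
FROM (
    SELECT
        orgName,
        id,
        ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS dupeCount, -- 1, 2, 3, ... per orgName
        COUNT(*)     OVER (PARTITION BY orgName)             AS total      -- size of the group
    FROM organizations
) AS t
WHERE total > 1
ORDER BY orgName, dupeCount;
```

Unlike the join above, this reads the table once, and `dupeCount` is a position within the group rather than the group size.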
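Anonymous 88
You can run the following query to find, for each duplicated name, the row with the highest id (max(id)), and then delete those rows.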
SELECT orgName, COUNT(*) AS dupes, MAX(ID) AS maxId
FROM organizations
GROUP BY orgName
HAVING (COUNT(*) > 1)
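But you will have to run this query several times: each pass returns only one id per duplicated name, so a name that appears n times needs n - 1 passes before it is unique.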
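The "run it several times" step could be automated with a loop along these lines. This is a rough sketch rather than the answerer's own code; it assumes `id` is unique and that nothing still references the rows being deleted (in the question's situation the users would have to be re-pointed first).

```sql
-- Delete the highest id of every duplicated orgName, and repeat until nothing is left to delete.
WHILE 1 = 1
BEGIN
    DELETE o
    FROM organizations AS o
    INNER JOIN (
        SELECT orgName, MAX(id) AS maxId
        FROM organizations
        GROUP BY orgName
        HAVING COUNT(*) > 1
    ) AS d ON o.id = d.maxId;

    -- No rows deleted means no orgName is duplicated any more.
    IF @@ROWCOUNT = 0 BREAK;
END
```

Each iteration removes one row per duplicated name, so the loop runs as many times as the largest group has extra copies.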
Pau*_*aul 31
You can do it like this:
SELECT
o.id, o.orgName, d.intCount
FROM (
SELECT orgName, COUNT(*) as intCount
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) AS d
INNER JOIN organizations o ON o.orgName = d.orgName
If you want to return only the records that can be deleted (leaving one of each), you can use:
SELECT
id, orgName
FROM (
SELECT
orgName, id,
ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS intRow
FROM organizations
) AS d
WHERE intRow != 1
Edit: SQL Server 2000 doesn't have the ROW_NUMBER() function. Instead, you can use:
SELECT
o.id, o.orgName, d.intCount
FROM (
SELECT orgName, COUNT(*) as intCount, MIN(id) AS minId
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1
) AS d
INNER JOIN organizations o ON o.orgName = d.orgName
WHERE d.minId != o.id
The solution marked as correct didn't work for me, but I found this answer very helpful: Get list of duplicate rows in MySql
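Once the "keep the lowest id" rule from the query above looks right, the same derived table could be used to re-point the users before anything is deleted. The question does not give the user table's layout, so `users` and `users.orgId` here are hypothetical names; treat this as a sketch to adapt, not a drop-in statement.

```sql
-- Hypothetical schema: users.orgId references organizations.id.
-- Point every user at the lowest id that shares the same orgName.
UPDATE u
SET u.orgId = k.keepId
FROM users AS u
INNER JOIN organizations AS o ON o.id = u.orgId
INNER JOIN (
    SELECT orgName, MIN(id) AS keepId
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) AS k ON k.orgName = o.orgName
WHERE u.orgId <> k.keepId;
```

Replacing the `UPDATE ... SET` lines with `SELECT u.*, k.keepId` gives a preview of the affected users, which fits the asker's wish to review the changes by hand.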
SELECT n1.*
FROM myTable n1
INNER JOIN myTable n2
ON n2.repeatedCol = n1.repeatedCol
WHERE n1.id <> n2.id
You can try this; it should work well for you:
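One thing to watch with the self-join: a value that appears three or more times returns each of its rows more than once, because every row matches several partners. A possible variant, keeping the placeholder names `myTable` and `repeatedCol` from the answer, uses EXISTS so each duplicated row comes back exactly once:

```sql
-- Return every row whose repeatedCol value also occurs on some other row.
SELECT n1.*
FROM myTable AS n1
WHERE EXISTS (
    SELECT 1
    FROM myTable AS n2
    WHERE n2.repeatedCol = n1.repeatedCol
      AND n2.id <> n1.id
);
```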
WITH CTE AS
(
SELECT *,RN=ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) FROM organizations
)
select * from CTE where RN>1
go
select * from [Employees]

Finding duplicate records: 1) Using a CTE
with mycte
as
(
select Name,EmailId,ROW_NUMBER() over(partition by Name,EmailId order by id) as Duplicate from [Employees]
)
select * from mycte where Duplicate > 1

2) Using GROUP BY
select Name,EmailId,COUNT(*) as Duplicate from [Employees] group by Name,EmailId having COUNT(*) > 1
If you want to delete the duplicates:
WITH CTE AS(
SELECT orgName,id,
RN = ROW_NUMBER()OVER(PARTITION BY orgName ORDER BY Id)
FROM organizations
)
DELETE FROM CTE WHERE RN > 1
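Because a delete like this is hard to undo, it may be worth wrapping it in a transaction and checking the result before committing. A cautious sketch, reusing the CTE above and assuming no foreign keys block the delete:

```sql
BEGIN TRANSACTION;

-- Same delete as above: keep the lowest id per orgName, remove the rest.
WITH CTE AS (
    SELECT orgName, id,
           RN = ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id)
    FROM organizations
)
DELETE FROM CTE WHERE RN > 1;

-- Should return no rows if every orgName is now unique.
SELECT orgName, COUNT(*) AS dupes
FROM organizations
GROUP BY orgName
HAVING COUNT(*) > 1;

-- Swap for COMMIT TRANSACTION once the result looks right.
ROLLBACK TRANSACTION;
```

The semicolon after `BEGIN TRANSACTION` matters: a `WITH` that starts a CTE must follow a terminated statement.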