Ale*_*lex 1789 sql duplicates
email
一个字段很容易找到:
SELECT name, COUNT(email)
FROM users
GROUP BY email
HAVING COUNT(email) > 1
Run Code Online (Sandbox Code Playgroud)
所以,如果我们有一张桌子
ID NAME EMAIL
1 John asd@asd.com
2 Sam asd@asd.com
3 Tom asd@asd.com
4 Bob bob@asd.com
5 Tom asd@asd.com
Run Code Online (Sandbox Code Playgroud)
这个查询将给我们John,Sam,Tom,Tom,因为他们都有相同的email
.
但是,我想要的是使用相同的name
和重复name
.
也就是说,我想得到"汤姆","汤姆".
我需要这个的原因:我犯了一个错误,并允许插入重复email
和email
值.现在我需要删除/更改重复项,所以我需要先找到它们.
gbn*_*gbn 2825
SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1
Run Code Online (Sandbox Code Playgroud)
只需在两个列上分组.
注意:旧的ANSI标准是在GROUP BY中包含所有非聚合列,但这已经改变了"功能依赖"的概念:
在关系数据库理论中,函数依赖性是来自数据库的关系中的两组属性之间的约束.换句话说,函数依赖是描述关系中属性之间关系的约束.
支持不一致:
sql_mode=only_full_group_by
:
KM.*_*KM. 350
试试这个:
declare @YourTable table (id int, name varchar(10), email varchar(50))
INSERT @YourTable VALUES (1,'John','John-email')
INSERT @YourTable VALUES (2,'John','John-email')
INSERT @YourTable VALUES (3,'fred','John-email')
INSERT @YourTable VALUES (4,'fred','fred-email')
INSERT @YourTable VALUES (5,'sam','sam-email')
INSERT @YourTable VALUES (6,'sam','sam-email')
SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
Run Code Online (Sandbox Code Playgroud)
OUTPUT:
name email CountOf
---------- ----------- -----------
John John-email 2
sam sam-email 2
(2 row(s) affected)
Run Code Online (Sandbox Code Playgroud)
如果你想要复制的ID使用这个:
SELECT
y.id,y.name,y.email
FROM @YourTable y
INNER JOIN (SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
) dt ON y.name=dt.name AND y.email=dt.email
Run Code Online (Sandbox Code Playgroud)
OUTPUT:
id name email
----------- ---------- ------------
1 John John-email
2 John John-email
5 sam sam-email
6 sam sam-email
(4 row(s) affected)
Run Code Online (Sandbox Code Playgroud)
要删除重复项,请尝试:
DELETE d
FROM @YourTable d
INNER JOIN (SELECT
y.id,y.name,y.email,ROW_NUMBER() OVER(PARTITION BY y.name,y.email ORDER BY y.name,y.email,y.id) AS RowRank
FROM @YourTable y
INNER JOIN (SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
) dt ON y.name=dt.name AND y.email=dt.email
) dt2 ON d.id=dt2.id
WHERE dt2.RowRank!=1
SELECT * FROM @YourTable
Run Code Online (Sandbox Code Playgroud)
OUTPUT:
id name email
----------- ---------- --------------
1 John John-email
3 fred John-email
4 fred fred-email
5 sam sam-email
(4 row(s) affected)
Run Code Online (Sandbox Code Playgroud)
Chr*_*tal 112
试试这个:
SELECT name, email
FROM users
GROUP BY name, email
HAVING ( COUNT(*) > 1 )
Run Code Online (Sandbox Code Playgroud)
Anc*_*inu 63
如果你想删除重复项,这里有一个更简单的方法,而不是在三重子选择中找到偶数/奇数行:
SELECT id, name, email
FROM users u, users u2
WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
Run Code Online (Sandbox Code Playgroud)
所以要删除:
DELETE FROM users
WHERE id IN (
SELECT id/*, name, email*/
FROM users u, users u2
WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
)
Run Code Online (Sandbox Code Playgroud)
更容易阅读和理解恕我直言
注意:唯一的问题是您必须执行请求,直到没有删除任何行,因为每次只删除每个副本中的一个
小智 42
请尝试以下方法:
SELECT * FROM
(
SELECT Id, Name, Age, Comments, Row_Number() OVER(PARTITION BY Name, Age ORDER By Name)
AS Rank
FROM Customers
) AS B WHERE Rank>1
Run Code Online (Sandbox Code Playgroud)
小智 28
SELECT name, email
FROM users
WHERE email in
(SELECT email FROM users
GROUP BY email
HAVING COUNT(*)>1)
Run Code Online (Sandbox Code Playgroud)
小智 23
派对有点晚了,但我找到了一个很酷的解决方法来找到所有重复的ID:
SELECT GROUP_CONCAT( id )
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )
Run Code Online (Sandbox Code Playgroud)
Tan*_*ete 20
试试这段代码
WITH CTE AS
( SELECT Id, Name, Age, Comments, RN = ROW_NUMBER()OVER(PARTITION BY Name,Age ORDER BY ccn)
FROM ccnmaster )
select * from CTE
Run Code Online (Sandbox Code Playgroud)
小智 19
选择最适合的解决方案。
Create table NewTable (id int, name varchar(10), email varchar(50))
INSERT NewTable VALUES (1,'John','asd@asd.com')
INSERT NewTable VALUES (2,'Sam','asd@asd.com')
INSERT NewTable VALUES (3,'Tom','asd@asd.com')
INSERT NewTable VALUES (4,'Bob','bob@asd.com')
INSERT NewTable VALUES (5,'Tom','asd@asd.com')
Run Code Online (Sandbox Code Playgroud)
1. 使用 GROUP BY 子句
SELECT
name, email, COUNT(*) AS Occurence
FROM NewTable
GROUP BY name, email
HAVING COUNT(*) > 1
Run Code Online (Sandbox Code Playgroud)
2. 使用 CTE:
要返回每个重复行的整行,请NewTable
使用公用表表达式 (CTE) 将上述查询的结果与表连接起来:
WITH cte AS (
SELECT
name, email, COUNT(*) occurrences
FROM NewTable
GROUP BY name, email
HAVING COUNT(*) > 1
)
SELECT
t1.Id, t1.name, t1.email
FROM NewTable t1
INNER JOIN cte ON
cte.name = t1.name AND
cte.email = t1.email
ORDER BY
t1.name,
t1.email;
Run Code Online (Sandbox Code Playgroud)
3.使用函数ROW_NUMBER()
WITH cte AS (
SELECT
name, email,
ROW_NUMBER() OVER (
PARTITION BY name,email
ORDER BY name,email) rownum
FROM NewTable t1
)
SELECT
*
FROM cte
WHERE rownum > 1;
Run Code Online (Sandbox Code Playgroud)
ROW_NUMBER()
NewTable
根据name
和列中的值将表的行分配到分区中email
。name
重复的行将在和列中具有重复的值email
,但行号不同Mar*_*ský 18
这将选择/删除除每组重复项中的一条记录之外的所有重复记录.因此,删除会留下所有唯一记录+来自每组重复项的一条记录.
选择重复:
SELECT *
FROM table
WHERE
id NOT IN (
SELECT MIN(id)
FROM table
GROUP BY column1, column2
);
Run Code Online (Sandbox Code Playgroud)
删除重复项:
DELETE FROM table
WHERE
id NOT IN (
SELECT MIN(id)
FROM table
GROUP BY column1, column2
);
Run Code Online (Sandbox Code Playgroud)
注意大量记录,可能会导致性能问题.
小智 16
如果您使用Oracle,这种方式更可取:
create table my_users(id number, name varchar2(100), email varchar2(100));
insert into my_users values (1, 'John', 'asd@asd.com');
insert into my_users values (2, 'Sam', 'asd@asd.com');
insert into my_users values (3, 'Tom', 'asd@asd.com');
insert into my_users values (4, 'Bob', 'bob@asd.com');
insert into my_users values (5, 'Tom', 'asd@asd.com');
commit;
select *
from my_users
where rowid not in (select min(rowid) from my_users group by name, email);
Run Code Online (Sandbox Code Playgroud)
小智 9
如果你想看看你的表中是否有任何重复的行,我使用下面的Query:
create table my_table(id int, name varchar(100), email varchar(100));
insert into my_table values (1, 'shekh', 'shekh@rms.com');
insert into my_table values (1, 'shekh', 'shekh@rms.com');
insert into my_table values (2, 'Aman', 'aman@rms.com');
insert into my_table values (3, 'Tom', 'tom@rms.com');
insert into my_table values (4, 'Raj', 'raj@rms.com');
Select COUNT(1) As Total_Rows from my_table
Select Count(1) As Distinct_Rows from ( Select Distinct * from my_table) abc
Run Code Online (Sandbox Code Playgroud)
这是我想出的容易的事情.它使用公用表表达式(CTE)和分区窗口(我认为这些功能在SQL 2008及更高版本中).
此示例查找名称和dob重复的所有学生.要检查重复的字段是否在OVER子句中.您可以在投影中包含所需的任何其他字段.
with cte (StudentId, Fname, LName, DOB, RowCnt)
as (
SELECT StudentId, FirstName, LastName, DateOfBirth as DOB, SUM(1) OVER (Partition By FirstName, LastName, DateOfBirth) as RowCnt
FROM tblStudent
)
SELECT * from CTE where RowCnt > 1
ORDER BY DOB, LName
Run Code Online (Sandbox Code Playgroud)
select name, email
, case
when ROW_NUMBER () over (partition by name, email order by name) > 1 then 'Yes'
else 'No'
end "duplicated ?"
from users
Run Code Online (Sandbox Code Playgroud)
select id,name,COUNT(*) from India group by Id,Name having COUNT(*)>1
Run Code Online (Sandbox Code Playgroud)
小智 8
我们如何计算重复的值?要么重复2次,要么重复2次.只计算它们,而不是按组计算.
就像
select COUNT(distinct col_01) from Table_01
Run Code Online (Sandbox Code Playgroud)
通过使用CTE,我们也可以找到这样的重复值
with MyCTE
as
(
select Name,EmailId,ROW_NUMBER() over(PARTITION BY EmailId order by id) as Duplicate from [Employees]
)
select * from MyCTE where Duplicate>1
Run Code Online (Sandbox Code Playgroud)
小智 7
select emp.ename, emp.empno, dept.loc
from emp
inner join dept
on dept.deptno=emp.deptno
inner join
(select ename, count(*) from
emp
group by ename, deptno
having count(*) > 1)
t on emp.ename=t.ename order by emp.ename
/
Run Code Online (Sandbox Code Playgroud)
小智 7
SELECT id, COUNT(id) FROM table1 GROUP BY id HAVING COUNT(id)>1;
我认为这可以正常搜索特定列中的重复值.
SELECT * FROM users u where rowid = (select max(rowid) from users u1 where
u.email=u1.email);
Run Code Online (Sandbox Code Playgroud)
小智 6
SELECT name, email,COUNT(email)
FROM users
WHERE email IN (
SELECT email
FROM users
GROUP BY email
HAVING COUNT(email) > 1)
Run Code Online (Sandbox Code Playgroud)
这里最重要的是拥有最快的功能。还应确定重复的索引。自联接是一个不错的选择,但要获得更快的功能,最好先查找具有重复项的行,然后与原始表联接以查找重复行的 id。最后按除 id 之外的任何列排序,使重复的行彼此靠近。
SELECT u.*
FROM users AS u
JOIN (SELECT username, email
FROM users
GROUP BY username, email
HAVING COUNT(*)>1) AS w
ON u.username=w.username AND u.email=w.email
ORDER BY u.email;
Run Code Online (Sandbox Code Playgroud)
如果要查找重复数据(通过一个或多个条件)并选择实际行。
with MYCTE as (
SELECT DuplicateKey1
,DuplicateKey2 --optional
,count(*) X
FROM MyTable
group by DuplicateKey1, DuplicateKey2
having count(*) > 1
)
SELECT E.*
FROM MyTable E
JOIN MYCTE cte
ON E.DuplicateKey1=cte.DuplicateKey1
AND E.DuplicateKey2=cte.DuplicateKey2
ORDER BY E.DuplicateKey1, E.DuplicateKey2, CreatedAt
Run Code Online (Sandbox Code Playgroud)
http://developer.azurewebsites.net/2014/09/better-sql-group-by-find-duplicate-data/
这也应该有用,也许试一试.
Select * from Users a
where EXISTS (Select * from Users b
where ( a.name = b.name
OR a.email = b.email)
and a.ID != b.id)
Run Code Online (Sandbox Code Playgroud)
特别适用于您的情况如果您搜索具有某种前缀或一般更改的重复项,例如邮件中的新域名.那么你可以在这些列上使用replace()