MySQL: delete duplicate records but keep the latest

Khu*_*ram 26 mysql duplicates

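I have a table with a unique `id` field and an `email` field. Emails get duplicated. I want to keep only one row for each duplicated email address, namely the one with the latest id (the last inserted record).

How can I do this?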

Jos*_*tos 86

Imagine your table `test` contains the following data:

  select id, email
    from test;

ID                     EMAIL                
---------------------- -------------------- 
1                      aaa                  
2                      bbb                  
3                      ccc                  
4                      bbb                  
5                      ddd                  
6                      eee                  
7                      aaa                  
8                      aaa                  
9                      eee 
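To follow along, the sample table can be recreated with a script like the one below. This is only a sketch: the column types, the `AUTO_INCREMENT` primary key, and the `VARCHAR(255)` length are assumptions, since the question only says that `id` is unique and `email` can repeat.

```sql
-- Assumed schema (not given in the question): id is the primary key,
-- email is a plain, non-unique text column.
CREATE TABLE test (
  id    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(255) NOT NULL
);

-- The nine sample rows shown above.
INSERT INTO test (id, email) VALUES
  (1, 'aaa'), (2, 'bbb'), (3, 'ccc'),
  (4, 'bbb'), (5, 'ddd'), (6, 'eee'),
  (7, 'aaa'), (8, 'aaa'), (9, 'eee');
```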

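So, we need to find all the duplicated emails and delete every row for them except the one with the latest id.
In this case, aaa, bbb and eee are repeated, so we want to delete IDs 1, 7, 2 and 6.

To do that, first we need to find all the duplicated emails: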

      select email 
        from test
       group by email
      having count(*) > 1;

EMAIL                
-------------------- 
aaa                  
bbb                  
eee  

Then, from this dataset, we need to find the latest id for each of these duplicated emails:

  select max(id) as lastId, email
    from test
   where email in (
              select email 
                from test
               group by email
              having count(*) > 1
       )
   group by email;

LASTID                 EMAIL                
---------------------- -------------------- 
8                      aaa                  
4                      bbb                  
9                      eee                                 

Finally, we can now delete all rows for these emails whose id is smaller than LASTID. So the solution is:

delete test
  from test
 inner join (
  select max(id) as lastId, email
    from test
   where email in (
              select email 
                from test
               group by email
              having count(*) > 1
       )
   group by email
) duplic on duplic.email = test.email
 where test.id < duplic.lastId;
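Before running a destructive statement like this, it may be worth previewing which rows the join would match. A minimal sketch, reusing the same join condition as a plain `SELECT` (the `ORDER BY` is only there to make the result easier to read):

```sql
-- Dry run: list the rows that the DELETE would remove.
SELECT test.id, test.email
  FROM test
 INNER JOIN (
        SELECT MAX(id) AS lastId, email
          FROM test
         GROUP BY email
       ) duplic ON duplic.email = test.email
 WHERE test.id < duplic.lastId
 ORDER BY test.id;
```

For the sample data this should list ids 1, 2, 6 and 7. Emails that appear only once never match, because their single id equals their own `lastId`.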

I don't have MySQL installed on this machine right now, but it should work.

Update

The delete above works, but I found a more optimized version:

 delete test
   from test
  inner join (
     select max(id) as lastId, email
       from test
      group by email
     having count(*) > 1) duplic on duplic.email = test.email
  where test.id < duplic.lastId;

You can see that it deletes the oldest duplicates, i.e. 1, 7, 2, 6:

select * from test;
+----+-------+
| id | email |
+----+-------+
|  3 | ccc   |
|  4 | bbb   |
|  5 | ddd   |
|  8 | aaa   |
|  9 | eee   |
+----+-------+
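As an extra sanity check after the delete, the duplicate-finding query from the first step can be run again. This is a sketch against the same `test` table; the `copies` alias is just an illustrative name.

```sql
-- Should return an empty set once every email appears at most once.
SELECT email, COUNT(*) AS copies
  FROM test
 GROUP BY email
HAVING COUNT(*) > 1;
```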

Another version is the delete provided by Rene Limon:

delete from test
 where id not in (
    select max(id)
      from test
     group by email)

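  • Maybe: `DELETE FROM test WHERE id NOT IN (SELECT MAX(id) FROM test GROUP BY email)` (6 upvotes)
  • @HamSam try using a nested subquery so that MySQL materializes it and is no longer using the "same table", for example: `delete from test where id not in (SELECT * FROM (select max(id) from test group by email) AS S)` (I added the uppercase part) (4 upvotes)
  • I get the error "Table 'test' is specified twice, both as a target for 'DELETE' and as a separate source for data". (2 upvotes)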
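The workaround suggested in the comments is easier to read when laid out over several lines. The sketch below is the same idea, with hypothetical names `keep_id` and `keepers` added for readability: wrapping the grouped subquery in a derived table lets MySQL build it as a temporary result first, so the `DELETE` target is no longer read directly in the subquery.

```sql
DELETE FROM test
 WHERE id NOT IN (
   -- Derived table: the highest id per email, built before the delete runs.
   SELECT keep_id
     FROM (
       SELECT MAX(id) AS keep_id
         FROM test
        GROUP BY email
     ) AS keepers   -- the alias is mandatory for derived tables
 );
```

Unlike the join-based versions, this form names the rows to keep rather than the rows to delete, so it assumes `id` is never NULL.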

Gau*_*pal 9

The correct way is

DELETE FROM `tablename`
  WHERE id NOT IN (
    SELECT * FROM (
      SELECT MAX(id) FROM `tablename`
        GROUP BY name
    ) AS dtab  -- MySQL requires an alias on every derived table
  )

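  • Without an alias on the derived table, the result is ERROR 1248 (42000): Every derived table must have its own alias. Adding an alias named DTAB works, as shown here: DELETE FROM `tablename` WHERE id NOT IN (SELECT * FROM (SELECT MAX(id) FROM tablename GROUP BY name) AS DTAB) (4 upvotes)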

Tan*_*ury 6

If you want to keep the row with the lowest id value:

DELETE n1 FROM `yourTableName` n1, `yourTableName` n2 WHERE n1.id > n2.id AND n1.email = n2.email

If you want to keep the row with the highest id value:

DELETE n1 FROM `yourTableName` n1, `yourTableName` n2 WHERE n1.id < n2.id AND n1.email = n2.email

Or this query may also help

DELETE FROM `yourTableName`
  WHERE id NOT IN (
    SELECT * FROM (
      SELECT MAX(id) FROM `yourTableName`
        GROUP BY email
    ) AS dtab  -- MySQL requires an alias on every derived table
  )


小智 5

Try this method

DELETE t1 FROM test t1, test t2 
WHERE t1.id > t2.id AND t1.email = t2.email

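  • Make sure your columns are indexed (the id and, mainly, the email in this example). Otherwise, if you have thousands or millions of rows, it will take minutes (or hours). (3 upvotes)
  • Not sure why this is buried so far down the page. Simple and effective. (2 upvotes)
  • Does this really keep the latest? The latest has the highest `id`, and it looks like this query is deleting any `id` that is greater than another `id`. See @TanvirChowdhury's answer /sf/answers/4440381291/ (2 upvotes)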
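The indexing advice from the comments could be applied roughly as follows. This is a sketch that assumes `id` is already the primary key (and therefore indexed) and that `email` has no index yet; the index name `idx_test_email` is made up for illustration.

```sql
-- Index the column used in the self-join condition t1.email = t2.email,
-- so MySQL can look up matching emails instead of scanning the whole table.
CREATE INDEX idx_test_email ON test (email);
```

On a large table, building the index takes time too, but it only has to be done once before the delete.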