Delete all duplicates

Gar*_*hl 8 mysql delete duplication

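I'm trying to delete all duplicates while keeping a single record for each email (the one with the lowest ID). The query below does delete duplicates, but it has to be run many times before all the copies are gone and only the original remains.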

DELETE FROM emailTable WHERE id IN (
 SELECT * FROM (
    SELECT id FROM emailTable GROUP BY email HAVING ( COUNT(email) > 1 )
 ) AS q
)

It's MySQL.

DDL:

CREATE TABLE `emailTable` (
 `id` mediumint(9) NOT NULL auto_increment,
 `email` varchar(200) NOT NULL default '',
 PRIMARY KEY  (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=298872 DEFAULT CHARSET=latin1

Der*_*ney 8

Try this:

DELETE FROM emailTable WHERE NOT EXISTS (
 SELECT * FROM (
    SELECT MIN(id) minID FROM emailTable    
    GROUP BY email HAVING COUNT(*) > 0
  ) AS q
  WHERE minID=id
)

The above worked for me in a test with 50 emails (5 distinct emails, each repeated 10 times).
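To check the result after running the delete, something like the following could be used. This is only a sketch against the `emailTable` definition from the question; the aliases `copies`, `total_rows`, and `distinct_emails` are names made up here for illustration.

```sql
-- Should return no rows once every email appears exactly once.
SELECT email, COUNT(*) AS copies
FROM emailTable
GROUP BY email
HAVING COUNT(*) > 1;

-- After a successful cleanup the two counts should be equal.
SELECT COUNT(*) AS total_rows,
       COUNT(DISTINCT email) AS distinct_emails
FROM emailTable;
```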

You may need to add an index on the email column:

ALTER TABLE emailTable ADD INDEX ind_email (email);
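To see whether the index exists and whether the grouping query is likely to benefit from it, a quick check along these lines may help. It is a sketch only: the plan MySQL actually picks depends on the server version and on the data, so treat the output as a hint rather than a guarantee.

```sql
-- List the indexes currently defined on the table (ind_email should appear).
SHOW INDEX FROM emailTable;

-- Show the plan for the grouping step used by the delete.
EXPLAIN SELECT MIN(id) FROM emailTable GROUP BY email;
```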

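With 250,000 rows this may be a bit slow. It was slow for me on a table with 1.5 million rows (properly indexed), which is how I came up with this strategy: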

/* CREATE MEMORY TABLE TO HOUSE THE MINIMUM ID OF EACH EMAIL */
CREATE TABLE email_min (minID INT, PRIMARY KEY(minID)) ENGINE=Memory;

/* INSERT THE MINIMUM IDs (ONE PER DISTINCT EMAIL) */
INSERT INTO email_min SELECT MIN(id) FROM emailTable
    GROUP BY email;

/* MAKE SURE YOU HAVE RIGHT INFO: THESE ARE THE ROWS THAT WILL BE DELETED */
SELECT * FROM emailTable
 WHERE NOT EXISTS (SELECT * FROM email_min WHERE minID=id);

/* DELETE FROM emailTable EVERY ROW WHOSE ID IS NOT A MINIMUM */
DELETE FROM emailTable
 WHERE NOT EXISTS (SELECT * FROM email_min WHERE minID=id);

/* IF ALL IS WELL, DROP MEMORY TABLE */
DROP TABLE email_min;

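The advantage of the memory table is that it has an index (the primary key on minID), which speeds up the process compared with a plain temporary table.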
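The same cleanup can also be written as an anti-join against the memory table, using MySQL's multi-table DELETE syntax. This is a variant, not part of the strategy above: it assumes the table is named `emailTable` as in the question, that `email_min` has already been filled with the minimum id per email, and that it runs before `email_min` is dropped. The aliases `e` and `m` are just illustrative.

```sql
-- Preview: rows whose id is not one of the stored minimum ids.
SELECT e.id, e.email
FROM emailTable AS e
LEFT JOIN email_min AS m ON m.minID = e.id
WHERE m.minID IS NULL;

-- Delete those rows; only the lowest id for each email is kept.
DELETE e
FROM emailTable AS e
LEFT JOIN email_min AS m ON m.minID = e.id
WHERE m.minID IS NULL;
```

Whether this is faster than the NOT EXISTS form depends on the MySQL version and the data, so it is worth timing both on a copy of the table first.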