Gar*_*hl 8 mysql delete duplication
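I am trying to delete all duplicates while keeping a single record (the one with the lower ID). The query below does delete duplicates, but it has to be run many times before every extra copy is gone and only the original remains.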
DELETE FROM emailTable WHERE id IN (
SELECT * FROM (
SELECT id FROM emailTable GROUP BY email HAVING ( COUNT(email) > 1 )
) AS q
)
It's MySQL.
DDL:
CREATE TABLE `emailTable` (
`id` mediumint(9) NOT NULL auto_increment,
`email` varchar(200) NOT NULL default '',
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=298872 DEFAULT CHARSET=latin1
Try this:
DELETE FROM emailTable WHERE NOT EXISTS (
SELECT * FROM (
SELECT MIN(id) minID FROM emailTable
GROUP BY email HAVING COUNT(*) > 0
) AS q
WHERE minID=id
)
The above worked in my test with 50 emails (5 distinct emails, each repeated 10 times).
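A quick way to confirm the result is to look for any email that still occurs more than once. This is only a minimal sketch against the `emailTable` schema from the question; the `copies` alias is introduced here for illustration.

```sql
-- Lists every email that still has more than one row.
-- An empty result means each email now appears exactly once.
SELECT email, COUNT(*) AS copies
FROM emailTable
GROUP BY email
HAVING COUNT(*) > 1;
```

Running the same query before the DELETE gives a rough idea of how many rows are about to be removed.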
You may need to add an index on the "email" column:
ALTER TABLE emailTable ADD INDEX ind_email (email);
With 250,000 rows it may be a bit slow. It was slow for me on a table with 1.5 million rows (properly indexed), which is how I came up with this strategy:
/* NOTE: THE TABLE HERE IS NAMED email; SUBSTITUTE YOUR OWN TABLE NAME (emailTable IN THE QUESTION) */
/* CREATE MEMORY TABLE TO HOUSE IDs of the MIN */
CREATE TABLE email_min (minID INT, PRIMARY KEY(minID)) ENGINE=Memory;
/* INSERT THE MINIMUM ID FOR EACH DISTINCT EMAIL */
INSERT INTO email_min SELECT MIN(id) FROM email
GROUP BY email;
/* MAKE SURE YOU HAVE RIGHT INFO: THESE ARE THE ROWS THAT WILL BE DELETED */
SELECT * FROM email
WHERE NOT EXISTS (SELECT * FROM email_min WHERE minID=id);
/* DELETE FROM EMAIL */
DELETE FROM email
WHERE NOT EXISTS (SELECT * FROM email_min WHERE minID=id);
/* IF ALL IS WELL, DROP MEMORY TABLE */
DROP TABLE email_min;
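The benefit of the memory table is that it has an index (the primary key on minID), which speeds the process up compared with an ordinary temporary table.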
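To see whether that primary key is actually being used, one option is to run EXPLAIN on the check query while `email_min` still exists (that is, before the DROP TABLE step). This is a sketch that assumes the table and column names from the script above. The exact plan depends on the MySQL version and the data, so no particular output is claimed here.

```sql
-- Shows the execution plan for the "rows to be deleted" check.
-- In the row for email_min, the key column indicates whether the
-- PRIMARY index on minID is used for the lookup.
EXPLAIN
SELECT * FROM email
WHERE NOT EXISTS (SELECT * FROM email_min WHERE minID = id);
```

If the plan shows no key for `email_min`, the lookup is probably scanning the memory table for every row, and the index definition is worth rechecking.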