Remove duplicates using only a MySQL query?

Jim*_*Jim 14 mysql sql

I have a table with the following columns:

URL_ID    
URL_ADDR    
URL_Time

I want to remove duplicates in the URL_ADDR column using a MySQL query.

Is it possible to do this without any programming?

Dan*_*llo 31

Consider the following test case:

CREATE TABLE mytb (url_id int, url_addr varchar(100));

INSERT INTO mytb VALUES (1, 'www.google.com');
INSERT INTO mytb VALUES (2, 'www.microsoft.com');
INSERT INTO mytb VALUES (3, 'www.apple.com');
INSERT INTO mytb VALUES (4, 'www.google.com');
INSERT INTO mytb VALUES (5, 'www.cnn.com');
INSERT INTO mytb VALUES (6, 'www.apple.com');

Our test table now contains:

SELECT * FROM mytb;
+--------+-------------------+
| url_id | url_addr          |
+--------+-------------------+
|      1 | www.google.com    |
|      2 | www.microsoft.com |
|      3 | www.apple.com     |
|      4 | www.google.com    |
|      5 | www.cnn.com       |
|      6 | www.apple.com     |
+--------+-------------------+
6 rows in set (0.00 sec)

Then we can use the multiple-table DELETE syntax as follows:

DELETE t2
FROM   mytb t1
JOIN   mytb t2 ON (t2.url_addr = t1.url_addr AND t2.url_id > t1.url_id);

...which will delete the duplicate entries, leaving only the row with the lowest url_id for each URL:

SELECT * FROM mytb;
+--------+-------------------+
| url_id | url_addr          |
+--------+-------------------+
|      1 | www.google.com    |
|      2 | www.microsoft.com |
|      3 | www.apple.com     |
|      5 | www.cnn.com       |
+--------+-------------------+
4 rows in set (0.00 sec)
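The multiple-table DELETE is MySQL syntax. The following minimal sketch lets you try the "keep the lowest url_id per url_addr" logic without a MySQL server. It uses Python's standard-library sqlite3 module with an in-memory SQLite database as a stand-in for MySQL. This is an assumption made only so the example is self-contained, and the function name `dedupe_keep_lowest_id` is hypothetical. SQLite has no multiple-table DELETE, so the self-join is rewritten as a correlated EXISTS subquery. Do not copy that form back into MySQL: MySQL rejects a subquery that reads from the table being deleted from (error 1093), which is one reason the join syntax is handy there.

```python
import sqlite3

# Same test data as in the answer; SQLite is only a stand-in for MySQL here.
ROWS = [
    (1, "www.google.com"),
    (2, "www.microsoft.com"),
    (3, "www.apple.com"),
    (4, "www.google.com"),
    (5, "www.cnn.com"),
    (6, "www.apple.com"),
]


def dedupe_keep_lowest_id(rows):
    """Delete every row that has a lower-id row with the same url_addr."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytb (url_id INTEGER, url_addr VARCHAR(100))")
    conn.executemany("INSERT INTO mytb VALUES (?, ?)", rows)
    # SQLite equivalent of the MySQL statement:
    #   DELETE t2 FROM mytb t1 JOIN mytb t2
    #     ON (t2.url_addr = t1.url_addr AND t2.url_id > t1.url_id);
    conn.execute(
        """
        DELETE FROM mytb
        WHERE EXISTS (
            SELECT 1 FROM mytb AS t1
            WHERE t1.url_addr = mytb.url_addr
              AND t1.url_id < mytb.url_id
        )
        """
    )
    remaining = conn.execute(
        "SELECT url_id, url_addr FROM mytb ORDER BY url_id"
    ).fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    for row in dedupe_keep_lowest_id(ROWS):
        print(row)
```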

UPDATE - following up on the new comments above:

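If the duplicate URLs can be written in different formats, you may want to apply the REPLACE() function to remove the www. or http:// parts. For example: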

DELETE t2
FROM   mytb t1
JOIN   mytb t2 ON (REPLACE(t2.url_addr, 'www.', '') = 
                   REPLACE(t1.url_addr, 'www.', '') AND 
                   t2.url_id > t1.url_id);
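The REPLACE() call above strips only `www.`, while the text also mentions `http://`. Nesting two calls handles both. The sketch below assumes the same hedged setup as a stand-in for MySQL: Python's sqlite3 module, where REPLACE(str, from, to) takes its arguments in the same order as MySQL's. The sample rows and the function name are hypothetical. REPLACE() removes the substring wherever it occurs, not only at the start of the address. That is usually harmless here, because the normalized form is used only for comparison and the stored values are not changed.

```python
import sqlite3

# Hypothetical sample rows: the same sites written in different formats.
ROWS = [
    (1, "www.google.com"),
    (2, "http://google.com"),
    (3, "http://www.apple.com"),
    (4, "apple.com"),
    (5, "www.cnn.com"),
]


def dedupe_normalized(rows):
    """Keep the lowest url_id per address after stripping http:// and www."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytb (url_id INTEGER, url_addr VARCHAR(100))")
    conn.executemany("INSERT INTO mytb VALUES (?, ?)", rows)
    # Inner REPLACE drops "http://", outer REPLACE drops "www.".
    # In MySQL the same nested expressions can go into the ON clause
    # of the DELETE t2 ... JOIN statement instead.
    conn.execute(
        """
        DELETE FROM mytb
        WHERE EXISTS (
            SELECT 1 FROM mytb AS t1
            WHERE REPLACE(REPLACE(t1.url_addr, 'http://', ''), 'www.', '')
                = REPLACE(REPLACE(mytb.url_addr, 'http://', ''), 'www.', '')
              AND t1.url_id < mytb.url_id
        )
        """
    )
    remaining = conn.execute(
        "SELECT url_id, url_addr FROM mytb ORDER BY url_id"
    ).fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    for row in dedupe_normalized(ROWS):
        print(row)
```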


Box*_*Box 8

You may want to try the method described in http://labs.creativecommons.org/2010/01/12/removing-duplicate-rows-in-mysql/.

ALTER IGNORE TABLE your_table ADD UNIQUE INDEX `tmp_index` (URL_ADDR);
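The IGNORE clause of ALTER TABLE was deprecated in MySQL 5.6 and removed in MySQL 5.7, so this statement fails on newer servers. A common substitute works in three steps:

1. Create a copy of the table that already has the unique index.
2. Copy the rows into it with INSERT IGNORE.
3. Swap the tables.

The following sketch shows that idea using Python's sqlite3 module as a stand-in for MySQL. SQLite spells INSERT IGNORE as INSERT OR IGNORE. The table name `mytb_new` and the function name are hypothetical.

```python
import sqlite3

# Same test data as in the first answer.
ROWS = [
    (1, "www.google.com"),
    (2, "www.microsoft.com"),
    (3, "www.apple.com"),
    (4, "www.google.com"),
    (5, "www.cnn.com"),
    (6, "www.apple.com"),
]


def dedupe_via_unique_copy(rows):
    """Copy rows into a table with a UNIQUE url_addr, skipping duplicates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytb (url_id INTEGER, url_addr VARCHAR(100))")
    conn.executemany("INSERT INTO mytb VALUES (?, ?)", rows)
    # The UNIQUE constraint exists before any rows arrive.
    conn.execute(
        "CREATE TABLE mytb_new (url_id INTEGER, url_addr VARCHAR(100) UNIQUE)"
    )
    # Rows that would violate UNIQUE are silently skipped; ORDER BY url_id
    # makes the lowest url_id the one that is kept.
    # (MySQL spelling: INSERT IGNORE INTO mytb_new SELECT ...)
    conn.execute(
        "INSERT OR IGNORE INTO mytb_new "
        "SELECT url_id, url_addr FROM mytb ORDER BY url_id"
    )
    # Swap the deduplicated copy in place of the original table.
    conn.execute("DROP TABLE mytb")
    conn.execute("ALTER TABLE mytb_new RENAME TO mytb")
    remaining = conn.execute(
        "SELECT url_id, url_addr FROM mytb ORDER BY url_id"
    ).fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    for row in dedupe_via_unique_copy(ROWS):
        print(row)
```

On MySQL itself you could do the swap atomically with `RENAME TABLE mytb TO mytb_old, mytb_new TO mytb;` and then drop `mytb_old`.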


Mar*_*ith 5

This will keep, for each URL_ADDR, only the row with the highest URL_ID:

DELETE FROM your_table
WHERE URL_ID NOT IN 
    (SELECT ID FROM 
       (SELECT MAX(URL_ID) AS ID 
        FROM your_table 
        WHERE URL_ID IS NOT NULL
        GROUP BY URL_ADDR ) X)   /*Sounds like you would need to GROUP BY a
                                   calculated form - e.g. using REPLACE to
                                   strip out www; see Daniel's answer*/
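SQLite has no equivalent of MySQL's error 1093 restriction. You can therefore run the statement above unchanged in an in-memory SQLite database, with its placeholder table name replaced by `mytb`, which is a convenient way to check the result. This is a hedged sketch: Python's sqlite3 module stands in for MySQL, and the function name is hypothetical. Keep the `WHERE URL_ID IS NOT NULL` filter. If the subquery's list contained a NULL, `URL_ID NOT IN (...)` would never evaluate to true, and the DELETE would remove nothing.

```python
import sqlite3

# Same test data as in the first answer.
ROWS = [
    (1, "www.google.com"),
    (2, "www.microsoft.com"),
    (3, "www.apple.com"),
    (4, "www.google.com"),
    (5, "www.cnn.com"),
    (6, "www.apple.com"),
]


def dedupe_keep_highest_id(rows):
    """Delete every row whose url_id is not the MAX(url_id) of its url_addr."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytb (url_id INTEGER, url_addr VARCHAR(100))")
    conn.executemany("INSERT INTO mytb VALUES (?, ?)", rows)
    # Same statement as the answer; the derived table X is required in
    # MySQL to avoid error 1093 and is simply accepted by SQLite.
    conn.execute(
        """
        DELETE FROM mytb
        WHERE url_id NOT IN
            (SELECT ID FROM
                (SELECT MAX(url_id) AS ID
                 FROM mytb
                 WHERE url_id IS NOT NULL
                 GROUP BY url_addr) X)
        """
    )
    remaining = conn.execute(
        "SELECT url_id, url_addr FROM mytb ORDER BY url_id"
    ).fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    for row in dedupe_keep_highest_id(ROWS):
        print(row)
```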

(The derived table 'X' is there to avoid the error "You can't specify target table 'tablename' for update in FROM clause".)


Dou*_*oug 2

You can group by URL_ADDR, which effectively gives you only distinct values in the URL_ADDR field.

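select 
 URL_ID,
 URL_ADDR,
 URL_Time
from
 some_table
group by
 URL_ADDR
-- With ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5), the
-- non-aggregated URL_ID and URL_Time columns are rejected unless they are
-- functionally dependent on URL_ADDR; without it, MySQL may return any
-- URL_ID/URL_Time value from each group.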
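The query above only reads from the table, so nothing is deleted. Without aggregates, the URL_ID and URL_Time values it returns for a group are also not guaranteed to come from the same row. One hedged way to get whole rows deterministically works in two steps. First, group on URL_ADDR and take MIN(URL_ID). Then join back to the table to fetch the matching rows. This form is also accepted under ONLY_FULL_GROUP_BY. The sketch runs it with Python's sqlite3 as a stand-in for MySQL. The rows, including the URL_Time values, and the function name are hypothetical.

```python
import sqlite3

# Hypothetical rows mirroring the URL_ID / URL_ADDR / URL_Time columns.
ROWS = [
    (1, "www.google.com", "2010-01-01 10:00"),
    (2, "www.microsoft.com", "2010-01-01 11:00"),
    (3, "www.apple.com", "2010-01-02 09:00"),
    (4, "www.google.com", "2010-01-03 08:00"),
]


def first_row_per_addr(rows):
    """Return the full lowest-id row per url_addr, plus the table's row count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE some_table (url_id INTEGER, url_addr TEXT, url_time TEXT)"
    )
    conn.executemany("INSERT INTO some_table VALUES (?, ?, ?)", rows)
    # Group only to find each address's first id, then join back so every
    # returned column comes from that same row.
    result = conn.execute(
        """
        SELECT s.url_id, s.url_addr, s.url_time
        FROM some_table AS s
        JOIN (SELECT url_addr, MIN(url_id) AS first_id
              FROM some_table
              GROUP BY url_addr) AS g
          ON s.url_addr = g.url_addr AND s.url_id = g.first_id
        ORDER BY s.url_id
        """
    ).fetchall()
    # A SELECT leaves the table untouched: all original rows are still there.
    total = conn.execute("SELECT COUNT(*) FROM some_table").fetchone()[0]
    conn.close()
    return result, total


if __name__ == "__main__":
    rows, total = first_row_per_addr(ROWS)
    print(rows, total)
```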

Enjoy!