I have to clean up a table that contains duplicate rows:
id: serial id
gid: group id
url: string <- this is the column that I have to clean up
One gid may have multiple url values:
id gid url
---- ---- ------------
1 12 www.gmail.com
2 12 www.some.com
3 12 www.some.com <-- duplicate
4 13 www.other.com
5 13 www.milfsome.com <-- not a duplicate
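I want to run a single query over the whole table that deletes every row whose gid and url are both duplicated. In the example above, only rows 1, 2, 4 and 5 should remain after the delete.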
Aar*_*and 13
;WITH x AS
(
    SELECT id, gid, url, rn = ROW_NUMBER() OVER
        (PARTITION BY gid, url ORDER BY id)
    FROM dbo.[table] -- placeholder name; TABLE is a reserved word, so it needs brackets
)
SELECT id, gid, url FROM x WHERE rn = 1; -- the rows you'll keep
-- SELECT id, gid, url FROM x WHERE rn > 1; -- the rows you'll delete
-- DELETE x WHERE rn > 1; -- do the delete
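Once you're happy with the first SELECT, which shows the rows you'll keep, remove it and uncomment the second SELECT. Once you're happy with that one, which shows the rows you'll delete, remove it and uncomment the DELETE.
And if you don't want to delete data, just ignore the commented lines under the SELECT...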
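To check the idea against the sample data from the question, here is a minimal sketch that runs the same ROW_NUMBER() numbering in an in-memory SQLite database through Python's standard sqlite3 module. This is a swap-in for illustration, not the T-SQL above: the table name t and the function dedupe_demo are made up for this example, SQLite cannot delete through a CTE so the delete goes through an id subquery instead, and window functions need SQLite 3.25 or newer.

```python
import sqlite3


def dedupe_demo():
    """Rebuild the question's sample rows in memory and remove duplicates.

    Returns (ids that were deleted, ids that remain).
    """
    conn = sqlite3.connect(":memory:")
    # "t" is a hypothetical table name standing in for the real table.
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, gid INTEGER, url TEXT)")
    conn.executemany(
        "INSERT INTO t (id, gid, url) VALUES (?, ?, ?)",
        [
            (1, 12, "www.gmail.com"),
            (2, 12, "www.some.com"),
            (3, 12, "www.some.com"),  # duplicate of id 2
            (4, 13, "www.other.com"),
            (5, 13, "www.milfsome.com"),  # not a duplicate
        ],
    )

    # Number the rows inside each (gid, url) group; the lowest id gets rn = 1.
    numbered = """
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY gid, url ORDER BY id) AS rn
        FROM t
    """

    # Preview step: list the ids that would be removed (rn > 1).
    to_delete = [
        row[0]
        for row in conn.execute(
            f"SELECT id FROM ({numbered}) WHERE rn > 1 ORDER BY id"
        )
    ]

    # Delete step: SQLite has no "DELETE cte", so filter by id instead.
    conn.execute(
        f"DELETE FROM t WHERE id IN (SELECT id FROM ({numbered}) WHERE rn > 1)"
    )
    conn.commit()

    kept = [row[0] for row in conn.execute("SELECT id FROM t ORDER BY id")]
    conn.close()
    return to_delete, kept


if __name__ == "__main__":
    deleted_ids, kept_ids = dedupe_demo()
    print("deleted ids:", deleted_ids)
    print("kept ids:", kept_ids)
```

With the five sample rows, only id 3 is removed and ids 1, 2, 4 and 5 remain, which matches the result asked for in the question. Previewing the rn > 1 rows before deleting mirrors the "check the SELECT first" workflow described in the answer.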