Select only the first row of repeated values in a column in SQL

RYN*_*RYN 10 sql sqlite ms-access duplicates

My table has a column that can hold the same value in a burst of consecutive rows, like this:

+----+---------+
| id |   Col1  | 
+----+---------+
| 1  | 6050000 |
+----+---------+
| 2  | 6050000 |
+----+---------+
| 3  | 6050000 |
+----+---------+
| 4  | 6060000 |
+----+---------+
| 5  | 6060000 |
+----+---------+
| 6  | 6060000 |
+----+---------+
| 7  | 6060000 |
+----+---------+
| 8  | 6060000 |
+----+---------+
| 9  | 6050000 |
+----+---------+
| 10 | 6000000 |
+----+---------+
| 11 | 6000000 |
+----+---------+

Now I want to trim the rows where Col1 repeats and select only the first row of each run.
For the table above, the result should be:

+----+---------+
| id |   Col1  | 
+----+---------+
| 1  | 6050000 |
+----+---------+
| 4  | 6060000 |
+----+---------+
| 9  | 6050000 |
+----+---------+
| 10 | 6000000 |
+----+---------+

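How can I do this in SQL?
Note that only the repeated rows within a burst should be removed; a value may still repeat across separate bursts! In the sample result, id=1 and id=9 have the same value.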

Edit:
I achieved it with:

select id,col1 from data as d1
where not exists (
    Select id from data as d2
    where d2.id=d1.id-1 and d1.col1=d2.col1 order by id limit 1)

But this only works if the IDs are sequential. When there are gaps between IDs (from deleted rows), the query breaks. How can I fix this?
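To make the failure concrete, here is a small sketch that runs the query above through Python's standard `sqlite3` module against an in-memory SQLite database. The table name `data` comes from the question and the gapped ids from the test setup in Erwin's answer; the outer `ORDER BY id` is added only to make the output deterministic. Rows 6 and 111 are wrongly kept because no row has id 5 or 110, so the `id - 1` lookup finds nothing to compare against.

```python
import sqlite3

# Gapped ids (rows 3-5, 7-13 and 21-110 are "deleted")
rows = [(1, 6050000), (2, 6050000), (6, 6050000),
        (14, 6060000), (15, 6060000), (16, 6060000),
        (17, 6060000), (18, 6060000), (19, 6050000),
        (20, 6000000), (111, 6000000)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (id INTEGER, col1 INTEGER)")
con.executemany("INSERT INTO data VALUES (?, ?)", rows)

# The asker's query: each row is compared only with the row whose id is exactly id - 1
query = """
SELECT id, col1 FROM data AS d1
WHERE NOT EXISTS (
    SELECT id FROM data AS d2
    WHERE d2.id = d1.id - 1 AND d1.col1 = d2.col1
    ORDER BY id LIMIT 1)
ORDER BY id
"""
kept_ids = [r[0] for r in con.execute(query)]
print(kept_ids)  # [1, 6, 14, 19, 20, 111]; the correct answer is [1, 14, 19, 20]
```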

Erw*_*ter 8

You can use an EXISTS semi-join to identify the candidates:

Select the rows you want:

SELECT * FROM tbl
WHERE NOT EXISTS (
    SELECT *
    FROM tbl t
    WHERE t.col1 = tbl.col1
    AND t.id = tbl.id - 1
    )
ORDER BY id
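The SELECT above can be checked with a minimal sketch, assuming Python's standard `sqlite3` module and the question's sample data (sequential ids 1 to 11) loaded into a table named `tbl`. It returns exactly the expected rows 1, 4, 9 and 10; inside the subquery, `tbl` refers to the outer row because the inner table is aliased `t`.

```python
import sqlite3

# Sample data from the question (sequential ids 1..11)
rows = [(1, 6050000), (2, 6050000), (3, 6050000),
        (4, 6060000), (5, 6060000), (6, 6060000),
        (7, 6060000), (8, 6060000), (9, 6050000),
        (10, 6000000), (11, 6000000)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (id INTEGER, col1 INTEGER)")
con.executemany("INSERT INTO tbl VALUES (?, ?)", rows)

# Keep a row unless the row with id - 1 has the same col1
query = """
SELECT * FROM tbl
WHERE NOT EXISTS (
    SELECT *
    FROM tbl t
    WHERE t.col1 = tbl.col1
    AND t.id = tbl.id - 1
    )
ORDER BY id
"""
result = con.execute(query).fetchall()
print(result)  # [(1, 6050000), (4, 6060000), (9, 6050000), (10, 6000000)]
```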

Get rid of the unwanted rows:

DELETE FROM tbl
-- SELECT * FROM tbl
WHERE EXISTS (
    SELECT *
    FROM   tbl t
    WHERE  t.col1 = tbl.col1
    AND    t.id   = tbl.id - 1
    )

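This deletes every row whose preceding row (the one with id - 1) has the same value in col1, which achieves your stated goal: only the first row of each burst survives.

I left the SELECT statement in as a comment, because you should always check what is going to be deleted before you actually run the DELETE.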
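The preview-then-delete workflow can be sketched like this, again assuming Python's `sqlite3` module and the question's sample data in a table named `tbl`. The variable `burst_condition` is a hypothetical helper that just holds the shared WHERE clause, so the preview and the DELETE are guaranteed to use the same condition.

```python
import sqlite3

# Sample data from the question (sequential ids 1..11)
rows = [(1, 6050000), (2, 6050000), (3, 6050000),
        (4, 6060000), (5, 6060000), (6, 6060000),
        (7, 6060000), (8, 6060000), (9, 6050000),
        (10, 6000000), (11, 6000000)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (id INTEGER, col1 INTEGER)")
con.executemany("INSERT INTO tbl VALUES (?, ?)", rows)

# Shared condition: the row with id - 1 has the same col1,
# i.e. this row continues a burst and should go
burst_condition = """
WHERE EXISTS (
    SELECT *
    FROM   tbl t
    WHERE  t.col1 = tbl.col1
    AND    t.id   = tbl.id - 1
    )
"""

# 1. Preview: which rows would the DELETE remove?
to_delete = [r[0] for r in con.execute("SELECT id FROM tbl " + burst_condition + " ORDER BY id")]
print(to_delete)  # [2, 3, 5, 6, 7, 8, 11]

# 2. Delete them and inspect what survives
con.execute("DELETE FROM tbl " + burst_condition)
con.commit()
remaining = [r[0] for r in con.execute("SELECT id FROM tbl ORDER BY id")]
print(remaining)  # [1, 4, 9, 10]
```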


Solution for non-sequential IDs:

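If your RDBMS supports CTEs and window functions (such as PostgreSQL, Oracle, SQL Server, ..., but not SQLite, MS Access or MySQL at the time of writing; SQLite 3.25+ and MySQL 8.0+ have since added them), there is an elegant way: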

WITH x AS (
    SELECT tbl.*, row_number() OVER (ORDER BY id) AS rn
    FROM tbl
    )
SELECT id, col1
FROM   x
WHERE NOT EXISTS (
    SELECT *
    FROM   x x1
    WHERE  x1.col1 = x.col1
    AND    x1.rn   = x.rn - 1
    )
ORDER BY id;
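On SQLite 3.25 or later (the release that added window functions) the CTE query can be tried directly. This sketch assumes Python's `sqlite3` module, whose bundled library version is reported by `sqlite3.sqlite_version`, and loads the gapped ids from the test setup in this answer. Because `row_number()` renumbers the rows 1, 2, 3, ... in id order, the `rn - 1` comparison no longer depends on gaps in `id`.

```python
import sqlite3

# Gapped ids from the test setup in this answer
rows = [(1, 6050000), (2, 6050000), (6, 6050000),
        (14, 6060000), (15, 6060000), (16, 6060000),
        (17, 6060000), (18, 6060000), (19, 6050000),
        (20, 6000000), (111, 6000000)]

con = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
con.execute("CREATE TABLE tbl (id INTEGER, col1 INTEGER)")
con.executemany("INSERT INTO tbl VALUES (?, ?)", rows)

# rn is a gap-free sequence, so "previous row" means rn - 1
query = """
WITH x AS (
    SELECT *, row_number() OVER (ORDER BY id) AS rn
    FROM tbl
    )
SELECT id, col1
FROM   x
WHERE NOT EXISTS (
    SELECT *
    FROM   x x1
    WHERE  x1.col1 = x.col1
    AND    x1.rn   = x.rn - 1
    )
ORDER BY id
"""
result = con.execute(query).fetchall()
print(sqlite3.sqlite_version, result)
```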

There is also a less elegant way that gets the job done without those features.
It should work for you:

SELECT id, col1
FROM   tbl
WHERE (
    SELECT t.col1 = tbl.col1
    FROM   tbl AS t
    WHERE  t.id < tbl.id
    ORDER  BY id DESC
    LIMIT  1) IS NOT TRUE
ORDER BY id
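The subquery version can also be checked against the gapped data. This sketch assumes Python's `sqlite3` module backed by SQLite 3.23 or later, which is needed for the `IS NOT TRUE` operator. For the first row the subquery finds no earlier row and yields NULL, and `NULL IS NOT TRUE` is true, so the first row is always kept; every other row is kept only when its nearest smaller id holds a different value.

```python
import sqlite3

# Gapped ids from the test setup in this answer
rows = [(1, 6050000), (2, 6050000), (6, 6050000),
        (14, 6060000), (15, 6060000), (16, 6060000),
        (17, 6060000), (18, 6060000), (19, 6050000),
        (20, 6000000), (111, 6000000)]

con = sqlite3.connect(":memory:")  # IS NOT TRUE needs SQLite >= 3.23
con.execute("CREATE TABLE tbl (id INTEGER, col1 INTEGER)")
con.executemany("INSERT INTO tbl VALUES (?, ?)", rows)

# The subquery compares each row with the row that has the largest smaller id
query = """
SELECT id, col1
FROM   tbl
WHERE (
    SELECT t.col1 = tbl.col1
    FROM   tbl AS t
    WHERE  t.id < tbl.id
    ORDER  BY id DESC
    LIMIT  1) IS NOT TRUE
ORDER BY id
"""
result = con.execute(query).fetchall()
print(result)  # [(1, 6050000), (14, 6060000), (19, 6050000), (20, 6000000)]
```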

Test setup with non-sequential IDs:

(Tested in PostgreSQL)

CREATE TEMP TABLE tbl (id int, col1 int);
INSERT INTO tbl VALUES
 (1,6050000),(2,6050000),(6,6050000)
,(14,6060000),(15,6060000),(16,6060000)
,(17,6060000),(18,6060000),(19,6050000)
,(20,6000000),(111,6000000);


The*_*kle 5

select min(id), Col1 from tableName group by Col1 
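Grouping by Col1 keeps one row per distinct value rather than one row per burst, so it does not cover the requirement that a value may reappear in a later burst. A quick check with Python's `sqlite3` module on the question's sample data (table name `tableName` as in the query above; the `ORDER BY` is added only for a stable result) returns three rows: id 9 is folded into the same group as id 1.

```python
import sqlite3

# Sample data from the question (sequential ids 1..11)
rows = [(1, 6050000), (2, 6050000), (3, 6050000),
        (4, 6060000), (5, 6060000), (6, 6060000),
        (7, 6060000), (8, 6060000), (9, 6050000),
        (10, 6000000), (11, 6000000)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tableName (id INTEGER, Col1 INTEGER)")
con.executemany("INSERT INTO tableName VALUES (?, ?)", rows)

# One row per distinct Col1 value: the second 6050000 burst (id 9) disappears
result = con.execute(
    "SELECT min(id), Col1 FROM tableName GROUP BY Col1 ORDER BY min(id)"
).fetchall()
print(result)  # [(1, 6050000), (4, 6060000), (10, 6000000)]
```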