Emm*_*myS 111 mysql sql gaps-and-islands
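We have a database with a table whose values were imported from another system. There is an auto-increment column with no duplicate values, but some values are missing. For example, running this query: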
select count(id) from arrc_vouchers where id between 1 and 100
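should return 100, but it returns 87 instead. Is there a query I can run that will return the missing numbers? For example, records may exist for ids 1-70 and 83-100, but there are no records with ids 71-82. I want to return 71, 72, 73, etc.
Is this possible?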
mat*_*att 164
ConfexianMJS provided a much better answer in terms of performance.
Here is a version that works on a table of any size (not just 100 rows):
SELECT (t1.id + 1) as gap_starts_at,
(SELECT MIN(t3.id) -1 FROM arrc_vouchers t3 WHERE t3.id > t1.id) as gap_ends_at
FROM arrc_vouchers t1
WHERE NOT EXISTS (SELECT t2.id FROM arrc_vouchers t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL
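To try the query without touching real data, here is a minimal sketch that runs the same logic against an in-memory SQLite table from Python. The sample ids (1-70 and 83-100) are taken from the question's example; using SQLite instead of MySQL is an assumption made only so the snippet is self-contained. Because referring to a select alias in HAVING without GROUP BY is a MySQL extension, the sketch wraps the query in a subquery and filters with WHERE instead.

```python
import sqlite3

# Hypothetical sample data from the question: ids 1-70 and 83-100 exist.
existing_ids = list(range(1, 71)) + list(range(83, 101))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arrc_vouchers (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO arrc_vouchers (id) VALUES (?)",
    [(i,) for i in existing_ids],
)

# Same logic as the MySQL query; the outer WHERE stands in for
# "HAVING gap_ends_at IS NOT NULL", which SQLite does not accept here.
gap_query = """
SELECT gap_starts_at, gap_ends_at
FROM (
    SELECT t1.id + 1 AS gap_starts_at,
           (SELECT MIN(t3.id) - 1
              FROM arrc_vouchers t3
             WHERE t3.id > t1.id) AS gap_ends_at
      FROM arrc_vouchers t1
     WHERE NOT EXISTS (SELECT 1 FROM arrc_vouchers t2 WHERE t2.id = t1.id + 1)
) AS gaps
WHERE gap_ends_at IS NOT NULL
ORDER BY gap_starts_at
"""
gaps = conn.execute(gap_query).fetchall()
print(gaps)  # [(71, 82)]
```

The row for id 100 also has no successor, but its gap_ends_at is NULL, so it is filtered out rather than reported as an open-ended gap.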
gap_starts_at - the first id in the current gap
gap_ends_at - the last id in the current gap
小智 83
This just worked for me to find the gaps in a table with more than 80k rows:
SELECT
CONCAT(z.expected, IF(z.got-1>z.expected, CONCAT(' thru ',z.got-1), '')) AS missing
FROM (
SELECT
@rownum:=@rownum+1 AS expected,
IF(@rownum=YourCol, 0, @rownum:=YourCol) AS got
FROM
(SELECT @rownum:=0) AS a
JOIN YourTable
ORDER BY YourCol
) AS z
WHERE z.got!=0;
Result:
+------------------+
| missing |
+------------------+
| 1 thru 99 |
| 666 thru 667 |
| 50000 |
| 66419 thru 66456 |
+------------------+
4 rows in set (0.06 sec)
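For readers who find the user-variable trick hard to follow, the row-by-row bookkeeping can be sketched in plain Python. This is only an illustration of the logic, not how MySQL executes it; the function name find_gaps and the sample ids are hypothetical, chosen so that the output matches the result shown above.

```python
def find_gaps(sorted_ids):
    """Mimic the expected/got bookkeeping of the user-variable query."""
    rownum = 0                     # plays the role of @rownum
    missing = []
    for current in sorted_ids:     # rows arrive in ORDER BY YourCol order
        rownum += 1
        expected = rownum          # computed first, before "got" can reset rownum
        if rownum == current:
            got = 0                # no gap in front of this row
        else:
            rownum = current       # resynchronise the counter with the real id
            got = current
        if got != 0:               # WHERE z.got != 0
            last_missing = got - 1
            if last_missing > expected:
                missing.append(f"{expected} thru {last_missing}")
            else:
                missing.append(str(expected))
    return missing


# Hypothetical ids with the same holes as in the result table.
sample_ids = (
    list(range(100, 666))
    + list(range(668, 50000))
    + list(range(50001, 66419))
    + list(range(66457, 66500))
)
missing_report = find_gaps(sample_ids)
print(missing_report)
# ['1 thru 99', '666 thru 667', '50000', '66419 thru 66456']
```

Swapping the two steps, so that the counter is reset before expected is read, would lose the start of each gap.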
Note that the order of the columns expected and got is critical.
If you know that YourCol does not start at 1 and that does not matter, you can replace
(SELECT @rownum:=0) AS a
with
(SELECT @rownum:=(SELECT MIN(YourCol)-1 FROM YourTable)) AS a
New result:
+------------------+
| missing |
+------------------+
| 666 thru 667 |
| 50000 |
| 66419 thru 66456 |
+------------------+
3 rows in set (0.06 sec)
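If you need to perform some kind of shell script task on the missing IDs, you can also use this variant to directly produce an expression that you can iterate over in bash.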
SELECT GROUP_CONCAT(IF(z.got-1>z.expected, CONCAT('$(seq ',z.expected,' ',z.got-1,')'), z.expected) SEPARATOR " ") AS missing
FROM ( SELECT @rownum:=@rownum+1 AS expected, IF(@rownum=height, 0, @rownum:=height) AS got FROM (SELECT @rownum:=0) AS a JOIN block ORDER BY height ) AS z WHERE z.got!=0;
This produces output like
$(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456)
You can then copy and paste that into a for loop in a bash terminal to execute a command for every ID:
for ID in $(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456); do
echo $ID
# fill the gaps
done
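It is the same thing as above, only it is both readable and executable. By changing the "CONCAT" command above, syntax can be generated for other programming languages. Or maybe even SQL.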
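The same formatting step can also be done outside the database. As a rough sketch, assuming the gaps have already been fetched as (first, last) pairs, the hypothetical helpers below build the bash word list and an equivalent SQL predicate; the function names and the default column name id are made up for illustration.

```python
def to_bash_words(gaps):
    """Render (first, last) gaps as words for a bash for loop."""
    words = []
    for first, last in gaps:
        if last > first:
            words.append(f"$(seq {first} {last})")  # a range of missing ids
        else:
            words.append(str(first))                # a single missing id
    return " ".join(words)


def to_sql_predicate(gaps, column="id"):
    """Render the same gaps as a SQL condition on the given column."""
    parts = []
    for first, last in gaps:
        if last > first:
            parts.append(f"{column} BETWEEN {first} AND {last}")
        else:
            parts.append(f"{column} = {first}")
    return " OR ".join(parts)


# Gap ranges corresponding to the example output.
example_gaps = [(1, 99), (666, 667), (50000, 50000), (66419, 66456)]
bash_words = to_bash_words(example_gaps)
sql_predicate = to_sql_predicate(example_gaps)
print(bash_words)
print(sql_predicate)
```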
A quick and dirty query that should do the trick:
SELECT a AS id, b AS next_id, (b - a) -1 AS missing_inbetween
FROM
(
SELECT a1.id AS a , MIN(a2.id) AS b
FROM arrc_vouchers AS a1
LEFT JOIN arrc_vouchers AS a2 ON a2.id > a1.id
WHERE a1.id <= 100
GROUP BY a1.id
) AS tab
WHERE
b > a + 1
This will give you a table showing each id that has ids missing above it, the next_id that exists, and how many ids are missing in between:
id   next_id   missing_inbetween
1    4         2
68   70        1
75   87        11
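Since the question asks for the individual missing numbers, here is a small sketch that runs this query on an in-memory SQLite table and then expands each (id, next_id) pair in Python. The sample ids are hypothetical, picked so the query reproduces the three rows above, and SQLite is used only to keep the snippet self-contained.

```python
import sqlite3

# Hypothetical ids: 2-3, 69 and 76-86 are missing from 1..100.
existing_ids = [1] + list(range(4, 69)) + list(range(70, 76)) + list(range(87, 101))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arrc_vouchers (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO arrc_vouchers (id) VALUES (?)",
    [(i,) for i in existing_ids],
)

rows = conn.execute("""
    SELECT a AS id, b AS next_id, (b - a) - 1 AS missing_inbetween
    FROM (
        SELECT a1.id AS a, MIN(a2.id) AS b
        FROM arrc_vouchers AS a1
        LEFT JOIN arrc_vouchers AS a2 ON a2.id > a1.id
        WHERE a1.id <= 100
        GROUP BY a1.id
    ) AS tab
    WHERE b > a + 1
    ORDER BY a
""").fetchall()
print(rows)  # [(1, 4, 2), (68, 70, 1), (75, 87, 11)]

# Expand each (id, next_id) pair into the ids strictly between them.
missing_ids = [i for a, b, _ in rows for i in range(a + 1, b)]
print(missing_ids)
```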
If you are using MariaDB, you have a faster (800%) option using the sequence storage engine:
SELECT * FROM seq_1_to_50000 WHERE SEQ NOT IN (SELECT COL FROM TABLE);
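The idea behind the sequence approach is to generate every number in the range and keep the ones that are not in the table. The sketch below shows that idea in Python with SQLite, which has no sequence engine, so a recursive CTE named seq (an assumption of this example, not MariaDB's seq_1_to_N table) stands in for it. The sample ids are the ones from the question, and the upper bound of 100 is passed as a parameter.

```python
import sqlite3

# Sample data from the question: ids 1-70 and 83-100 exist.
existing_ids = list(range(1, 71)) + list(range(83, 101))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arrc_vouchers (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO arrc_vouchers (id) VALUES (?)",
    [(i,) for i in existing_ids],
)

# Recursive CTE generating 1..upper_bound, used in place of seq_1_to_N.
sequence_query = """
WITH RECURSIVE seq(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM seq WHERE n < ?
)
SELECT n
FROM seq
WHERE n NOT IN (SELECT id FROM arrc_vouchers)
ORDER BY n
"""
upper_bound = 100
missing = [row[0] for row in conn.execute(sequence_query, (upper_bound,))]
print(missing)  # [71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82]
```

Unlike the gap-range queries, this returns one row per missing id, which is the shape the question asked for.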