I have a database with 40 million entries and want to run a query with the following WHERE clause:
...
WHERE
`POP1` IS NOT NULL
&& `VT`='ABC'
&& (`SOURCE`='HOME')
&& (`alt` RLIKE '^[AaCcGgTt]$')
&& (`ref` RLIKE '^[AaCcGgTt]$')
&& (`AA` RLIKE '^[AaCcGgTt]$')
&& (`ref` = `AA` || `alt` = `AA`)
LIMIT 10 ;
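POP1 is a float column that can also be NULL. POP1 IS NOT NULL should exclude about 50% of the entries, which is why I put it first. All the other terms reduce the number only slightly.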
Among others, I designed an index pop1_vt_source, which does not seem to be used, whereas an index with vt as its first column is used. EXPLAIN output:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | myTab | ref | vt_source_pop1_pop2,pop1_vt_source,... | vt_source_pop1_pop2 | 206 | const,const | 20040021 | Using where |
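Why is the index with pop1 as its first column not used? Because of the NOT, or because of NULL in general? How can I improve the design of my indexes and WHERE clauses? Even when limited to 10 entries, the query takes more than 30 seconds, although the first 100 entries in the table should contain 10 matches.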
Answer (score 10)
It is the NOT NULL:
CREATE TEMPORARY TABLE `myTab` (`notnul` FLOAT, `nul` FLOAT);
INSERT INTO `myTab` VALUES (1, NULL), (1, 2), (1, NULL), (1, 2), (1, NULL), (1, 2), (1, NULL), (1, 2), (1, NULL), (1, 2), (1, NULL), (1, 2);
SELECT * FROM `myTab`;
gives:
+--------+------+
| notnul | nul |
+--------+------+
| 1 | NULL |
| 1 | 2 |
| 1 | NULL |
| 1 | 2 |
| 1 | NULL |
| 1 | 2 |
| 1 | NULL |
| 1 | 2 |
| 1 | NULL |
| 1 | 2 |
| 1 | NULL |
| 1 | 2 |
+--------+------+
Create the indexes:
CREATE INDEX `notnul_nul` ON `myTab` (`notnul`, `nul`);
CREATE INDEX `nul_notnul` ON `myTab` (`nul`, `notnul`);
SHOW INDEX FROM `myTab`;
gives:
+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| myTab | 1 | notnul_nul | 1 | notnul | A | 12 | NULL | NULL | YES | BTREE | | |
| myTab | 1 | notnul_nul | 2 | nul | A | 12 | NULL | NULL | YES | BTREE | | |
| myTab | 1 | nul_notnul | 1 | nul | A | 12 | NULL | NULL | YES | BTREE | | |
| myTab | 1 | nul_notnul | 2 | notnul | A | 12 | NULL | NULL | YES | BTREE | | |
+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Now EXPLAIN the selects. It seems that MySQL uses the index even when you use IS NOT NULL:
EXPLAIN SELECT * FROM `myTab` WHERE `notnul` IS NOT NULL;
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
| 1 | SIMPLE | myTab | index | notnul_nul | notnul_nul | 10 | NULL | 12 | Using where; Using index |
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
EXPLAIN SELECT * FROM `myTab` WHERE `nul` IS NOT NULL;
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
| 1 | SIMPLE | myTab | range | nul_notnul | nul_notnul | 5 | NULL | 6 | Using where; Using index |
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
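However, comparing IS NULL with IS NOT NULL shows that MySQL treats the two differently, even though the NOT by itself obviously adds no information. IS NULL is resolved as an equality lookup (type ref, ref const,const), whereas IS NOT NULL is interpreted as a range, as you can see in the type column. Once a range condition is applied to one column of an index, the columns after it can no longer narrow the lookup, which is the likely reason the optimizer prefers an index that starts with the equality columns. I am not sure whether there is a workaround: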
EXPLAIN SELECT * FROM `myTab` WHERE `nul` IS NULL && notnul=2;
+----+-------------+-------+------+-----------------------+------------+---------+-------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-----------------------+------------+---------+-------------+------+--------------------------+
| 1 | SIMPLE | myTab | ref | notnul_nul,nul_notnul | notnul_nul | 10 | const,const | 1 | Using where; Using index |
+----+-------------+-------+------+-----------------------+------------+---------+-------------+------+--------------------------+
EXPLAIN SELECT * FROM `myTab` WHERE `nul` IS NOT NULL && notnul=2;
+----+-------------+-------+-------+-----------------------+------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-----------------------+------------+---------+------+------+--------------------------+
| 1 | SIMPLE | myTab | range | notnul_nul,nul_notnul | notnul_nul | 10 | NULL | 1 | Using where; Using index |
+----+-------------+-------+-------+-----------------------+------------+---------+------+------+--------------------------+
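One way to see what the optimizer would do with each index is the FORCE INDEX hint. The following is only a sketch that reuses the temporary table and the two indexes created above. No output is shown because the plan depends on the MySQL version and on table statistics, and on a 12-row table the difference will not be meaningful. It also uses AND instead of &&, since && is deprecated in recent MySQL 8.0 releases.

```sql
-- Assumes the temporary table `myTab` and the indexes
-- `notnul_nul` and `nul_notnul` from the statements above.

-- Plan when the index starting with the nullable column is forced:
EXPLAIN SELECT * FROM `myTab` FORCE INDEX (`nul_notnul`)
WHERE `nul` IS NOT NULL AND `notnul` = 2;

-- Plan when the index starting with the equality column is forced:
EXPLAIN SELECT * FROM `myTab` FORCE INDEX (`notnul_nul`)
WHERE `nul` IS NOT NULL AND `notnul` = 2;
```

Comparing the type, key_len and rows columns of the two plans shows how much of each index MySQL can actually use. Running the same comparison against the real table, with its own index names, would be the more telling test.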
I think there could be a better implementation in MySQL, because NULL is a special value. Most people are probably interested in the NOT NULL values.