The table has 1,500,000 records, 1,250,000 of which have field = 'z'.
I need to select a random row whose field is not 'z'.
$random = mt_rand(1, 250000);
$query = "SELECT field FROM table WHERE field != 'z' LIMIT $random, 1";
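It works fine.
Then I decided to optimize it and added an index on field.
The result was strange: the query became about 3 times slower. I tested it.
Why is it slower? Shouldn't an index like this make it faster?
The table is MyISAM.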
explain with index:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table range field field 758 NULL 1139287 Using
explain without index:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table ALL NULL NULL NULL NULL 1484672 Using where
Cra*_*der 21
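Summary
The problem is that field is not a good candidate for an index, because of the nature of B-trees.
Explanation
Suppose you have a table holding the results of 500,000 coin tosses, where each toss is either 1 (heads) or 0 (tails):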
CREATE TABLE toss (
id int NOT NULL AUTO_INCREMENT,
result int NOT NULL DEFAULT '0',
PRIMARY KEY ( id )
);
select result, count(*) from toss group by result order by result;
+--------+----------+
| result | count(*) |
+--------+----------+
| 0 | 250290 |
| 1 | 249710 |
+--------+----------+
2 rows in set (0.40 sec)
If you want to pick one random toss that came up tails, you have to search the table, starting from a random position.
select * from toss where result != 1 limit 123456, 1;
+--------+--------+
| id | result |
+--------+--------+
| 246700 | 0 |
+--------+--------+
1 row in set (0.06 sec)
explain select * from toss where result != 1 limit 123456, 1;
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | toss | ALL | NULL | NULL | NULL | NULL | 500000 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
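You can see that you are essentially scanning all rows sequentially to find a match.
If you create an index on the result column of toss, the index will contain two values, each with roughly 250,000 entries.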
create index foo on toss ( result );
Query OK, 500000 rows affected (2.48 sec)
Records: 500000 Duplicates: 0 Warnings: 0
select * from toss where result != 1 limit 123456, 1;
+--------+--------+
| id | result |
+--------+--------+
| 246700 | 0 |
+--------+--------+
1 row in set (0.25 sec)
explain select * from toss where result != 1 limit 123456, 1;
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | toss | range | foo | foo | 4 | NULL | 154565 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+--------+-------------+
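Now you are examining fewer records, but the query time went up from 0.06 to 0.25 seconds. Why? Because sequentially scanning an index is actually less efficient than sequentially scanning the table when the index holds a large number of rows for a given key.
Let's look at the indexes on this table: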
show index from toss;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| toss | 0 | PRIMARY | 1 | id | A | 500000 | NULL | NULL | | BTREE | |
| toss | 1 | foo | 1 | result | A | 2 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
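The PRIMARY index is a good index: there are 500,000 rows and 500,000 distinct values. Arranged in a BTREE, it lets you quickly locate a single row by id.
The foo index is a bad index: there are 500,000 rows but only 2 possible values. This is close to the worst case for a BTREE: you pay all the overhead of searching the index and still have to search through the results.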
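One rough way to put numbers on "good" versus "bad" here is index selectivity, the ratio of distinct key values to total rows. The following is a minimal Python sketch, not part of the original answer: the `selectivity` helper is a hypothetical name introduced for illustration, and the figures are copied from the Cardinality column of the SHOW INDEX output above (which MySQL reports as an estimate).

```python
def selectivity(cardinality: int, total_rows: int) -> float:
    """Distinct key values divided by total rows; 1.0 means every key is unique."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return cardinality / total_rows


# Cardinality values taken from the SHOW INDEX output for the toss table.
indexes = {"PRIMARY": 500_000, "foo": 2}
total_rows = 500_000

for name, cardinality in indexes.items():
    # Average number of rows the engine still has to walk per key value.
    rows_per_key = total_rows / cardinality
    print(f"{name}: selectivity={selectivity(cardinality, total_rows):.6f}, "
          f"avg rows per key={rows_per_key:,.0f}")
```

A selectivity near 1.0 (as with PRIMARY) means a lookup narrows the search to about one row, while a value near 0 (as with foo) means each key still leaves hundreds of thousands of rows to go through.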