The table structure is:
CREATE TABLE `test` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`from` int(10) unsigned NOT NULL,
`to` int(10) unsigned NOT NULL,
`message` text NOT NULL,
`sent` int(10) unsigned NOT NULL DEFAULT '0',
`read` tinyint(1) unsigned NOT NULL DEFAULT '0',
`direction` tinyint(1) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `one` (`to`,`direction`,`from`,`id`),
KEY `two` (`from`,`direction`,`to`,`id`),
KEY `three` (`read`,`direction`,`to`),
KEY `four` (`read`,`direction`,`from`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
I have a strange problem. Look at the following query:
select test.id, test.from, test.to, test.message, test.sent, test.read, test.direction from test
where (
(test.to = 244975 and test.direction <> 2 and test.direction <> 3 and
(
(test.from = 204177 and test.id > 5341203) OR
(test.from = 214518 and test.id > 5336549) OR
(test.from = 231429 and test.id > 5338284) OR
(test.from = 242739 and test.id > 5339541) OR
(test.from = 243834 and test.id > 5340438) OR
(test.from = 244354 and test.id > 5337489) OR
(test.from = 244644 and test.id > 5338572) OR
(test.from = 244690 and test.id > 5338467)
)
)
or
(test.from = 244975 and test.direction <> 1 and test.direction <> 3 and
(
(test.to = 204177 and test.id > 5341203) OR
(test.to = 214518 and test.id > 5336549) OR
(test.to = 231429 and test.id > 5338284) OR
(test.to = 242739 and test.id > 5339541) OR
(test.to = 243834 and test.id > 5340438) OR
(test.to = 244354 and test.id > 5337489) OR
(test.to = 244644 and test.id > 5338572) OR
(test.to = 244690 and test.id > 5338467)
)
)
or
(test.read <> 1 and test.direction <> 3 and test.direction <> 2 and test.to = 244975 and test.from not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
)
or
(test.read <> 1 and test.direction = 2 and test.from = 244975 and test.to not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)
)
)
order by test.id;
If I run EXPLAIN on this query, it scans every row:
id  select_type  table  type   possible_keys               key      key_len  ref   rows     Extra
1   SIMPLE       test   index  PRIMARY,one,two,three,four  PRIMARY  4        NULL  1440596  Using where
If I remove the "not in" conditions, it works fine:
select test.id, test.from, test.to, test.message, test.sent, test.read, test.direction from test
where (
(test.to = 244975 and test.direction <> 2 and test.direction <> 3 and
(
(test.from = 204177 and test.id > 5341203) OR
(test.from = 214518 and test.id > 5336549) OR
(test.from = 231429 and test.id > 5338284) OR
(test.from = 242739 and test.id > 5339541) OR
(test.from = 243834 and test.id > 5340438) OR
(test.from = 244354 and test.id > 5337489) OR
(test.from = 244644 and test.id > 5338572) OR
(test.from = 244690 and test.id > 5338467)
)
)
or
(test.from = 244975 and test.direction <> 1 and test.direction <> 3 and
(
(test.to = 204177 and test.id > 5341203) OR
(test.to = 214518 and test.id > 5336549) OR
(test.to = 231429 and test.id > 5338284) OR
(test.to = 242739 and test.id > 5339541) OR
(test.to = 243834 and test.id > 5340438) OR
(test.to = 244354 and test.id > 5337489) OR
(test.to = 244644 and test.id > 5338572) OR
(test.to = 244690 and test.id > 5338467)
)
)
or
(test.read <> 1 and test.direction <> 3 and test.direction <> 2 and test.to = 244975
)
or
(test.read <> 1 and test.direction = 2 and test.from = 244975
)
)
order by test.id;
Now EXPLAIN returns:
id  select_type  table  type         possible_keys               key      key_len  ref   rows  Extra
1   SIMPLE       test   index_merge  PRIMARY,one,two,three,four  one,two  5,5      NULL  30    Using sort_union(one,two); Using where; Using filesort
I'm not sure why it isn't working properly. What am I missing in my indexes?
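I'm fairly sure the query planner is working correctly, and you aren't missing anything in your indexes that would help here. The planner decides that a different index will be faster because the two queries are very different.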
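We can get the optimizer to use a union of indexes for us, which makes it faster. You can keep the NOT IN and leave the OR conditions as they are. I ran some basic benchmarks comparing this against a rewrite that uses UNION. Caveats apply, since your database configuration may be very different from mine. I ran each query 1,000 times, repeated that 3 times, and kept the best time for each query...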
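The timing procedure just described (1,000 runs per trial, 3 trials, keep the best) can be sketched in Python with `timeit.repeat`. This is only an illustration: the answer timed its queries with the shell's `time`, whereas this sketch uses an in-memory SQLite table and a hypothetical query as stand-ins for the MySQL setup.

```python
import sqlite3
import timeit

def best_of(run_query, number=1000, repeat=3):
    """Run `run_query` `number` times per trial, `repeat` trials; return the fastest trial in seconds."""
    return min(timeit.repeat(run_query, number=number, repeat=repeat))

# Hypothetical stand-in data: a small in-memory SQLite table, not the MySQL table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, from_ INTEGER NOT NULL, to_ INTEGER NOT NULL)")
conn.executemany("INSERT INTO test (from_, to_) VALUES (?, ?)", [(i % 50, (i * 7) % 50) for i in range(1000)])
conn.execute("CREATE INDEX two ON test (from_, to_, id)")

def query():
    # Hypothetical query under test; swap in whichever variant you want to compare.
    return conn.execute("SELECT id FROM test WHERE from_ IN (1, 2, 3) ORDER BY id").fetchall()

best = best_of(query)
print(f"best of 3 x 1000 runs: {best:.3f}s")
```

Keeping the minimum rather than the mean filters out trials slowed by unrelated system activity, which matches the "best time" approach in the answer.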
Optimized query (shown below):
real 0m15.410s
user 0m6.681s
sys 0m2.641s
Rewritten as a set of UNIONs:
real 0m17.747s
user 0m6.798s
sys 0m2.812s
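The UNION rewrite itself isn't shown here. The sketch below is one hypothetical shape for it, not necessarily the version that was benchmarked: each of the four OR branches becomes its own SELECT, and UNION combines them (dropping duplicates) before a final ORDER BY. It runs on an in-memory SQLite table with random data; `ME` and `LAST_SEEN` are hypothetical small ids standing in for 244975 and the partner/last-id pairs, and the columns use the `from_`/`to_`/`read_` names from the answer's test table.

```python
import random
import sqlite3

ME = 1                                   # hypothetical stand-in for user 244975
LAST_SEEN = {2: 40, 3: 55, 4: 70}        # hypothetical partner -> last seen message id
partners = ", ".join(str(p) for p in LAST_SEEN)
cols = "id, from_, to_, read_, direction"

def per_partner(col):
    """Build '(col = partner AND id > last_seen) OR ...' for every partner."""
    return " OR ".join(f"({col} = {p} AND id > {k})" for p, k in LAST_SEEN.items())

# The four OR branches of the question's WHERE clause.
branches = [
    f"to_ = {ME} AND direction <> 2 AND direction <> 3 AND ({per_partner('from_')})",
    f"from_ = {ME} AND direction <> 1 AND direction <> 3 AND ({per_partner('to_')})",
    f"read_ <> 1 AND direction <> 3 AND direction <> 2 AND to_ = {ME} AND from_ NOT IN ({partners})",
    f"read_ <> 1 AND direction = 2 AND from_ = {ME} AND to_ NOT IN ({partners})",
]

# One query with ORs versus one SELECT per branch glued together with UNION.
or_query = f"SELECT {cols} FROM test WHERE " + " OR ".join(f"({b})" for b in branches) + " ORDER BY id"
union_query = " UNION ".join(f"SELECT {cols} FROM test WHERE {b}" for b in branches) + " ORDER BY id"

random.seed(0)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, from_ INTEGER NOT NULL, to_ INTEGER NOT NULL,"
             " read_ INTEGER NOT NULL, direction INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO test (from_, to_, read_, direction) VALUES (?, ?, ?, ?)",
    [(random.randint(1, 7), random.randint(1, 7), random.randint(0, 1), random.randint(0, 3)) for _ in range(2000)],
)

or_rows = conn.execute(or_query).fetchall()
union_rows = conn.execute(union_query).fetchall()
assert or_rows == union_rows  # both forms return the same rows in the same order
print(len(or_rows), "rows returned by both forms")
```

Because every row carries its primary key, UNION's duplicate removal only collapses a row that matches more than one branch, so the result is identical to the OR form.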
Think like the optimizer and use less data
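In tests against a 4-million-row database, the following SQL was orders of magnitude faster. (My test copy of the table names the columns from_, to_ and read_; from, to and read are MySQL reserved words.) The key change is this line: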
(select * from test where test.from_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) or test.to_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)) as test
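This line greatly reduces the data set MySQL has to process, because it uses IN rather than NOT IN. Here is the new query; I tried not to change the original too much: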
select SQL_NO_CACHE test.id, test.from_, test.to_, test.message, test.sent, test.read_, test.direction
from (select * from test where test.from_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) or test.to_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)) as test
where (
(test.to_ = 244975 and test.direction <> 2 and test.direction <> 3 and test.from_ in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) and
(
(test.from_ = 204177 and test.id > 5341203) OR
(test.from_ = 214518 and test.id > 5336549) OR
(test.from_ = 231429 and test.id > 5338284) OR
(test.from_ = 242739 and test.id > 5339541) OR
(test.from_ = 243834 and test.id > 5340438) OR
(test.from_ = 244354 and test.id > 5337489) OR
(test.from_ = 244644 and test.id > 5338572) OR
(test.from_ = 244690 and test.id > 5338467)
)
)
or
(test.from_ = 244975 and test.direction <> 1 and test.direction <> 3 and test.to_ in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) and
(
(test.to_ = 204177 and test.id > 5341203) OR
(test.to_ = 214518 and test.id > 5336549) OR
(test.to_ = 231429 and test.id > 5338284) OR
(test.to_ = 242739 and test.id > 5339541) OR
(test.to_ = 243834 and test.id > 5340438) OR
(test.to_ = 244354 and test.id > 5337489) OR
(test.to_ = 244644 and test.id > 5338572) OR
(test.to_ = 244690 and test.id > 5338467)
))
or
(test.read_ <> 1 and test.direction <> 2 and test.direction <> 3 and test.to_ = 244975 and test.from_ not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690))
or
(test.read_ <> 1 and test.direction = 2 and test.from_ = 244975 and test.to_ not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690))
)
order by test.id;
The EXPLAIN plan for the new query looks very different...
mysql> \. sql_fixed.sql
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 226
filtered: 100.00
Extra: Using where; Using filesort
*************************** 2. row ***************************
id: 2
select_type: DERIVED
table: test
type: index_merge
possible_keys: one,two
key: two,one
key_len: 4,4
ref: NULL
rows: 226
filtered: 100.00
Extra: Using sort_union(two,one); Using where
2 rows in set, 1 warning (0.01 sec)
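A smart optimizer can see immediately that it doesn't need most of the data, because we have told it to use an IN with only a few keys. Most query optimizers assign a high cost to disk access, so the optimizer will generally favor anything that reduces it.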
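The prefilter is only safe because every row the original WHERE clause can match has from_ or to_ equal to one of the listed ids, so the derived table is a superset of the final result. The sketch below checks that on random data in an in-memory SQLite table; `ME` and `LAST_SEEN` are hypothetical small ids replacing the question's real ones, and SQLite's planner is not MySQL's, so this checks correctness, not speed.

```python
import random
import sqlite3

ME = 1                                   # hypothetical stand-in for user 244975
LAST_SEEN = {2: 40, 3: 55, 4: 70}        # hypothetical partner -> last seen message id
partners = ", ".join(map(str, LAST_SEEN))
everyone = ", ".join(map(str, [ME, *LAST_SEEN]))

def per_partner(col):
    """Build '(col = partner AND id > last_seen) OR ...' for every partner."""
    return " OR ".join(f"({col} = {p} AND id > {k})" for p, k in LAST_SEEN.items())

# The question's four-branch WHERE clause, NOT IN parts included.
where = " OR ".join(f"({b})" for b in [
    f"to_ = {ME} AND direction <> 2 AND direction <> 3 AND ({per_partner('from_')})",
    f"from_ = {ME} AND direction <> 1 AND direction <> 3 AND ({per_partner('to_')})",
    f"read_ <> 1 AND direction <> 3 AND direction <> 2 AND to_ = {ME} AND from_ NOT IN ({partners})",
    f"read_ <> 1 AND direction = 2 AND from_ = {ME} AND to_ NOT IN ({partners})",
])
# The answer's derived-table prefilter: keep only rows that involve one of the listed users.
prefilter = f"SELECT * FROM test WHERE from_ IN ({everyone}) OR to_ IN ({everyone})"

random.seed(1)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, from_ INTEGER NOT NULL, to_ INTEGER NOT NULL,"
             " read_ INTEGER NOT NULL, direction INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO test (from_, to_, read_, direction) VALUES (?, ?, ?, ?)",
    [(random.randint(1, 200), random.randint(1, 200), random.randint(0, 1), random.randint(0, 3))
     for _ in range(20000)],
)

direct = conn.execute(f"SELECT id FROM test WHERE {where} ORDER BY id").fetchall()
via_prefilter = conn.execute(f"SELECT id FROM ({prefilter}) AS test WHERE {where} ORDER BY id").fetchall()
candidates = conn.execute(f"SELECT COUNT(*) FROM ({prefilter})").fetchone()[0]

assert direct == via_prefilter  # same answer, computed from far fewer candidate rows
print(f"{candidates} of 20000 rows survive the IN prefilter; {len(direct)} match")
```

With 200 hypothetical users, only a few percent of the rows reach the outer WHERE clause, which is the "use less data" effect the answer describes.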
NOT IN vs. IN
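NOT IN and IN are very different. Here the difference lies in the access pattern, that is, whether I need the data only temporarily or as part of the result set. When you use NOT IN with a few keys against an index holding millions of keys, and that data is part of the result set, you may have to read a great many records. Even with an index, NOT IN can end up reading millions of records from disk. IN needs only the few keys you listed: those are the keys to look up, and you work with a small subset. The two access patterns are very different. The following example may make this clearer...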
1. I don't want these 10 items out of 1,000,000 records; I need the other 999,990. This reads the whole index.
2. I only want these 10 out of 1,000,000 records. This might require only one disk seek.
Number 2 is faster because of the access pattern: I look up just the 10 I need. Number 1 may have to read a million records.
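The two cases can be made concrete with a toy model: a sorted Python list stands in for an index, and we count how many entries each access pattern has to touch. This is a hypothetical simplification (one probe per `bisect` seek, no pages or caching), not a model of InnoDB.

```python
import bisect

N = 100_000
index = list(range(N))  # hypothetical sorted index of N distinct keys
wanted = [5, 17, 999, 1234, 40_000, 50_001, 60_002, 70_003, 80_004, 99_999]  # the 10 keys

def lookup_in(index, keys):
    """Case 2 (IN): one seek per key; returns (matches, probes)."""
    found, probes = [], 0
    for k in keys:
        pos = bisect.bisect_left(index, k)  # binary search, like descending a B-tree
        probes += 1                         # count each seek as one probe
        if pos < len(index) and index[pos] == k:
            found.append(k)
    return found, probes

def lookup_not_in(index, keys):
    """Case 1 (NOT IN): every entry must be visited to return all the others."""
    excluded = set(keys)
    found = [k for k in index if k not in excluded]
    return found, len(index)

in_rows, in_cost = lookup_in(index, wanted)
not_in_rows, not_in_cost = lookup_not_in(index, wanted)
print(len(in_rows), in_cost)          # 10 10
print(len(not_in_rows), not_in_cost)  # 99990 100000
```

The work for IN grows with the number of keys listed; the work for NOT IN grows with the size of the index.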
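MySQL's query optimizer sees exactly this: the last two OR clauses ask for a large subset of the table or index, i.e. case 1 above. Seeing that, plus the fact that it needs the primary key anyway (the query filters and orders on id), the optimizer decides that scanning the primary key is faster.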
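Removing the NOT IN changes this: the query planner can now use indexes, because with the NOT IN gone the last two OR clauses also become "get me the few from the many" (each pins to or from to a single user), just like the other two. The planner then does an index_merge over the two keys, one and two, which begin with the to and from columns and end with id.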
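The "Using sort_union(one,two)" line in the second EXPLAIN can be pictured with another toy model: range-scan each secondary index for the one user, collect the row ids, de-duplicate and sort them, and only then fetch the rows. The data, `ME`, and helper names are hypothetical; this mirrors the idea of MySQL's sort-union (row ids from all scans are gathered and sorted before any row is returned), not its implementation.

```python
import bisect

# Hypothetical rows: id -> (from_, to_, direction)
rows = {1: (7, 3, 0), 2: (3, 9, 2), 3: (3, 3, 1), 4: (5, 6, 0), 5: (8, 3, 2), 6: (3, 4, 0)}
ME = 3  # hypothetical user id

# Secondary indexes as sorted tuples whose last element is the primary key,
# mirroring the column order of keys `one` and `two` in the question.
index_one = sorted((to, d, frm, i) for i, (frm, to, d) in rows.items())  # (to, direction, from, id)
index_two = sorted((frm, d, to, i) for i, (frm, to, d) in rows.items())  # (from, direction, to, id)

def range_scan(index, leading_value):
    """Collect the ids of the contiguous entries whose first column equals leading_value."""
    start = bisect.bisect_left(index, (leading_value,))  # seek to the first matching entry
    ids = []
    for entry in index[start:]:
        if entry[0] != leading_value:
            break
        ids.append(entry[-1])
    return ids

def sort_union(*id_lists):
    """Merge ids from several scans, drop duplicates, and sort them before any row is fetched."""
    return sorted(set().union(*id_lists))

ids = sort_union(range_scan(index_one, ME), range_scan(index_two, ME))
fetched = [(i, rows[i]) for i in ids]  # remaining WHERE conditions would be checked on these rows
print(ids)  # [1, 2, 3, 5, 6]
```

Row 3 (a message to and from the same user) is found by both scans but fetched once, and row 4 is never read at all.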
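To see what I mean, instead of removing the "not in" parts of the query, change them to "in" and see what happens: on my machine the query plan switched to a range scan on the indexes.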
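The same shift from a full scan to an index lookup can be observed in SQLite with `EXPLAIN QUERY PLAN`. This is SQLite's planner, not MySQL's, and the table, index, and data are hypothetical stand-ins; the exact wording of the plan varies between SQLite versions, so the sketch only looks for the SCAN and SEARCH keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, from_ INTEGER NOT NULL,"
             " to_ INTEGER NOT NULL, message TEXT NOT NULL)")
conn.execute("CREATE INDEX two ON test (from_, to_, id)")  # hypothetical index on from_ first
conn.executemany("INSERT INTO test (from_, to_, message) VALUES (?, ?, 'hi')",
                 [(i % 1000, i % 997) for i in range(10000)])

def plan(sql):
    """Return the human-readable detail column of each EXPLAIN QUERY PLAN row."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

keys = "(204177, 214518, 231429)"
not_in_plan = plan(f"SELECT * FROM test WHERE from_ NOT IN {keys}")
in_plan = plan(f"SELECT * FROM test WHERE from_ IN {keys}")
print("NOT IN:", not_in_plan)  # a full SCAN of test
print("IN:    ", in_plan)      # a SEARCH using index two
```

NOT IN gives the planner no range to seek to, so it reads every row; IN gives it a handful of exact keys to look up in the index.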