MySQL does not use an index when a "not in" clause is present

Ale*_*art 12 mysql

The table structure is:

CREATE TABLE `test` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `from` int(10) unsigned NOT NULL,
  `to` int(10) unsigned NOT NULL,
  `message` text NOT NULL,
  `sent` int(10) unsigned NOT NULL DEFAULT '0',
  `read` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `direction` tinyint(1) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `one` (`to`,`direction`,`from`,`id`),
  KEY `two` (`from`,`direction`,`to`,`id`),
  KEY `three` (`read`,`direction`,`to`),
  KEY `four` (`read`,`direction`,`from`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

I have a strange problem. Please look at the following query:

select test.id, test.from, test.to, test.message, test.sent, test.read, test.direction from test 
where (

    (test.to = 244975 and test.direction <> 2 and test.direction <> 3 and 
        (
        (test.from = 204177 and test.id > 5341203) OR 
        (test.from = 214518 and test.id > 5336549) OR
        (test.from = 231429 and test.id > 5338284) OR
        (test.from = 242739 and test.id > 5339541) OR
        (test.from = 243834 and test.id > 5340438) OR
        (test.from = 244354 and test.id > 5337489) OR
        (test.from = 244644 and test.id > 5338572) OR
        (test.from = 244690 and test.id > 5338467) 
        )

    )

    or 

    (test.from = 244975 and test.direction <> 1 and test.direction <> 3 and 
        (
        (test.to = 204177 and test.id > 5341203) OR
        (test.to = 214518 and test.id > 5336549) OR
        (test.to = 231429 and test.id > 5338284) OR
        (test.to = 242739 and test.id > 5339541) OR
        (test.to = 243834 and test.id > 5340438) OR
        (test.to = 244354 and test.id > 5337489) OR
        (test.to = 244644 and test.id > 5338572) OR
        (test.to = 244690 and test.id > 5338467)
        )
    )

    or 

    (test.read <> 1 and test.direction <> 3 and test.direction <> 2 and test.to = 244975  and test.from not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)

    )

    or

    (test.read <> 1 and test.direction = 2 and test.from = 244975 and test.to not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)

    )


     )



     order by test.id;

If I run EXPLAIN on this query, it goes through all the rows:

1   SIMPLE  test    index   PRIMARY,one,two,three,four  PRIMARY 4       1440596 Using where

If I remove the "not in" clauses, it works fine:

select test.id, test.from, test.to, test.message, test.sent, test.read, test.direction from test 
where (

    (test.to = 244975 and test.direction <> 2 and test.direction <> 3 and 
        (
        (test.from = 204177 and test.id > 5341203) OR 
        (test.from = 214518 and test.id > 5336549) OR
        (test.from = 231429 and test.id > 5338284) OR
        (test.from = 242739 and test.id > 5339541) OR
        (test.from = 243834 and test.id > 5340438) OR
        (test.from = 244354 and test.id > 5337489) OR
        (test.from = 244644 and test.id > 5338572) OR
        (test.from = 244690 and test.id > 5338467) 
        )

    )

    or 

    (test.from = 244975 and test.direction <> 1 and test.direction <> 3 and 
        (
        (test.to = 204177 and test.id > 5341203) OR
        (test.to = 214518 and test.id > 5336549) OR
        (test.to = 231429 and test.id > 5338284) OR
        (test.to = 242739 and test.id > 5339541) OR
        (test.to = 243834 and test.id > 5340438) OR
        (test.to = 244354 and test.id > 5337489) OR
        (test.to = 244644 and test.id > 5338572) OR
        (test.to = 244690 and test.id > 5338467)
        )
    )

    or 

    (test.read <> 1 and test.direction <> 3 and test.direction <> 2 and test.to = 244975 

    )

    or

    (test.read <> 1 and test.direction = 2 and test.from = 244975 

    )


     )



     order by test.id;

Now EXPLAIN returns:

1   SIMPLE  test    index_merge PRIMARY,one,two,three,four  one,two 5,5     30  Using sort_union(one,two); Using where; Using filesort

I'm not sure why it isn't working properly. What am I missing in the indexes?

Har*_*rry 5

I'm not sure why it isn't working properly. What am I missing in the indexes?

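I'm fairly sure the query planner is working correctly and that you are not missing anything in your indexes that would help in this case. The query planner decides that a different index will be faster because the two queries are very different.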

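We can get the optimizer to use a union of indexes for us, which makes the query faster. You can keep the not in and you do not have to change any of the or clauses. I ran some basic benchmarks comparing this approach against a rewrite that uses unions. Caveats apply, since your database configuration may differ considerably from mine. I ran each query 1000 times, repeated that 3 times, and took the best time for each query...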

The optimized query, as shown below:

real    0m15.410s
user    0m6.681s
sys 0m2.641s

Rewritten as a set of unions:

real    0m17.747s
user    0m6.798s
sys 0m2.812s
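
The answer does not show the union rewrite it benchmarked. A minimal sketch of what such a rewrite could look like is below, assuming the renamed columns (from_, to_, read_) used later in this answer and keeping only two of the eight (peer, id) pairs for brevity. Each top-level OR branch becomes its own SELECT so that each one can be planned against its own index; because id is the primary key and is selected, UNION returns the same rows as the OR form.

```sql
-- Hypothetical rewrite, not the author's benchmarked query.
-- One SELECT per top-level OR branch of the original WHERE clause.
(select id, from_, to_, message, sent, read_, direction
   from test
  where to_ = 244975 and direction <> 2 and direction <> 3
    and ((from_ = 204177 and id > 5341203) or
         (from_ = 214518 and id > 5336549)))
union
(select id, from_, to_, message, sent, read_, direction
   from test
  where from_ = 244975 and direction <> 1 and direction <> 3
    and ((to_ = 204177 and id > 5341203) or
         (to_ = 214518 and id > 5336549)))
union
(select id, from_, to_, message, sent, read_, direction
   from test
  where read_ <> 1 and direction <> 3 and direction <> 2
    and to_ = 244975
    and from_ not in (204177, 214518))
union
(select id, from_, to_, message, sent, read_, direction
   from test
  where read_ <> 1 and direction = 2
    and from_ = 244975
    and to_ not in (204177, 214518))
-- A trailing ORDER BY applies to the combined result.
order by id;
```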

Think like the optimizer and use less data

In tests against a database with 4 million rows, the following SQL is orders of magnitude faster. The key change is the following line:

(select * from test where test.from_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) or test.to_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)) as test 

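This line greatly reduces the data set MySQL has to process, because we are using in rather than not in. Here is the new query; I have tried not to change the original query too much.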

select SQL_NO_CACHE test.id, test.from_, test.to_, test.message, test.sent, test.read_, test.direction 
from (select * from test where test.from_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) or test.to_ in (244975, 204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690)) as test 
where (
  (test.to_ = 244975 and test.direction <> 2 and test.direction <> 3 and test.from_ in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) and 
        (   
        (test.from_ = 204177 and test.id > 5341203) OR  
        (test.from_ = 214518 and test.id > 5336549) OR
        (test.from_ = 231429 and test.id > 5338284) OR
        (test.from_ = 242739 and test.id > 5339541) OR
        (test.from_ = 243834 and test.id > 5340438) OR
        (test.from_ = 244354 and test.id > 5337489) OR
        (test.from_ = 244644 and test.id > 5338572) OR
        (test.from_ = 244690 and test.id > 5338467) 
        )   
    )   
    or  
    (test.from_ = 244975 and test.direction <> 1 and test.direction <> 3 and test.to_ in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690) and 
        (   
        (test.to_ = 204177 and test.id > 5341203) OR
        (test.to_ = 214518 and test.id > 5336549) OR
        (test.to_ = 231429 and test.id > 5338284) OR
        (test.to_ = 242739 and test.id > 5339541) OR
        (test.to_ = 243834 and test.id > 5340438) OR
        (test.to_ = 244354 and test.id > 5337489) OR
        (test.to_ = 244644 and test.id > 5338572) OR
        (test.to_ = 244690 and test.id > 5338467)
        ))  
    or  
    (test.read_ <> 1 and test.direction <> 2 and test.direction <> 3 and test.to_ = 244975  and test.from_ not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690))
    or  
    (test.read_ <> 1 and test.direction = 2 and test.from_ = 244975 and test.to_ not in (204177, 214518, 231429, 242739, 243834, 244354, 244644, 244690))
     )   
     order by test.id;
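
The query above refers to from_, to_, and read_ rather than the question's from, to, and read, and the answer does not show the table it was tested on. A plausible reading, stated here as an assumption, is that the test table is the question's table with those three reserved-word columns renamed:

```sql
-- Assumed schema: the question's table with `from`, `to`, and `read`
-- renamed to from_, to_, and read_. The answer does not show its DDL.
create table test (
  id        int(10) unsigned    not null auto_increment,
  from_     int(10) unsigned    not null,
  to_       int(10) unsigned    not null,
  message   text                not null,
  sent      int(10) unsigned    not null default '0',
  read_     tinyint(1) unsigned not null default '0',
  direction tinyint(1) unsigned not null default '0',
  primary key (id),
  key one   (to_, direction, from_, id),
  key two   (from_, direction, to_, id),
  key three (read_, direction, to_),
  key four  (read_, direction, from_)
) engine=InnoDB default charset=utf8;
```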

The explain plan for this looks very different...

mysql> \. sql_fixed.sql
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 226
     filtered: 100.00
        Extra: Using where; Using filesort
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: test
         type: index_merge
possible_keys: one,two
          key: two,one
      key_len: 4,4
          ref: NULL
         rows: 226
     filtered: 100.00
        Extra: Using sort_union(two,one); Using where
2 rows in set, 1 warning (0.01 sec)

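The optimizer can see immediately that it does not need most of the data, because we have told it to use an IN clause with only a few keys. Most query optimizers attach a high cost to disk access, so the optimizer will usually favor anything that reduces it.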

not in versus in

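not in and in are very different. In this case the difference between them is the access pattern: whether I need the data only temporarily or as part of the result set. When you use not in with a few keys and the index contains millions of keys, a large number of records may have to be read if that data is part of the result set. Even with an index, not in can read millions of records from disk... in needs only a few keys, and those are exactly the keys you look up to get a small subset. The two access patterns are very different. The following example may help make this clear...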

1. I don't want these 10 items out of 1,000,000 records; I need the other 999,990. This reads the whole index.
2. I only want these 10 out of 1,000,000 records. This might only require one disk seek.

Number 2 is faster because of the access pattern: I look up only the 10 records I need. Number 1 may have to read a million records.
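
One way to see the two cases on your own data is to count how many rows each predicate actually selects. The sketch below assumes a test table with a from_ column, as in the rewritten query in this answer; it relies on MySQL evaluating a comparison to 1 or 0, so SUM counts the matching rows. The counting query itself scans the table, so it is a diagnostic rather than something to run in production code.

```sql
-- How many rows does each predicate ask for?
-- A small wanted_by_in next to a wanted_by_not_in close to total_rows
-- is the "few from the many" versus "almost everything" contrast.
select
  sum(from_ in (204177, 214518, 231429, 242739,
                243834, 244354, 244644, 244690))     as wanted_by_in,
  sum(from_ not in (204177, 214518, 231429, 242739,
                    243834, 244354, 244644, 244690)) as wanted_by_not_in,
  count(*)                                           as total_rows
from test;
```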

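MySQL's query optimizer sees this: the last two OR clauses ask for a large subset of the data from the table or index, i.e. case 1 above. Seeing this, and given that it needs the primary key anyway, the optimizer decides that using the primary key is faster.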

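Removing the not in changes things: the query planner can now use the indexes, because the other two or clauses are effectively "get me the few from the many", and it performs an index_merge on the two keys that share the to, from, and id columns.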

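To see what I mean, don't remove the "not in" parts of the query; change them to in and see what happens. On my machine the query plan changes to use a range index scan.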
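
A reduced version of that experiment, using only the third OR branch, might look like the sketch below. It again assumes the renamed columns from_, to_, and read_. No expected output is shown, because the plan depends on the MySQL version, the data, and the index statistics; the columns to compare between the two results are type, key, and rows.

```sql
-- Variant 1: the branch as written in the question, with not in.
explain
select id, from_, to_, message, sent, read_, direction
  from test
 where read_ <> 1 and direction <> 3 and direction <> 2
   and to_ = 244975
   and from_ not in (204177, 214518, 231429, 242739,
                     243834, 244354, 244644, 244690);

-- Variant 2: the same branch with not in changed to in.
explain
select id, from_, to_, message, sent, read_, direction
  from test
 where read_ <> 1 and direction <> 3 and direction <> 2
   and to_ = 244975
   and from_ in (204177, 214518, 231429, 242739,
                 243834, 244354, 244644, 244690);
```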