MySQL: finding a good index

fgu*_*len 5 mysql sql indexing select

I have this table (simplified):

create table completions (
  id int(11) not null auto_increment,
  completed_at datetime default null,
  is_mongo_synced tinyint(1) default '0',
  primary key (id),
  key index_completions_on_completed_at_and_is_mongo_synced_and_id (completed_at,is_mongo_synced,id)
) engine=innodb auto_increment=4785424 default charset=utf8 collate=utf8_unicode_ci;

Size:

select count(*) from completions; -- => 4817574

Now I try to run this query:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  order by completions.id asc limit 10;

It takes 9 minutes.

I see that my index is not being used; explain extended returns this:

id: 1 
select_type: SIMPLE
table: completions 
type: index 
possible_keys: index_completions_on_completed_at_and_is_mongo_synced_and_id  
key: PRIMARY 
key_len: 4 
ref: NULL  
rows: 20  
filtered: 11616415.00 
Extra: Using where

If I force the index:

select completions.* 
from completions  
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  order by completions.id asc limit 10;

It takes 1.22s, which is much better. explain extended returns:

id: 1
select_type: SIMPLE
table: completions
type: range
possible_keys: index_completions_on_completed_at_and_is_mongo_synced_and_id
key: index_completions_on_completed_at_and_is_mongo_synced_and_id
key_len: 6
ref: null
rows: 2323334
filtered: 100
Extra: Using index condition; Using filesort

Now, if I narrow the query by completions.id:

select completions.* 
from completions  
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  order by completions.id asc limit 10;

It takes 1.31s, which is still good. explain extended returns:

id: 1
select_type: SIMPLE
table: completions
type: range
possible_keys: index_completions_on_completed_at_and_is_mongo_synced_and_id
key: index_completions_on_completed_at_and_is_mongo_synced_and_id
key_len: 6
ref: null
rows: 2323407
filtered: 100
Extra: Using index condition; Using filesort

The point is that if I do not force the index for this last query:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  order by completions.id asc limit 10;

It takes 85ms (note: milliseconds, not seconds). explain extended returns:

id: 1
select_type: SIMPLE
table: completions
type: range
possible_keys: PRIMARY,index_completions_on_completed_at_and_is_mongo_synced_and_id
key: PRIMARY
key_len: 4
ref: null
rows: 2323451
filtered: 100
Extra: Using where

Not only does this drive me crazy, but the performance of this last query is also heavily affected by a small change in the filter value:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 1600000
  order by completions.id asc limit 10;

It takes 13s.

Things I don't understand:

1. Why is Query A below faster than Query B, when Query B is supposedly using a more precise index?

Query A:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  order by completions.id asc limit 10;

85ms

Query B:

select completions.* 
from completions  
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  order by completions.id asc limit 10;

1.31s

2. Why is there such a performance difference between the following queries?

Query A:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  order by completions.id asc limit 10;

85ms

Query B:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 1600000
  order by completions.id asc limit 10;

13s

3. Why doesn't MySQL automatically use the index for the following query?

Index:

key index_completions_on_completed_at_and_is_mongo_synced_and_id (completed_at,is_mongo_synced,id),

Query:

select completions.* 
from completions  
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  order by completions.id asc limit 10;

Update

More data, as requested in the comments.

Row counts by is_mongo_synced value:
 select
     completions.is_mongo_synced,
     count(*)
 from completions
 group by completions.is_mongo_synced;

Result:

[
  {
    "is_mongo_synced":0,
    "count(*)":2731921
  },
  {
    "is_mongo_synced":1,
    "count(*)":2087869
  }
]

Queries without order by:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  limit 10;

544ms

select completions.* 
from completions  
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  limit 10;

314ms

However, I need the ordering anyway, because I am scanning the table batch by batch.

Gor*_*off 4

Your question is rather complicated. However, consider your first query:

select completions.* 
from completions  
where completed_at is not null and
      completions.is_mongo_synced = 0 
order by completions.id asc
limit 10;

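The best index for this query is on (is_mongo_synced, completed_at). There may be other ways to write the query, but in the index you are forcing, the columns are not in the optimal order.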

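The performance difference in the second pair of queries is probably because the data is actually being sorted. A few hundred thousand extra rows can affect the sort time. The dependence on the value of id is probably why the index is not being used. If you change the index to (is_mongo_synced, id, completed_at), it is more likely that the index will be used.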
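As a minimal sketch of that suggestion, the index could be added and checked roughly as follows. The index name is only an illustrative choice, and whether the optimizer actually picks the index depends on your data and MySQL version, so treat the EXPLAIN step as a check rather than a guarantee.

```sql
-- Sketch: add the composite index with the equality column first,
-- then id (used for ORDER BY), then completed_at.
-- The index name below is an arbitrary, illustrative choice.
ALTER TABLE completions
  ADD INDEX index_completions_on_is_mongo_synced_and_id_and_completed_at
    (is_mongo_synced, id, completed_at);

-- Check which index the optimizer chooses when no index is forced.
EXPLAIN
SELECT completions.*
FROM completions
WHERE completed_at IS NOT NULL
  AND completions.is_mongo_synced = 0
ORDER BY completions.id ASC
LIMIT 10;
```

With is_mongo_synced fixed by the equality condition, the index entries are already ordered by id, so the intent is that MySQL can read the first matching rows in order and stop at the limit instead of sorting.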

MySQL has good documentation on composite (multiple-column) indexes, which you may want to review.

After adding the suggested index

After adding the index:

KEY `index_completions_on_is_mongo_synced_and_id_and_completed_at` (`is_mongo_synced`,`id`,`completed_at`) USING BTREE,

and running the long query again:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  order by completions.id asc limit 10;

it takes 156ms, which is very good.

Checking explain extended, we see that MySQL is using the right index:

id: 1
select_type: SIMPLE
table: completions
type: ref
possible_keys: index_completions_on_completed_at_and_is_mongo_synced_and_id,index_completions_on_is_mongo_synced_and_id_and_completed_at
key: index_completions_on_is_mongo_synced_and_id_and_completed_at
key_len: 2
ref: const
rows: 1626322
filtered: 100
Extra: Using index condition; Using where
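Since the table is being scanned batch by batch, each later batch could continue from the last id already processed. The following is only a sketch under that assumption: @last_id is a hypothetical user variable that the application would set to the largest id returned by the previous batch.

```sql
-- Hypothetical: @last_id is the largest id seen in the previous batch
-- (0 for the first batch).
SET @last_id = 0;

-- Fetch the next batch in id order, starting after the previous one.
SELECT completions.*
FROM completions
WHERE completions.is_mongo_synced = 0
  AND completions.id > @last_id
  AND completed_at IS NOT NULL
ORDER BY completions.id ASC
LIMIT 10;
```

With an index on (is_mongo_synced, id, completed_at), this form should let MySQL seek directly to the starting id and read forward in order, but it is worth confirming with explain on your own data.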