Higher cardinality column first in an index when involving a range?

Ric*_*mes 5 mysql indexing performance query-optimization mariadb

CREATE TABLE `files` (
  `did` int(10) unsigned NOT NULL DEFAULT '0',
  `filename` varbinary(200) NOT NULL,
  `ext` varbinary(5) DEFAULT NULL,
  `fsize` double DEFAULT NULL,
  `filetime` datetime DEFAULT NULL,
  PRIMARY KEY (`did`,`filename`),
  KEY `fe` (`filetime`,`ext`),          -- This?
  KEY `ef` (`ext`,`filetime`)           -- or This?
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;

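The table has a million rows. The filetime values are mostly distinct. There are a limited number of ext values. So filetime has high cardinality and ext has much lower cardinality.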

The query involves both ext and filetime:

WHERE ext = '...'
  AND filetime BETWEEN ... AND ...

Which of those two indexes is better? And why?

Ric*_*mes 7

First, let's try FORCE INDEX to pick either ef or fe. The timings are too short to show clearly which one is faster, but EXPLAIN shows a difference:

Start by forcing the index with the range column, filetime, first. (Note: the order of the conditions in WHERE has no effect.)

mysql> EXPLAIN SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(fe)
    WHERE ext = 'gif' AND filetime >= '2015-01-01'
                      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows  | Extra                 |
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+
|  1 | SIMPLE      | files | range | fe            | fe   | 14      | NULL | 16684 | Using index condition |
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+

Now force the index with the low-cardinality ext first:

mysql> EXPLAIN SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(ef)
    WHERE ext = 'gif' AND filetime >= '2015-01-01'
                      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                 |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|  1 | SIMPLE      | files | range | ef            | ef   | 14      | NULL |  538 | Using index condition |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+

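Clearly, the rows estimate says ef is better. But let's check with the optimizer trace. The output is rather bulky; I'll show only the interesting parts. No FORCE is needed; the trace shows both options and then picks the better one.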
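For anyone wanting to reproduce this, here is a minimal sketch of how such a trace can be captured. It assumes a server version that provides the optimizer_trace session variable and the information_schema.OPTIMIZER_TRACE table (MySQL 5.6 and later have them). The query is the same one used in the EXPLAINs, just without FORCE INDEX.

```sql
-- Turn tracing on for this session only.
SET optimizer_trace = 'enabled=on';

-- Run the statement to be analyzed (no FORCE INDEX, so both indexes compete).
SELECT COUNT(*), AVG(fsize)
    FROM files
    WHERE ext = 'gif' AND filetime >= '2015-01-01'
                      AND filetime <  '2015-01-01' + INTERVAL 1 MONTH;

-- Fetch the JSON trace of the most recently traced statement.
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;

-- Turn tracing back off.
SET optimizer_trace = 'enabled=off';
```

The trace is one large JSON document, so in the mysql command-line client it may be easier to read with \G in place of the final semicolon on the SELECT TRACE statement.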

             ...
             "potential_range_indices": [
                ...
                {
                  "index": "fe",
                  "usable": true,
                  "key_parts": [
                    "filetime",
                    "ext",
                    "did",
                    "filename"
                  ]
                },
                {
                  "index": "ef",
                  "usable": true,
                  "key_parts": [
                    "ext",
                    "filetime",
                    "did",
                    "filename"
                  ]
                }
              ],

...

              "analyzing_range_alternatives": {
                "range_scan_alternatives": [
                  {
                    "index": "fe",
                    "ranges": [
                      "2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00"
                    ],
                    "index_dives_for_eq_ranges": true,
                    "rowid_ordered": false,
                    "using_mrr": false,
                    "index_only": false,
                    "rows": 16684,
                    "cost": 20022,               <-- Here's the critical number
                    "chosen": true
                  },
                  {
                    "index": "ef",
                    "ranges": [
                      "gif <= ext <= gif AND 2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00"
                    ],
                    "index_dives_for_eq_ranges": true,
                    "rowid_ordered": false,
                    "using_mrr": false,
                    "index_only": false,
                    "rows": 538,
                    "cost": 646.61,               <-- Here's the critical number
                    "chosen": true
                  }
                ],

...

          "attached_conditions_computation": [
            {
              "access_type_changed": {
                "table": "`files`",
                "index": "ef",
                "old_type": "ref",
                "new_type": "range",
                "cause": "uses_more_keyparts"   <-- Also interesting
              }
            }

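With fe (range column first), the range can be used, but the optimizer estimates scanning through 16684 rows, fishing for ext='gif'.

With ef (low-cardinality ext first), it can use both columns of the index and drill down more efficiently in the BTree. It then finds an estimated 538 rows, all of which are useful for the query; no further filtering is needed.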

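Conclusions:

  • INDEX(filetime, ext) used only the first column.
  • INDEX(ext, filetime) used both columns.
  • Put the columns involved in = tests first in the index, regardless of cardinality.
  • The query plan will not go beyond the first "range" column.
  • "Cardinality" is irrelevant for composite indexes and this type of query.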

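("Using index condition" means that the storage engine (InnoDB) will use columns of the index beyond the ones used for the range lookup in order to filter rows; this is index condition pushdown.)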
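Applied to the table in the question, the conclusions suggest keeping ef and dropping fe. This is only a sketch: it assumes no other query depends on fe, and the question shows just one query. The second statement is a purely hypothetical extension of the same rule to a query with two = tests and one range; the index name did_ext_ft and that query are made up for illustration.

```sql
-- Keep ef (ext, filetime). Drop fe only if no other query relies on it (assumption).
ALTER TABLE files DROP INDEX fe;

-- Hypothetical query shape with two "=" tests and one range:
--   WHERE did = 123 AND ext = 'gif'
--     AND filetime >= '2015-01-01'
--     AND filetime <  '2015-01-01' + INTERVAL 1 MONTH
-- Equality columns go first (in either order); the range column goes last.
ALTER TABLE files ADD INDEX did_ext_ft (did, ext, filetime);
```

Rerunning EXPLAIN without FORCE INDEX after such a change is a quick way to confirm which index the optimizer actually picks.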

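  • Relevant section in the docs: "The optimizer attempts to use additional key parts to determine the interval as long as the comparison operator is `=`, `<=>`, or `IS NULL`. If the operator is `>`, `<`, `>=`, `<=`, `!=`, `<>`, `BETWEEN`, or `LIKE`, the optimizer uses it but considers no more key parts." ([Range Optimization](https://dev.mysql.com/doc/refman/8.0/en/range-optimization.html#range-access-multi-part))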