How to optimize the execution plan of a query with multiple outer joins to large tables, plus GROUP BY and ORDER BY clauses?

kav*_*kav 6 mysql sql select innodb sql-optimization

I have the following database (simplified):

CREATE TABLE `tracking` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `manufacture` varchar(100) NOT NULL,
  `date_last_activity` datetime NOT NULL,
  `date_created` datetime NOT NULL,
  `date_updated` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `manufacture` (`manufacture`),
  KEY `manufacture_date_last_activity` (`manufacture`, `date_last_activity`),
  KEY `date_last_activity` (`date_last_activity`)
) ENGINE=InnoDB AUTO_INCREMENT=401353 DEFAULT CHARSET=utf8;

CREATE TABLE `tracking_items` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `tracking_id` int(11) NOT NULL,
  `tracking_object_id` varchar(100) NOT NULL,
  `tracking_type` int(11) NOT NULL COMMENT 'Its used to specify the type of each item, e.g. car, bike, etc',
  `date_created` datetime NOT NULL,
  `date_updated` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `tracking_id` (`tracking_id`),
  KEY `tracking_object_id` (`tracking_object_id`),
  KEY `tracking_id_tracking_object_id` (`tracking_id`,`tracking_object_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1299995 DEFAULT CHARSET=utf8;

CREATE TABLE `cars` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `car_id` varchar(255) NOT NULL COMMENT 'It must be VARCHAR, because the data is coming from external source.',
  `manufacture` varchar(255) NOT NULL,
  `car_text` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  `date_order` datetime NOT NULL,
  `date_created` datetime NOT NULL,
  `date_updated` datetime NOT NULL,
  `deleted` tinyint(4) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `car_id` (`car_id`),
  KEY `sort_field` (`date_order`)
) ENGINE=InnoDB AUTO_INCREMENT=150000025 DEFAULT CHARSET=utf8;

This is my "problematic" query, which runs very slowly.

SELECT sql_no_cache `t`.*,
       count(`t`.`id`) AS `cnt_filtered_items`
FROM `tracking` AS `t`
INNER JOIN `tracking_items` AS `ti` ON (`ti`.`tracking_id` = `t`.`id`)
LEFT JOIN `cars` AS `c` ON (`c`.`car_id` = `ti`.`tracking_object_id`
                            AND `ti`.`tracking_type` = 1)
LEFT JOIN `bikes` AS `b` ON (`b`.`bike_id` = `ti`.`tracking_object_id`
                            AND `ti`.`tracking_type` = 2)
LEFT JOIN `trucks` AS `tr` ON (`tr`.`truck_id` = `ti`.`tracking_object_id`
                            AND `ti`.`tracking_type` = 3)
WHERE (`t`.`manufacture` IN('1256703406078',
                            '9600048390403',
                            '1533405067830'))
  AND (`c`.`car_text` LIKE '%europe%'
       OR `b`.`bike_text` LIKE '%europe%'
       OR `tr`.`truck_text` LIKE '%europe%')
GROUP BY `t`.`id`
ORDER BY `t`.`date_last_activity` ASC,
         `t`.`id` ASC
LIMIT 15

This is the EXPLAIN output for the query above:

+----+-------------+-------+--------+-----------------------------------------------------------------------+-------------+---------+-----------------------------+---------+----------------------------------------------+
| id | select_type | table |  type  |                             possible_keys                             |     key     | key_len |             ref             |  rows   |                    extra                     |
+----+-------------+-------+--------+-----------------------------------------------------------------------+-------------+---------+-----------------------------+---------+----------------------------------------------+
|  1 | SIMPLE      | t     | index  | PRIMARY,manufacture,manufacture_date_last_activity,date_last_activity | PRIMARY     |       4 | NULL                        | 400,000 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | ti    | ref    | tracking_id,tracking_object_id,tracking_id_tracking_object_id         | tracking_id |       4 | table.t.id                  |       1 | NULL                                         |
|  1 | SIMPLE      | c     | eq_ref | car_id                                                                | car_id      |     767 | table.ti.tracking_object_id |       1 | Using where                                  |
|  1 | SIMPLE      | b     | eq_ref | bike_id                                                               | bike_id     |     767 | table.ti.tracking_object_id |       1 | Using where                                  |
|  1 | SIMPLE      | tr    | eq_ref | truck_id                                                              | truck_id    |     767 | table.ti.tracking_object_id |       1 | Using where                                  |
+----+-------------+-------+--------+-----------------------------------------------------------------------+-------------+---------+-----------------------------+---------+----------------------------------------------+

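What problem is this query trying to solve?

Basically, I need to find all records in the tracking table that may be associated with records in tracking_items (1:n), where each tracking_items record may in turn be associated with a record in one of the left-joined tables. The filter criteria are the crucial part of the query.

What is wrong with my query above?

When both the ORDER BY and GROUP BY clauses are present, the query runs very slowly, e.g. it takes 10-15 seconds to complete with the configuration above. However, if I omit either of these clauses, the query runs quite fast (~0.2 seconds).

What have I already tried?

  1. I tried using a FULLTEXT index, but it did not help much, because the rows evaluated by the LIKE condition have already been narrowed down by the JOINs using indexes.
  2. I tried using WHERE EXISTS (...) to check whether there are records in the left-joined tables, but unfortunately without any luck.

A few notes about the relationships between these tables: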

tracking -> tracking_items (1:n)
tracking_items -> cars (1:1)
tracking_items -> bikes (1:1)
tracking_items -> trucks (1:1)

So, I am looking for a way to optimize this query.

spe*_*593 5

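Bill Karwin suggests the query might perform better if it used an index with manufacture as the leading column. I second that suggestion, especially if that column is very selective.

I also note that we are doing a GROUP BY t.id, where id is the PRIMARY KEY of the table.

No columns from any table other than tracking are referenced in the SELECT list.

This suggests that we are really only interested in returning rows from t, and not in creating duplicate rows because of the multiple outer joins.

It seems the COUNT() aggregate has the potential to return an inflated count if there are multiple matching rows in tracking_items and bikes, cars, trucks. If there are three matching rows from cars and four matching rows from bikes, the COUNT() aggregate will return a value of 12 rather than 7. (Or maybe there is some guarantee in the data so that there will never be multiple matching rows.)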
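One way to test whether the count is actually inflated in the data is to compare the number of joined rows with the number of distinct tracking_items rows per tracking row. This is only a sketch: it reuses the joins from the question, and it assumes the bikes and trucks tables have the bike_id and truck_id columns used there (their DDL is not shown). Note that cars.car_id is UNIQUE in the posted DDL, so a tracking_items row can match at most one cars row.

```sql
-- Sketch: list tracking rows where the join produces more rows than there are
-- distinct tracking_items rows, i.e. where COUNT() would be inflated.
SELECT t.id,
       COUNT(*)              AS joined_rows,
       COUNT(DISTINCT ti.id) AS distinct_items
FROM tracking AS t
INNER JOIN tracking_items AS ti ON ti.tracking_id = t.id
LEFT JOIN cars   AS c  ON c.car_id    = ti.tracking_object_id AND ti.tracking_type = 1
LEFT JOIN bikes  AS b  ON b.bike_id   = ti.tracking_object_id AND ti.tracking_type = 2
LEFT JOIN trucks AS tr ON tr.truck_id = ti.tracking_object_id AND ti.tracking_type = 3
WHERE t.manufacture IN ('1256703406078', '9600048390403', '1533405067830')
GROUP BY t.id
HAVING joined_rows > distinct_items;
```

An empty result would suggest the joins do not multiply rows for these manufacture values; any rows returned are the ones whose counts deserve a closer look.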

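If manufacture is very selective, and returns a reasonably small set of rows from tracking, and if the query can make use of an index ...

And since we are not returning any columns from any table other than tracking, apart from a count of related items ...

I would be tempted to test correlated subqueries in the SELECT list to get the count, and to filter out the zero-count rows with a HAVING clause.

Something like this: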

SELECT SQL_NO_CACHE `t`.*
     , ( ( SELECT COUNT(1)
             FROM `tracking_items` `tic`
             JOIN `cars` `c`
               ON `c`.`car_id`           = `tic`.`tracking_object_id`
              AND `c`.`car_text`      LIKE '%europe%'
            WHERE `tic`.`tracking_id`    = `t`.`id`
              AND `tic`.`tracking_type`  = 1
         )
       + ( SELECT COUNT(1)
             FROM `tracking_items` `tib`
             JOIN `bikes` `b`
               ON `b`.`bike_id`          = `tib`.`tracking_object_id` 
              AND `b`.`bike_text`     LIKE '%europe%'
            WHERE `tib`.`tracking_id`    = `t`.`id`
              AND `tib`.`tracking_type`  = 2
         )
       + ( SELECT COUNT(1)
             FROM `tracking_items` `tit`
             JOIN `trucks` `tr`
               ON `tr`.`truck_id`        = `tit`.`tracking_object_id`
              AND `tr`.`truck_text`   LIKE '%europe%'
            WHERE `tit`.`tracking_id`    = `t`.`id`
              AND `tit`.`tracking_type`  = 3
         ) 
       ) AS cnt_filtered_items
  FROM `tracking` `t`
 WHERE `t`.`manufacture` IN ('1256703406078', '9600048390403', '1533405067830')
HAVING cnt_filtered_items > 0
 ORDER
    BY `t`.`date_last_activity` ASC
     , `t`.`id` ASC

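We expect the query can make effective use of an index on tracking with manufacture as the leading column.

And on the tracking_items table, we want an index with tracking_id and tracking_type as the leading columns. Including tracking_object_id in that index as well would mean the query can be satisfied from the index alone, without visiting the underlying pages.

For the cars, bikes and trucks tables, the query should make use of an index with a leading column of car_id, bike_id and truck_id respectively. There is no getting around a scan of the car_text, bike_text and truck_text columns for the matching string ... the best we can do is narrow down the number of rows that need to have that check performed.

This approach (just the tracking table in the outer query) should eliminate the need for the GROUP BY, and with it the work required to identify and collapse duplicate rows.

BUT this approach, replacing joins with correlated subqueries, is best suited to queries where the outer query returns a SMALL number of rows. The subqueries are executed for every row processed by the outer query, so it is imperative that they have suitable indexes available. Even with those in place, there is still potential for horrible performance on large sets.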
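Before committing to the correlated-subquery form, it may be worth checking how many rows the outer query would actually process. A minimal sketch, using the manufacture values from the question:

```sql
-- Sketch: how many tracking rows would the outer query hand to the subqueries?
SELECT COUNT(*) AS outer_rows
FROM tracking AS t
WHERE t.manufacture IN ('1256703406078', '9600048390403', '1533405067830');
```

Each of those rows triggers three subquery executions, so a count in the hundreds is likely to behave very differently from one in the hundreds of thousands.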

This does still leave us with a "Using filesort" operation for the ORDER BY.


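If the count of related items should be the product of a multiplication rather than a sum, we could tweak the query to achieve that. (We would have to deal with the zeros that are returned, and the condition in the HAVING clause would need to be changed.)

If there were no requirement to return a COUNT() of related items, then I would be tempted to move the correlated subqueries from the SELECT list down into EXISTS predicates in the WHERE clause.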
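The EXISTS variant might look something like the following. This is an untested sketch rather than a tuned query: it drops the cnt_filtered_items column, reuses the table and column names from the question, and restores the LIMIT 15 from the original query.

```sql
-- Sketch: same filter as the question, without the count, using EXISTS predicates.
SELECT t.*
FROM tracking AS t
WHERE t.manufacture IN ('1256703406078', '9600048390403', '1533405067830')
  AND (   EXISTS ( SELECT 1
                   FROM tracking_items AS tic
                   JOIN cars AS c ON c.car_id = tic.tracking_object_id
                   WHERE tic.tracking_id   = t.id
                     AND tic.tracking_type = 1
                     AND c.car_text LIKE '%europe%' )
       OR EXISTS ( SELECT 1
                   FROM tracking_items AS tib
                   JOIN bikes AS b ON b.bike_id = tib.tracking_object_id
                   WHERE tib.tracking_id   = t.id
                     AND tib.tracking_type = 2
                     AND b.bike_text LIKE '%europe%' )
       OR EXISTS ( SELECT 1
                   FROM tracking_items AS tit
                   JOIN trucks AS tr ON tr.truck_id = tit.tracking_object_id
                   WHERE tit.tracking_id   = t.id
                     AND tit.tracking_type = 3
                     AND tr.truck_text LIKE '%europe%' ) )
ORDER BY t.date_last_activity ASC,
         t.id ASC
LIMIT 15;
```

An EXISTS predicate can stop at the first matching row, whereas COUNT() has to visit every match, which is the main reason this form may be cheaper when the count itself is not needed.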


Additional notes: seconding the comments from Rick James regarding indexing ... there appear to be redundant indexes defined, i.e.

KEY `manufacture` (`manufacture`)
KEY `manufacture_date_last_activity` (`manufacture`, `date_last_activity`)

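The index on the single column is not necessary, since there is another index that has that column as its leading column.

Any query that can make effective use of the manufacture index will be able to make effective use of the manufacture_date_last_activity index. That is to say, the manufacture index could be dropped.

The same applies to the tracking_items table, and these two indexes: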

KEY `tracking_id` (`tracking_id`)
KEY `tracking_id_tracking_object_id` (`tracking_id`,`tracking_object_id`)

The tracking_id index could be dropped, since it is redundant.
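Dropping the two redundant indexes could be done along these lines. A sketch only: the index names are taken from the DDL in the question, and it would be prudent to try this on a copy of the data first and to confirm that nothing references these indexes by name (for example in an index hint).

```sql
-- Inspect the existing indexes before changing anything.
SHOW INDEX FROM tracking;
SHOW INDEX FROM tracking_items;

-- Sketch: drop the single-column indexes that are prefixes of a wider index.
ALTER TABLE tracking       DROP INDEX manufacture;
ALTER TABLE tracking_items DROP INDEX tracking_id;
```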

For the query above, I would suggest adding a covering index:

KEY `tracking_items_IX3` (`tracking_id`,`tracking_type`,`tracking_object_id`)

-or- at a minimum, a non-covering index with those two columns leading:

KEY `tracking_items_IX3` (`tracking_id`,`tracking_type`)
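Either index can be added with ALTER TABLE, and EXPLAIN can then show whether the optimizer picks it up. The sketch below adds the covering variant; the tracking_id value 1 in the probe query is a hypothetical placeholder, not a value from the question.

```sql
-- Sketch: add the suggested covering index.
ALTER TABLE tracking_items
  ADD INDEX tracking_items_IX3 (tracking_id, tracking_type, tracking_object_id);

-- Probe with the access pattern used by the correlated subqueries.
-- tracking_id = 1 is a hypothetical placeholder value.
EXPLAIN
SELECT tic.tracking_object_id
FROM tracking_items AS tic
WHERE tic.tracking_id   = 1
  AND tic.tracking_type = 1;
```

If the covering index is chosen, the key column of the EXPLAIN output should show tracking_items_IX3 and the Extra column would typically include "Using index", meaning the rows were read from the index without touching the table pages.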