kav · 6 · Tags: mysql, sql, select, innodb, sql-optimization
I have the following database (simplified):
CREATE TABLE `tracking` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`manufacture` varchar(100) NOT NULL,
`date_last_activity` datetime NOT NULL,
`date_created` datetime NOT NULL,
`date_updated` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `manufacture` (`manufacture`),
KEY `manufacture_date_last_activity` (`manufacture`, `date_last_activity`),
KEY `date_last_activity` (`date_last_activity`)
) ENGINE=InnoDB AUTO_INCREMENT=401353 DEFAULT CHARSET=utf8;
CREATE TABLE `tracking_items` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`tracking_id` int(11) NOT NULL,
`tracking_object_id` varchar(100) NOT NULL,
`tracking_type` int(11) NOT NULL COMMENT 'Its used to specify the type of each item, e.g. car, bike, etc',
`date_created` datetime NOT NULL,
`date_updated` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `tracking_id` (`tracking_id`),
KEY `tracking_object_id` (`tracking_object_id`),
KEY `tracking_id_tracking_object_id` (`tracking_id`,`tracking_object_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1299995 DEFAULT CHARSET=utf8;
CREATE TABLE `cars` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`car_id` varchar(255) NOT NULL COMMENT 'It must be VARCHAR, because the data is coming from external source.',
`manufacture` varchar(255) NOT NULL,
`car_text` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`date_order` datetime NOT NULL,
`date_created` datetime NOT NULL,
`date_updated` datetime NOT NULL,
`deleted` tinyint(4) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `car_id` (`car_id`),
KEY `sort_field` (`date_order`)
) ENGINE=InnoDB AUTO_INCREMENT=150000025 DEFAULT CHARSET=utf8;
This is my "problematic" query, which runs very slowly:
SELECT sql_no_cache `t`.*,
count(`t`.`id`) AS `cnt_filtered_items`
FROM `tracking` AS `t`
INNER JOIN `tracking_items` AS `ti` ON (`ti`.`tracking_id` = `t`.`id`)
LEFT JOIN `cars` AS `c` ON (`c`.`car_id` = `ti`.`tracking_object_id`
AND `ti`.`tracking_type` = 1)
LEFT JOIN `bikes` AS `b` ON (`b`.`bike_id` = `ti`.`tracking_object_id`
AND `ti`.`tracking_type` = 2)
LEFT JOIN `trucks` AS `tr` ON (`tr`.`truck_id` = `ti`.`tracking_object_id`
AND `ti`.`tracking_type` = 3)
WHERE (`t`.`manufacture` IN('1256703406078',
'9600048390403',
'1533405067830'))
AND (`c`.`car_text` LIKE '%europe%'
OR `b`.`bike_text` LIKE '%europe%'
OR `tr`.`truck_text` LIKE '%europe%')
GROUP BY `t`.`id`
ORDER BY `t`.`date_last_activity` ASC,
`t`.`id` ASC
LIMIT 15
Here is the EXPLAIN output for the query above:
+----+-------------+-------+--------+-----------------------------------------------------------------------+-------------+---------+-----------------------------+---------+----------------------------------------------+
| id | select_type | table | type   | possible_keys                                                         | key         | key_len | ref                         | rows    | Extra                                        |
+----+-------------+-------+--------+-----------------------------------------------------------------------+-------------+---------+-----------------------------+---------+----------------------------------------------+
| 1  | SIMPLE      | t     | index  | PRIMARY,manufacture,manufacture_date_last_activity,date_last_activity | PRIMARY     | 4       | NULL                        | 400,000 | Using where; Using temporary; Using filesort |
| 1  | SIMPLE      | ti    | ref    | tracking_id,tracking_object_id,tracking_id_tracking_object_id         | tracking_id | 4       | table.t.id                  | 1       | NULL                                         |
| 1  | SIMPLE      | c     | eq_ref | car_id                                                                | car_id      | 767     | table.ti.tracking_object_id | 1       | Using where                                  |
| 1  | SIMPLE      | b     | eq_ref | bike_id                                                               | bike_id     | 767     | table.ti.tracking_object_id | 1       | Using where                                  |
| 1  | SIMPLE      | tr    | eq_ref | truck_id                                                              | truck_id    | 767     | table.ti.tracking_object_id | 1       | Using where                                  |
+----+-------------+-------+--------+-----------------------------------------------------------------------+-------------+---------+-----------------------------+---------+----------------------------------------------+
What problem is this query trying to solve?
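Basically, I need to find all records in the tracking table that may be associated with records in tracking_items (1:n), where each record in tracking_items may in turn be associated with a record in one of the left-joined tables. The filtering criteria are the crucial part of the query.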
What is wrong with my query above?
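When both the ORDER BY and GROUP BY clauses are present, the query runs very slowly: 10-15 seconds to complete with the setup above. However, if I omit either of these clauses, the query runs very fast (~0.2 seconds).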
What have I already tried?
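- A FULLTEXT index, but it didn't help much, because the rows the LIKE statement is evaluated against have already been narrowed down by the JOINs using indexes.
- WHERE EXISTS (...) to check whether there are matching records in the left-joined tables, but unfortunately with no luck.

A few notes about the relationships between these tables: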
tracking -> tracking_items (1:n)
tracking_items -> cars (1:1)
tracking_items -> bikes (1:1)
tracking_items -> trucks (1:1)
So I'm looking for a way to optimize this query.
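Bill Karwin suggested that the query might perform better if it used an index with manufacture as the leading column. I second that suggestion, especially if manufacture is very selective.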
I also notice that we're doing a GROUP BY t.id, where id is the PRIMARY KEY of the table.
No columns from any table other than tracking are referenced in the SELECT list.
That suggests we're really only interested in returning rows from t, not in creating duplicate rows due to the multiple outer joins.
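It seems like the COUNT() always has the potential to return an inflated count when there are multiple matching rows in tracking_items and in bikes, cars and trucks. If there are three matching rows from cars and four matching rows from bikes, the COUNT() aggregate would return a value of 12 rather than 7. (Or maybe there is some guarantee in the data that there won't ever be multiple matching rows.)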
If manufacture is very selective, and it returns a reasonably small set of rows from tracking, and the query can make use of an index...
And since we aren't returning any columns from any table other than tracking, apart from a count of related items...
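I'd be tempted to test correlated subqueries in the SELECT list to get the counts, and filter out the rows with a zero count using a HAVING clause.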
Something like this:
SELECT SQL_NO_CACHE `t`.*
, ( ( SELECT COUNT(1)
FROM `tracking_items` `tic`
JOIN `cars` `c`
ON `c`.`car_id` = `tic`.`tracking_object_id`
AND `c`.`car_text` LIKE '%europe%'
WHERE `tic`.`tracking_id` = `t`.`id`
AND `tic`.`tracking_type` = 1
)
+ ( SELECT COUNT(1)
FROM `tracking_items` `tib`
JOIN `bikes` `b`
ON `b`.`bike_id` = `tib`.`tracking_object_id`
AND `b`.`bike_text` LIKE '%europe%'
WHERE `tib`.`tracking_id` = `t`.`id`
AND `tib`.`tracking_type` = 2
)
+ ( SELECT COUNT(1)
FROM `tracking_items` `tit`
JOIN `trucks` `tr`
ON `tr`.`truck_id` = `tit`.`tracking_object_id`
AND `tr`.`truck_text` LIKE '%europe%'
WHERE `tit`.`tracking_id` = `t`.`id`
AND `tit`.`tracking_type` = 3
)
) AS cnt_filtered_items
FROM `tracking` `t`
WHERE `t`.`manufacture` IN ('1256703406078', '9600048390403', '1533405067830')
HAVING cnt_filtered_items > 0
ORDER
BY `t`.`date_last_activity` ASC
, `t`.`id` ASC
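One way to gain confidence in a rewrite like this is to check it against the original on a small data set. The sketch below uses Python's sqlite3 module (SQLite as a stand-in for MySQL) with a cut-down, hypothetical copy of the schema: bikes and trucks are assumed to look like cars, and the manufacture codes M1/M2, dates and texts are invented. It runs both the original join-plus-GROUP BY shape and the correlated-subquery shape and compares the results. One adaptation is an assumption of this sketch: SQLite does not allow HAVING on a non-aggregate query, so the cnt_filtered_items > 0 filter is applied in an outer WHERE over a derived table instead; MySQL accepts the HAVING form as written above.

```python
import sqlite3

# Hypothetical toy data in SQLite; bikes/trucks are modeled on cars.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tracking (id INTEGER PRIMARY KEY, manufacture TEXT, date_last_activity TEXT);
    CREATE TABLE tracking_items (id INTEGER PRIMARY KEY, tracking_id INTEGER,
                                 tracking_object_id TEXT, tracking_type INTEGER);
    CREATE TABLE cars   (car_id   TEXT UNIQUE, car_text   TEXT);
    CREATE TABLE bikes  (bike_id  TEXT UNIQUE, bike_text  TEXT);
    CREATE TABLE trucks (truck_id TEXT UNIQUE, truck_text TEXT);
    INSERT INTO tracking VALUES (1, 'M1', '2020-01-03'), (2, 'M1', '2020-01-01'),
                                (3, 'M2', '2020-01-02'), (4, 'XX', '2020-01-01');
    INSERT INTO tracking_items VALUES (1, 1, 'car-1', 1), (2, 1, 'bike-1', 2),
                                      (3, 2, 'car-2', 1), (4, 3, 'truck-1', 3), (5, 4, 'car-3', 1);
    INSERT INTO cars   VALUES ('car-1', 'made in europe'), ('car-2', 'made in asia'), ('car-3', 'europe');
    INSERT INTO bikes  VALUES ('bike-1', 'europe tour');
    INSERT INTO trucks VALUES ('truck-1', 'europe freight');
""")
MANUFACTURES = ("M1", "M2")

# Original shape: the joins fan out rows, GROUP BY collapses them again.
join_rows = conn.execute("""
    SELECT t.id, COUNT(t.id) AS cnt_filtered_items
    FROM tracking t
    JOIN tracking_items ti ON ti.tracking_id = t.id
    LEFT JOIN cars   c  ON c.car_id    = ti.tracking_object_id AND ti.tracking_type = 1
    LEFT JOIN bikes  b  ON b.bike_id   = ti.tracking_object_id AND ti.tracking_type = 2
    LEFT JOIN trucks tr ON tr.truck_id = ti.tracking_object_id AND ti.tracking_type = 3
    WHERE t.manufacture IN (?, ?)
      AND (c.car_text LIKE '%europe%' OR b.bike_text LIKE '%europe%'
           OR tr.truck_text LIKE '%europe%')
    GROUP BY t.id
    ORDER BY t.date_last_activity, t.id
""", MANUFACTURES).fetchall()

# Rewritten shape: only tracking in the outer query, one correlated count per item type.
subquery_rows = conn.execute("""
    SELECT id, cnt_filtered_items
    FROM (SELECT t.id, t.date_last_activity,
                 ( (SELECT COUNT(1) FROM tracking_items tic
                      JOIN cars c ON c.car_id = tic.tracking_object_id
                                 AND c.car_text LIKE '%europe%'
                     WHERE tic.tracking_id = t.id AND tic.tracking_type = 1)
                 + (SELECT COUNT(1) FROM tracking_items tib
                      JOIN bikes b ON b.bike_id = tib.tracking_object_id
                                  AND b.bike_text LIKE '%europe%'
                     WHERE tib.tracking_id = t.id AND tib.tracking_type = 2)
                 + (SELECT COUNT(1) FROM tracking_items tit
                      JOIN trucks tr ON tr.truck_id = tit.tracking_object_id
                                    AND tr.truck_text LIKE '%europe%'
                     WHERE tit.tracking_id = t.id AND tit.tracking_type = 3)
                 ) AS cnt_filtered_items
          FROM tracking t
          WHERE t.manufacture IN (?, ?)) AS x
    WHERE cnt_filtered_items > 0  -- stands in for HAVING cnt_filtered_items > 0
    ORDER BY date_last_activity, id
""", MANUFACTURES).fetchall()

print(join_rows)      # [(3, 1), (1, 2)]
print(subquery_rows)  # [(3, 1), (1, 2)]
```

Matching results on toy data say nothing about speed; the point is only that dropping the GROUP BY does not change what comes back when the counts are additive.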
We'd expect the query to make effective use of an index on tracking with manufacture as the leading column.
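On the tracking_items table, we want an index with tracking_type and tracking_id as the leading columns. Also including tracking_object_id in that index means the query can be satisfied from the index alone, without visiting the underlying data pages.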
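For the cars, bikes and trucks tables, the query should use indexes with car_id, bike_id and truck_id, respectively, as the leading column. There's no getting around a scan of the car_text, bike_text and truck_text columns for the matching string... the best we can do is narrow down the number of rows that need to have that check performed.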
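A quick way to see whether a planner picks up indexes like these is to ask it for a plan. The sketch below uses SQLite through Python's sqlite3 purely as a stand-in; MySQL's EXPLAIN output and optimizer differ. It defines only the indexes relevant here (manufacture_date_last_activity on tracking, a hypothetical tracking_items_IX3 on (tracking_id, tracking_type, tracking_object_id), and a unique index on cars.car_id), plans the outer tracking query with the cars subquery, and checks that the index names appear in the plan. Treat it as a sanity check of the idea under those assumptions, not as evidence of what MySQL will do.

```python
import sqlite3

# Empty tables are enough: EXPLAIN QUERY PLAN only needs the schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tracking (id INTEGER PRIMARY KEY, manufacture TEXT, date_last_activity TEXT);
    CREATE INDEX manufacture_date_last_activity ON tracking (manufacture, date_last_activity);
    CREATE TABLE tracking_items (id INTEGER PRIMARY KEY, tracking_id INTEGER,
                                 tracking_object_id TEXT, tracking_type INTEGER);
    CREATE INDEX tracking_items_IX3
        ON tracking_items (tracking_id, tracking_type, tracking_object_id);
    CREATE TABLE cars (id INTEGER PRIMARY KEY, car_id TEXT, car_text TEXT);
    CREATE UNIQUE INDEX car_id ON cars (car_id);
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT t.*,
           (SELECT COUNT(1)
              FROM tracking_items tic
              JOIN cars c ON c.car_id = tic.tracking_object_id
                         AND c.car_text LIKE '%europe%'
             WHERE tic.tracking_id = t.id
               AND tic.tracking_type = 1) AS cnt_cars
    FROM tracking t
    WHERE t.manufacture IN ('1256703406078', '9600048390403', '1533405067830')
    ORDER BY t.date_last_activity, t.id
""").fetchall()

# The human-readable description is the last column of each plan row.
details = [row[-1] for row in plan]
for line in details:
    print(line)
```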
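This approach (with just the tracking table in the outer query) should eliminate the need for the GROUP BY, along with the work required to identify and collapse duplicate rows.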
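BUT this approach, replacing joins with correlated subqueries, is best suited to queries where the outer query returns a SMALL number of rows. Those subqueries are executed for every row the outer query processes, so it's imperative that they have suitable indexes available. Even with those tuned, there is still potential for horrible performance on large sets.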
This still leaves us with a "Using filesort" operation for the ORDER BY.
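If the count of related items should be a product (multiplication) rather than a sum (addition), we could tweak the query to achieve that. (We'd have to deal with the subqueries returning zeros, and the condition in the HAVING clause would need to be changed.)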
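Read one way, the "product" variant reproduces the row count an outer-join fan-out generates: a child table with no matches still contributes one NULL-extended row, so each zero count acts as 1, and because that product is never below 1, the filter has to test the sum of the counts instead. The sketch below is that interpretation only, checked in SQLite via Python's sqlite3 against the join it mimics, using hypothetical parent, car and bike tables. MAX(x, 1) is SQLite's multi-argument scalar max; GREATEST() would be the MySQL counterpart.

```python
import sqlite3

# Hypothetical model (SQLite): parents whose cars and bikes are joined independently.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE car  (parent_id INTEGER, txt TEXT);
    CREATE TABLE bike (parent_id INTEGER, txt TEXT);
    INSERT INTO parent VALUES (1), (2), (3);
    -- parent 1: 3 cars + 4 bikes; parent 2: 3 cars + 0 bikes; parent 3: nothing
    INSERT INTO car  VALUES (1, 'europe'), (1, 'europe'), (1, 'europe'),
                            (2, 'europe'), (2, 'europe'), (2, 'europe');
    INSERT INTO bike VALUES (1, 'europe'), (1, 'europe'), (1, 'europe'), (1, 'europe');
""")

# Row count produced by the fan-out; the WHERE mirrors "at least one child matched".
joined = dict(conn.execute("""
    SELECT p.id, COUNT(*)
    FROM parent p
    LEFT JOIN car  c ON c.parent_id = p.id AND c.txt LIKE '%europe%'
    LEFT JOIN bike b ON b.parent_id = p.id AND b.txt LIKE '%europe%'
    WHERE c.parent_id IS NOT NULL OR b.parent_id IS NOT NULL
    GROUP BY p.id
""").fetchall())

# Same numbers from per-table counts: zeros are lifted to 1 before multiplying,
# and the filter checks the sum, because the product is always at least 1.
multiplied = dict(conn.execute("""
    SELECT id, MAX(nc, 1) * MAX(nb, 1)
    FROM (SELECT p.id,
                 (SELECT COUNT(*) FROM car  WHERE parent_id = p.id AND txt LIKE '%europe%') AS nc,
                 (SELECT COUNT(*) FROM bike WHERE parent_id = p.id AND txt LIKE '%europe%') AS nb
          FROM parent p) AS counts
    WHERE nc + nb > 0
""").fetchall())

print(joined)      # {1: 12, 2: 3}
print(multiplied)  # {1: 12, 2: 3}
```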
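If there weren't a requirement to return a COUNT() of related items, I'd be tempted to move the correlated subqueries out of the SELECT list and into EXISTS predicates in the WHERE clause.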
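A sketch of that EXISTS variant, run in SQLite through Python's sqlite3 on a small hypothetical data set (bikes and trucks are assumed to look like cars; the codes M1/M2, dates and texts are invented). Without a count to compute, each EXISTS only has to find one qualifying row, and the OR lets the WHERE clause discard tracking rows directly, so neither GROUP BY nor HAVING is needed. Whether MySQL actually runs it faster depends on its optimizer and the available indexes; this only shows which rows come back.

```python
import sqlite3

# Hypothetical toy data in SQLite; bikes/trucks are modeled on cars.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tracking (id INTEGER PRIMARY KEY, manufacture TEXT, date_last_activity TEXT);
    CREATE TABLE tracking_items (id INTEGER PRIMARY KEY, tracking_id INTEGER,
                                 tracking_object_id TEXT, tracking_type INTEGER);
    CREATE TABLE cars   (car_id   TEXT UNIQUE, car_text   TEXT);
    CREATE TABLE bikes  (bike_id  TEXT UNIQUE, bike_text  TEXT);
    CREATE TABLE trucks (truck_id TEXT UNIQUE, truck_text TEXT);
    INSERT INTO tracking VALUES (1, 'M1', '2020-01-03'), (2, 'M1', '2020-01-01'),
                                (3, 'M2', '2020-01-02'), (4, 'XX', '2020-01-01');
    INSERT INTO tracking_items VALUES (1, 1, 'car-1', 1), (2, 1, 'bike-1', 2),
                                      (3, 2, 'car-2', 1), (4, 3, 'truck-1', 3), (5, 4, 'car-3', 1);
    INSERT INTO cars   VALUES ('car-1', 'made in europe'), ('car-2', 'made in asia'), ('car-3', 'europe');
    INSERT INTO bikes  VALUES ('bike-1', 'europe tour');
    INSERT INTO trucks VALUES ('truck-1', 'europe freight');
""")

# Keep a tracking row if at least one related item of any type matches the text.
rows = conn.execute("""
    SELECT t.id
    FROM tracking t
    WHERE t.manufacture IN ('M1', 'M2')
      AND (   EXISTS (SELECT 1 FROM tracking_items tic
                        JOIN cars c ON c.car_id = tic.tracking_object_id
                                   AND c.car_text LIKE '%europe%'
                       WHERE tic.tracking_id = t.id AND tic.tracking_type = 1)
           OR EXISTS (SELECT 1 FROM tracking_items tib
                        JOIN bikes b ON b.bike_id = tib.tracking_object_id
                                    AND b.bike_text LIKE '%europe%'
                       WHERE tib.tracking_id = t.id AND tib.tracking_type = 2)
           OR EXISTS (SELECT 1 FROM tracking_items tit
                        JOIN trucks tr ON tr.truck_id = tit.tracking_object_id
                                      AND tr.truck_text LIKE '%europe%'
                       WHERE tit.tracking_id = t.id AND tit.tracking_type = 3))
    ORDER BY t.date_last_activity, t.id
    LIMIT 15
""").fetchall()

matching_ids = [r[0] for r in rows]
print(matching_ids)  # [3, 1]
```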
Additional notes: seconding Rick James's comments regarding indexing... there appear to be redundant indexes defined, namely:
KEY `manufacture` (`manufacture`)
KEY `manufacture_date_last_activity` (`manufacture`, `date_last_activity`)
The index on the singleton column isn't necessary, since another index has that column as its leading column.
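Any query that can make effective use of the manufacture index will be able to make effective use of the manufacture_date_last_activity index. That is to say, the manufacture index could be dropped.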
The same applies to the tracking_items table and these two indexes:
KEY `tracking_id` (`tracking_id`)
KEY `tracking_id_tracking_object_id` (`tracking_id`,`tracking_object_id`)
The tracking_id index could be dropped, since it's redundant.
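The rule applied in these notes (an index whose columns are a leading prefix of another index's columns is redundant) is mechanical enough to sketch in a few lines of Python. The dictionaries transcribe the KEY definitions from the question's CREATE TABLE statements; the function name and input shape are hypothetical. Unique indexes, including the PRIMARY KEY, are skipped, because dropping one would also drop the constraint it enforces.

```python
from typing import Collection, Dict, List, Tuple


def redundant_indexes(indexes: Dict[str, Tuple[str, ...]],
                      unique: Collection[str] = ()) -> List[Tuple[str, str]]:
    """Return (redundant_index, covered_by) pairs for one table.

    A non-unique index is reported when its column list is a leading
    prefix of a longer index's column list.
    """
    pairs = []
    for name, cols in indexes.items():
        if name in unique:
            continue  # UNIQUE/PRIMARY indexes also enforce a constraint
        for other, other_cols in indexes.items():
            if (other != name
                    and len(other_cols) > len(cols)
                    and other_cols[:len(cols)] == cols):
                pairs.append((name, other))
                break  # one longer index is enough to report it
    return pairs


# Index definitions transcribed from the question's CREATE TABLE statements.
tracking_keys = {
    "PRIMARY": ("id",),
    "manufacture": ("manufacture",),
    "manufacture_date_last_activity": ("manufacture", "date_last_activity"),
    "date_last_activity": ("date_last_activity",),
}
tracking_items_keys = {
    "PRIMARY": ("id",),
    "tracking_id": ("tracking_id",),
    "tracking_object_id": ("tracking_object_id",),
    "tracking_id_tracking_object_id": ("tracking_id", "tracking_object_id"),
}

tracking_redundant = redundant_indexes(tracking_keys, unique={"PRIMARY"})
items_redundant = redundant_indexes(tracking_items_keys, unique={"PRIMARY"})
print(tracking_redundant)  # [('manufacture', 'manufacture_date_last_activity')]
print(items_redundant)     # [('tracking_id', 'tracking_id_tracking_object_id')]
```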
For the query above, I would suggest adding a covering index:
KEY `tracking_items_IX3` (`tracking_id`,`tracking_type`,`tracking_object_id`)
-or- at a minimum, a non-covering index with those two columns leading:
KEY `tracking_items_IX3` (`tracking_id`,`tracking_type`)
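The difference between the covering and the non-covering variant shows up in a query plan. The sketch below uses SQLite through Python's sqlite3 as a stand-in for MySQL: a hypothetical helper, lookup_plan, plans the per-row tracking_items lookup that the correlated subqueries perform, once with each suggested index. SQLite reports the three-column index as a COVERING INDEX because tracking_object_id can be read from the index itself; in MySQL's EXPLAIN the corresponding signal is "Using index" in the Extra column.

```python
import sqlite3


def lookup_plan(index_columns: str) -> str:
    """Plan one tracking_items lookup with a single candidate index (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    # index_columns is a trusted literal below, so building the DDL with an f-string is fine here.
    conn.executescript(f"""
        CREATE TABLE tracking_items (id INTEGER PRIMARY KEY, tracking_id INTEGER,
                                     tracking_object_id TEXT, tracking_type INTEGER);
        CREATE INDEX tracking_items_IX3 ON tracking_items ({index_columns});
    """)
    # The same lookup each correlated subquery performs for one outer tracking row.
    row = conn.execute("""
        EXPLAIN QUERY PLAN
        SELECT tracking_object_id
        FROM tracking_items
        WHERE tracking_id = 42 AND tracking_type = 1
    """).fetchone()
    conn.close()
    return row[-1]  # the last column holds the readable plan text


covering = lookup_plan("tracking_id, tracking_type, tracking_object_id")
non_covering = lookup_plan("tracking_id, tracking_type")
print(covering)
print(non_covering)
```

With the two-column index the lookup still avoids a scan, but each matching index entry has to be followed back to the table row to fetch tracking_object_id.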