Hen*_*los 5 mysql innodb index mysql-5.7
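Some of the assets in my application are updated asynchronously from time to time.
The example I'll use here is Vehicles. There are two tables:
Vehicles: holds information about the vehicle itself
VehicleUpdates: holds information about all the updates that happened to that vehicle
The relevant parts of the table structures are: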
CREATE TABLE `Vehicles` (
`id` varchar(50) NOT NULL,
`organizationId` varchar(50) NOT NULL,
`plate` char(7) NOT NULL,
`vehicleInfo` json DEFAULT NULL,
`createdAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updatedAt` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `unq_Vehicles_orgId_plate_idx` (`organizationId`,`plate`) USING BTREE,
KEY `Vehicles_createdAt_idx` (`createdAt`)
);
CREATE TABLE `VehicleUpdates` (
`id` varchar(50) NOT NULL,
`organizationId` varchar(50) NOT NULL,
`vehiclePlate` char(7) NOT NULL,
`status` varchar(15) NOT NULL,
`createdAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updatedAt` timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `VehicleUpdates_orgId_vhclPlt_createdAt_idx` (`organizationId`,`vehiclePlate`,`createdAt`) USING BTREE
);
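Now I have a new requirement: I have to return the latest update information along with the vehicle information itself.
After some digging I found this blog post. I then tried the suggested "uncorrelated subquery" approach, since it was said to be the best one: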
SELECT vu1.*
FROM VehicleUpdates AS vu1
JOIN
( SELECT vehiclePlate, organizationId, MAX(createdAt) AS createdAt
FROM VehicleUpdates
GROUP BY organizationId, vehiclePlate
) AS vu2 USING (organizationId, vehiclePlate, createdAt);
On my production database this query takes 275 ms on average to execute.
I thought that was too slow, so I decided to try the "LEFT JOIN" approach:
SELECT vu1.*
FROM VehicleUpdates AS vu1
LEFT JOIN VehicleUpdates AS vu2 ON vu1.organizationId = vu2.organizationId and vu1.vehiclePlate = vu2.vehiclePlate
AND vu2.createdAt > vu1.createdAt
WHERE vu2.id IS NULL;
This one performs better, with an average execution time of 40 ms. Good enough for me.
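To make the anti-join logic concrete, here is a minimal sketch on a scratch table. The table name VehicleUpdatesDemo, the ids, and the status values are made up for illustration; only the join pattern comes from the query above.

```sql
-- Hypothetical scratch table mirroring the relevant columns of VehicleUpdates.
CREATE TABLE VehicleUpdatesDemo (
  id varchar(50) NOT NULL,
  organizationId varchar(50) NOT NULL,
  vehiclePlate char(7) NOT NULL,
  status varchar(15) NOT NULL,
  createdAt timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  KEY demo_orgId_plate_createdAt_idx (organizationId, vehiclePlate, createdAt)
);

-- Made-up sample rows: two updates for one plate, one update for another.
INSERT INTO VehicleUpdatesDemo (id, organizationId, vehiclePlate, status, createdAt) VALUES
  ('u1', 'org1', 'AAA0001', 'PENDING', '2020-01-01 10:00:00'),
  ('u2', 'org1', 'AAA0001', 'DONE',    '2020-01-02 10:00:00'),
  ('u3', 'org1', 'BBB0002', 'PENDING', '2020-01-01 12:00:00');

-- vu2 looks for a newer update of the same vehicle; a row of vu1 survives
-- the WHERE clause only when no such newer row exists.
SELECT vu1.id, vu1.vehiclePlate, vu1.status
FROM VehicleUpdatesDemo AS vu1
LEFT JOIN VehicleUpdatesDemo AS vu2
  ON vu1.organizationId = vu2.organizationId
 AND vu1.vehiclePlate = vu2.vehiclePlate
 AND vu2.createdAt > vu1.createdAt
WHERE vu2.id IS NULL;
-- u1 pairs with the newer u2 and is filtered out; u2 and u3 have no newer row and are returned.
```

Note that two updates of the same vehicle sharing the same createdAt would both be returned, since neither is strictly newer than the other.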
Then I needed to run this query as part of a query on the Vehicles table.
The following query does what I want:
SELECT v.*, vu1.*
FROM Vehicles AS v
LEFT JOIN VehicleUpdates AS vu1
ON v.plate = vu1.vehiclePlate
AND v.organizationId = vu1.organizationId
LEFT JOIN VehicleUpdates AS vu2
ON vu1.organizationId = vu2.organizationId
AND vu1.vehiclePlate = vu2.vehiclePlate
AND vu2.createdAt > vu1.createdAt
WHERE vu2.id IS NULL;
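The problem is that it takes 20 s (!) to run. Big problem!
But I never run a full table scan in production. The query is always restricted to a single organizationId and is paginated, so I return at most 100 rows per page. So I ran the following query: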
SELECT v.*, vu1.*
FROM Vehicles AS v
LEFT JOIN VehicleUpdates AS vu1
ON v.plate = vu1.vehiclePlate
AND v.organizationId = vu1.organizationId
LEFT JOIN VehicleUpdates AS vu2
ON vu1.organizationId = vu2.organizationId
AND vu1.vehiclePlate = vu2.vehiclePlate
AND vu2.createdAt > vu1.createdAt
WHERE vu2.id IS NULL
and v.organizationId = '<some organization ID>'
LIMIT 100;
Now it takes from 750 ms to 11 s to run, depending on the number of associated vehicles. Still not good enough.
Running explain on the query above gives me:
"select_type" | "table" | "type" | "possible_keys" | "key" | "key_len" | "ref" | "rows" | "filtered" | "Extra"
SIMPLE | v | ref | unq_Vehicles_orgId_plate_idx,Vehicles_orgId_status_idx | unq_Vehicles_orgId_plate_idx | "202" | const | 30 | 100 |
SIMPLE | vu1 | ALL | | | | | 263171 | 100 | Using where; Using join buffer (Block Nested Loop)
SIMPLE | vu2 | ref | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "173" | vu1.organizationId,vu1.vehiclePlate | 10 | 10 | Using where; Not exists; Using index
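What strikes me is that the vu1 table is being full-table scanned, even though the leftmost table, Vehicles, is filtered on the indexed column organizationId, which is also indexed in VehicleUpdates.
So I decided to give the "uncorrelated subquery" another try and ran: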
SELECT v.*, vu.*
FROM Vehicles AS v
LEFT JOIN (
SELECT vu1.*
FROM VehicleUpdates AS vu1
JOIN
( SELECT vehiclePlate, organizationId, MAX(createdAt) AS createdAt
FROM VehicleUpdates
GROUP BY organizationId, vehiclePlate
) AS vu2 USING (organizationId, vehiclePlate, createdAt)
) AS vu
ON vu.organizationId = v.organizationId
AND vu.vehiclePlate = v.plate
WHERE v.organizationId = '<SOME ORGANIZATION ID>'
LIMIT 100;
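This time the execution time ranged from 1.4 s to 13 s, depending on how many entries the Vehicles table has for the given organizationId. Unacceptable for my application.
Running explain gives me: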
| "select_type" | "table" | "type" | "possible_keys" | "key" | "key_len" | "ref" | "rows" | "filtered" | "Extra"
| PRIMARY | v | ALL | | | | | 14456 | 100 |
| PRIMARY | <derived3> | ALL | | | | | 29289 | 100 | Using where
| PRIMARY | vu1 | ref | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "327" | vu2.organizationId,vu2.vehiclePlate,vu2.createdAt | 1 | 100 | Using where
| DERIVED | VehicleUpdates | range | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "323" | | 29289 | 100 | Using index for group-by
I noticed that adding explicit organizationId clauses improves performance.
Running:
SELECT v.*, vu1.*
FROM Vehicles AS v
LEFT JOIN VehicleUpdates AS vu1
ON v.plate = vu1.vehiclePlate
AND v.organizationId = vu1.organizationId
AND vu1.organizationId = '<SOME ORGANIZATION ID>' -- <--------
LEFT JOIN VehicleUpdates AS vu2
ON vu1.organizationId = vu2.organizationId
AND vu1.vehiclePlate = vu2.vehiclePlate
AND vu2.createdAt > vu1.createdAt
WHERE vu2.id IS NULL
and v.organizationId = '<SOME ORGANIZATION ID>' -- <-----------
LIMIT 100;
I get execution times ranging from 65 ms (acceptable) to 2.5 s (unacceptable).
Putting an organizationId = '<SOME ORGANIZATION ID>' clause in both the "main" query and the outer subquery of the join:
SELECT v.*, vu.*
FROM Vehicles AS v
LEFT JOIN (
SELECT vu1.*
FROM VehicleUpdates AS vu1
JOIN
( SELECT vehiclePlate, organizationId, MAX(createdAt) AS createdAt
FROM VehicleUpdates
GROUP BY organizationId, vehiclePlate
) AS vu2 ON vu1.organizationId = vu2.organizationId
and vu1.vehiclePlate = vu2.vehiclePlate
and vu1.createdAt = vu2.createdAt
WHERE vu1.organizationId = '<SOME ORGANIZATION ID>' -- <--------
) AS vu
ON vu.organizationId = v.organizationId
AND vu.vehiclePlate = v.plate
where
v.organizationId = '<SOME ORGANIZATION ID>' -- <---------
LIMIT 100;
I get execution times ranging from 450 ms (unacceptable) to 900 ms (unacceptable).
Putting an organizationId = '<SOME ORGANIZATION ID>' clause in both the "main" query and the inner subquery of the join:
SELECT v.*, vu.*
FROM Vehicles AS v
LEFT JOIN (
SELECT vu1.*
FROM VehicleUpdates AS vu1
JOIN
( SELECT vehiclePlate, organizationId, MAX(createdAt) AS createdAt
FROM VehicleUpdates
WHERE organizationId = '<SOME ORGANIZATION ID>' -- <--------
GROUP BY organizationId, vehiclePlate
) AS vu2 ON vu1.organizationId = vu2.organizationId
and vu1.vehiclePlate = vu2.vehiclePlate
and vu1.createdAt = vu2.createdAt
) AS vu
ON vu.organizationId = v.organizationId
AND vu.vehiclePlate = v.plate
where
v.organizationId = '<SOME ORGANIZATION ID>' -- <---------
LIMIT 100;
I get execution times ranging from 225 ms (acceptable) to 500 ms (unacceptable).
Is there a better way to handle this kind of query?
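I feel so dumb! I just found the problem.
For some reason, Vehicles and VehicleUpdates had different character sets (utf8mb4 and utf8).
That is why the EXPLAIN output for the "uncorrelated subquery" approach showed a full table scan in one of its steps: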
| "select_type" | "table" | "type" | "possible_keys" | "key" | "key_len" | "ref" | "rows" | "filtered" | "Extra"
| PRIMARY | v | ALL | | | | | 14456 | 100 |
| PRIMARY | <derived3> | ALL | | | | | 29289 | 100 | Using where
| PRIMARY | vu1 | ref | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "327" | vu2.organizationId,vu2.vehiclePlate,vu2.createdAt | 1 | 100 | Using where
| DERIVED | VehicleUpdates | range | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "323" | | 29289 | 100 | Using index for group-by
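A mismatch like this can be checked by querying information_schema. The sketch below assumes both tables live in the current default schema; the column list is simply the join columns used in the queries above.

```sql
-- List the character set and collation of the join columns in both tables.
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()          -- assumption: tables are in the current schema
  AND TABLE_NAME IN ('Vehicles', 'VehicleUpdates')
  AND COLUMN_NAME IN ('organizationId', 'plate', 'vehiclePlate')
ORDER BY TABLE_NAME, COLUMN_NAME;
```

If the join columns report different character sets, MySQL has to convert one side before comparing, which can prevent an index lookup on the converted column.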
After converting VehicleUpdates to utf8mb4, the EXPLAIN output is:
| "select_type" | "table" | "type" | "possible_keys" | "key" | "key_len" | "ref" | "rows" | "filtered" | "Extra"
| PRIMARY | v | ref | Vehicles_orgId_status_idx | Vehicles_orgId_status_idx | "202" | const | 188 | 100 |
| PRIMARY | <derived2> | ref | <auto_key1> | <auto_key1> | "230" | v.plate,v.organizationId | 10 | 100 |
| PRIMARY | vu1 | ref | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "234" | v.organizationId,vu2.vehiclePlate,vu2.createdAt | 1 | 100 | Using where
| DERIVED | VehicleUpdates | ref | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "202" | const | 24090 | 100 | Using where; Using index
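The conversion statement itself is not shown here. One way to do it, assuming the default utf8mb4 collation matches what Vehicles uses, would be along these lines:

```sql
-- Inspect the current charset and collation of both tables first.
SHOW CREATE TABLE Vehicles;
SHOW CREATE TABLE VehicleUpdates;

-- Convert all character columns of VehicleUpdates to utf8mb4.
-- Assumption: no explicit COLLATE clause is needed; add one if Vehicles
-- uses a non-default utf8mb4 collation, so both sides end up identical.
ALTER TABLE VehicleUpdates CONVERT TO CHARACTER SET utf8mb4;
```

This rebuilds the table and its indexes, so on a large table it is worth trying on a copy first.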
Likewise, the execution plan for the "LEFT JOIN" approach changed from:
| "select_type" | "table" | "type" | "possible_keys" | "key" | "key_len" | "ref" | "rows" | "filtered" | "Extra"
| SIMPLE | v | ref | unq_Vehicles_orgId_plate_idx,Vehicles_orgId_status_idx | unq_Vehicles_orgId_plate_idx | "202" | const | 30 | 100 |
| SIMPLE | vu1 | ALL | | | | | 263171 | 100 | Using where; Using join buffer (Block Nested Loop)
| SIMPLE | vu2 | ref | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "173" | vu1.organizationId,vu1.vehiclePlate | 10 | 10 | Using where; Not exists; Using index
to:
| "select_type" | "table" | "type" | "possible_keys" | "key" | "key_len" | "ref" | "rows" | "filtered" | "Extra"
| SIMPLE | v | ref | Vehicles_orgId_status_idx | Vehicles_orgId_status_idx | "202" | const | 188 | 100 |
| SIMPLE | vu1 | ref | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "230" | v.organizationId,v.plate | 9 | 100 |
| SIMPLE | vu2 | ref | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "230" | vu1.organizationId,vu1.vehiclePlate | 9 | 10 | Using where; Not exists; Using index
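The key_len column gives a quick way to see the charset change in these plans. Assuming utf8 reserves 3 bytes per character and utf8mb4 reserves 4, plus 2 length bytes for a varchar, the index prefix (organizationId varchar(50), vehiclePlate char(7)) works out as follows. This is plain arithmetic and does not touch the tables.

```sql
SELECT
  (50 * 3 + 2) + (7 * 3) AS key_len_utf8,     -- 152 + 21 = 173
  (50 * 4 + 2) + (7 * 4) AS key_len_utf8mb4;  -- 202 + 28 = 230
```

These match the 173 and 230 reported for vu2 before and after the conversion, and 202 matches organizationId alone in utf8mb4.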
So now the performance of the different queries is:
| "select_type" | "table" | "type" | "possible_keys" | "key" | "key_len" | "ref" | "rows" | "filtered" | "Extra"
| PRIMARY | v | ALL | | | | | 14456 | 100 |
| PRIMARY | <derived3> | ALL | | | | | 29289 | 100 | Using where
| PRIMARY | vu1 | ref | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "327" | vu2.organizationId,vu2.vehiclePlate,vu2.createdAt | 1 | 100 | Using where
| DERIVED | VehicleUpdates | range | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "323" | | 29289 | 100 | Using index for group-by
always runs in under 50 ms.
| "select_type" | "table" | "type" | "possible_keys" | "key" | "key_len" | "ref" | "rows" | "filtered" | "Extra"
| PRIMARY | v | ref | Vehicles_orgId_status_idx | Vehicles_orgId_status_idx | "202" | const | 188 | 100 |
| PRIMARY | <derived2> | ref | <auto_key1> | <auto_key1> | "230" | v.plate,v.organizationId | 10 | 100 |
| PRIMARY | vu1 | ref | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "234" | v.organizationId,vu2.vehiclePlate,vu2.createdAt | 1 | 100 | Using where
| DERIVED | VehicleUpdates | ref | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "202" | const | 24090 | 100 | Using where; Using index
runs in 300 ms on average.
| "select_type" | "table" | "type" | "possible_keys" | "key" | "key_len" | "ref" | "rows" | "filtered" | "Extra"
| SIMPLE | v | ref | unq_Vehicles_orgId_plate_idx,Vehicles_orgId_status_idx | unq_Vehicles_orgId_plate_idx | "202" | const | 30 | 100 |
| SIMPLE | vu1 | ALL | | | | | 263171 | 100 | Using where; Using join buffer (Block Nested Loop)
| SIMPLE | vu2 | ref | VehicleUpdates_orgId_vhclPlt_createdAt_idx | VehicleUpdates_orgId_vhclPlt_createdAt_idx | "173" | vu1.organizationId,vu1.vehiclePlate | 10 | 10 | Using where; Not exists; Using index
also always runs in under 50 ms.
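I decided to stick with the "LEFT JOIN" approach, because it lets me create a view representing the inner query, which simplifies the query that returns the vehicles.
I could not do that with the "uncorrelated subquery", because it needs the WHERE organizationId = '<ORGANIZATION ID>' clause inside the inner query, so a view would not be as efficient.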
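A view along these lines is one way that could look. The view name LatestVehicleUpdates and the alias lvu are hypothetical; the body is just the "LEFT JOIN" query shown earlier.

```sql
-- Hypothetical view wrapping the "latest update per vehicle" anti-join.
CREATE VIEW LatestVehicleUpdates AS
SELECT vu1.*
FROM VehicleUpdates AS vu1
LEFT JOIN VehicleUpdates AS vu2
  ON vu1.organizationId = vu2.organizationId
 AND vu1.vehiclePlate = vu2.vehiclePlate
 AND vu2.createdAt > vu1.createdAt
WHERE vu2.id IS NULL;

-- The vehicles query then only needs a single join against the view.
SELECT v.*, lvu.*
FROM Vehicles AS v
LEFT JOIN LatestVehicleUpdates AS lvu
  ON lvu.organizationId = v.organizationId
 AND lvu.vehiclePlate = v.plate
WHERE v.organizationId = '<SOME ORGANIZATION ID>'
LIMIT 100;
```

It is worth running EXPLAIN on the second query to confirm the plan still uses the (organizationId, vehiclePlate, createdAt) index rather than materializing the view.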