Query performance with a subquery in an IN clause

dre*_*010 6 mysql performance subquery

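I'm trying to select a range of data from a history table for multiple devices (unique serial numbers), and I'm wondering why the following queries differ so much in execution time.

Basically, I'm using an IN clause to specify which items I want data for. If I hard-code the items in the IN clause, the query is fast; if I use a subquery or a join to select the items, performance is poor.

This query completes in 0.15 seconds and returns 7382 rows.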

SELECT `readings`.* FROM `readings`
WHERE
  (SerialNumber IN ('091146000121', *snip 25*, '091146000556'))
AND (readings.time >= 1325404800)
AND (readings.time < 1326317400)
ORDER BY `time` ASC

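The same query, rewritten to use a subquery to fetch the serial numbers, takes over 30 seconds and seems to spend most of that time in the Preparing state. It returns the same data as the first query.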

SELECT `readings`.* FROM `readings`
WHERE
  (SerialNumber IN (SELECT `boards`.`id` AS `SerialNumber` FROM `boards` WHERE (siteId = '1')))
AND (readings.time >= 1325404800)
AND (readings.time < 1326317400)
ORDER BY `time` ASC

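The subquery returns the same values that are hard-coded in the first query, but as noted it takes much longer to run. Aren't the two functionally equivalent?

Here is the EXPLAIN output for the two queries: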

+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------+
| id | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows | Extra                       |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------+
|  1 | SIMPLE      | readings | range | PRIMARY,time  | PRIMARY | 22      | NULL | 7339 | Using where; Using filesort |
+----+-------------+----------+-------+---------------+---------+---------+------+------+-----------------------------+

+----+--------------------+----------+-----------------+----------------+---------+---------+------+---------+-------------+
| id | select_type        | table    | type            | possible_keys  | key     | key_len | ref  | rows    | Extra       |
+----+--------------------+----------+-----------------+----------------+---------+---------+------+---------+-------------+
|  1 | PRIMARY            | readings | range           | time           | time    | 4       | NULL | 6353234 | Using where |
|  2 | DEPENDENT SUBQUERY | boards   | unique_subquery | PRIMARY,siteId | PRIMARY | 18      | func |       1 | Using where |
+----+--------------------+----------+-----------------+----------------+---------+---------+------+---------+-------------+

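For some reason, the query with the subselect does not use the primary key. I tried USE INDEX, but that actually made it take longer.

The readings table has PRIMARY KEY (SerialNumber, time), plus an index on time.
The boards table has a primary key on id (the serial number) and an index on siteId.

The MySQL version I'm using is 5.5.8-log MySQL Community Server (GPL).

I'm just wondering why the two queries don't perform similarly. Thanks.

Update: here are the CREATE TABLE statements: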

mysql> SHOW CREATE TABLE readings\G
*************************** 1. row ***************************
       Table: readings
Create Table: CREATE TABLE `readings` (
  `time` int(11) NOT NULL,
  `boxsn` varchar(16) NOT NULL,
  `rev` varchar(16) NOT NULL,
  `schema` tinyint(3) unsigned NOT NULL,
  `interval` smallint(5) unsigned NOT NULL,
  `relay` tinyint(4) NOT NULL,
  `inputV` decimal(10,6) NOT NULL,
  `inputA` decimal(10,6) NOT NULL,
  `outputV` decimal(10,6) NOT NULL,
  `outputA` decimal(10,6) NOT NULL,
  `phase` tinyint(4) NOT NULL,
  `outputVA` decimal(10,6) NOT NULL,
  `watts` decimal(10,6) NOT NULL DEFAULT '0.000000',
  `var` decimal(10,6) NOT NULL,
  `kiloVAHours` decimal(9,9) DEFAULT '0.000000000',
  `kilowattHours` decimal(9,9) NOT NULL,
  `kilovarHours` decimal(9,9) NOT NULL,
  PRIMARY KEY (`boxsn`,`time`),
  KEY `time` (`time`),
  KEY `boxsn_time_ndx` (`boxsn`,`time`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> SHOW CREATE TABLE boards\G
*************************** 1. row ***************************
       Table: boards
Create Table: CREATE TABLE `boards` (
  `id` varchar(16) NOT NULL,
  `siteId` int(11) NOT NULL,
  `groupId` int(11) DEFAULT '0',
  `lastReport` int(11) DEFAULT NULL,
  `lastIp` varchar(15) DEFAULT '0.0.0.0',
  `label` varchar(24) DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `siteId` (`siteId`),
  KEY `siteId_id_ndx` (`siteId`,`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC

Rol*_*DBA 6

Refactor the query as follows:

SELECT
    readings.*
FROM
    (
        SELECT boxsn FROM readings
        WHERE (time >= 1325404800) 
        AND (time < 1326317400) 
        ORDER BY `time` ASC
    ) readings_keys
    LEFT JOIN
    (
        SELECT id AS boxsn FROM boards WHERE siteId = '1'
    ) boards
    USING (boxsn)
    LEFT JOIN readings
    USING (boxsn)
;

Make sure you have the following indexes:

ALTER TABLE boards ADD INDEX siteId_id_ndx (siteId,id);
ALTER TABLE readings ADD INDEX time_boxsn_ndx (time,boxsn);

You can drop the other index:

ALTER TABLE readings DROP INDEX boxsn_time_ndx;
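Before dropping anything, it may be worth confirming what actually exists. Going by the SHOW CREATE TABLE output in the question, boxsn_time_ndx covers the same columns, in the same order, as the primary key (boxsn, time), so it looks redundant. A minimal check, assuming the table names from the question:

```sql
-- List every index on each table, one row per indexed column,
-- so duplicates of the primary key are easy to spot.
SHOW INDEX FROM readings;
SHOW INDEX FROM boards;
```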

You should see a significant performance improvement as the tables grow.

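In your case,

  • The first EXPLAIN plan says you have to look up SerialNumber for every row in readings against a list of values held in memory.
  • The second EXPLAIN plan says you have to look up SerialNumber for every row in readings against a table.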
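To make the second plan more concrete: in MySQL 5.5, an IN (SELECT ...) predicate is typically rewritten by the optimizer into a correlated lookup, which is what the DEPENDENT SUBQUERY / unique_subquery row with ref = func reflects. The statement below is only a rough sketch of that equivalent form, not what the server literally executes, and it assumes SerialNumber in the question is the boxsn column from the CREATE TABLE output:

```sql
-- Sketch: roughly how the IN (subquery) version is evaluated.
-- The time range is scanned via the `time` index (about 6.3 million rows
-- in the EXPLAIN above), and the inner lookup runs once per scanned row.
SELECT r.*
FROM readings AS r
WHERE r.time >= 1325404800
  AND r.time <  1326317400
  AND EXISTS (
        SELECT 1
        FROM boards AS b
        WHERE b.id = r.boxsn      -- primary-key probe, repeated for each readings row
          AND b.siteId = '1'
      )
ORDER BY r.time ASC;
```

Each probe is cheap, but repeating it millions of times is consistent with the 30-second run, whereas the hard-coded list lets the optimizer range-scan the primary key directly.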

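UPDATE 2012-01-12 14:03 EDT

I refactored it again to make sure the keys from readings and boards are combined properly before the data is retrieved from the readings table: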

SELECT 
    readings.* 
FROM 
    ( 
        SELECT A.* FROM
        (
            SELECT boxsn FROM readings 
            WHERE (time >= 1325404800)  
            AND (time < 1326317400)  
            ORDER BY `time` ASC
        ) A
        LEFT JOIN
        (
            SELECT id AS boxsn
            FROM boards
            WHERE siteId = '1'
        ) B
        USING (boxsn)
        WHERE B.boxsn IS NOT NULL
    ) readings_keys 
    LEFT JOIN readings 
    USING (boxsn) 
; 
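One thing to watch in the query above: the final join back to readings uses boxsn alone, so every key row matches all readings rows for that board, including rows outside the time window. A possible variant, offered as an untested sketch rather than a drop-in replacement, carries both primary-key columns through the key set and joins on the pair. The alias r and the outer ORDER BY are additions for illustration:

```sql
-- Sketch: collect (boxsn, time) keys first, then fetch full rows by primary key.
SELECT r.*
FROM (
        SELECT A.boxsn, A.time
        FROM (
                SELECT boxsn, time
                FROM readings
                WHERE time >= 1325404800
                  AND time <  1326317400
             ) A
        INNER JOIN (
                SELECT id AS boxsn
                FROM boards
                WHERE siteId = '1'
             ) B
        USING (boxsn)              -- keeps only boards that belong to the site
     ) readings_keys
INNER JOIN readings r
USING (boxsn, time)                -- full primary key, so one row per key
ORDER BY r.time ASC;               -- sort the final result, not the subquery
```

Running EXPLAIN on it would show whether the time_boxsn_ndx index is actually used for the innermost range scan.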