How to optimize a query with multiple derived tables that use "IN" and "GROUP BY"?

Thi*_*Not 5 mariadb performance query-performance

I collect nmap data every five minutes and store it in a database. Information about each scan (e.g. start and end time) is stored in the scans table:

CREATE TABLE `scans` (
  `scan_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `start_time` datetime NOT NULL,
  `end_time` datetime NOT NULL,
  `nmap_version` varchar(20) DEFAULT NULL,
  `nmap_args` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`scan_id`)
) ENGINE=InnoDB AUTO_INCREMENT=34901 DEFAULT CHARSET=utf8

Information about the scanned hosts (e.g. hostname, MAC address) is stored in the hosts table:

CREATE TABLE `hosts` (
  `scan_id` int(10) unsigned NOT NULL,
  `host_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `hostname` varchar(255) NOT NULL,
  `ip_address` int(10) unsigned NOT NULL,
  `mac_address` bigint(20) unsigned DEFAULT NULL,
  `mac_vendor` varchar(255) DEFAULT NULL,
  `status` varchar(20) NOT NULL,
  `hops` int(10) unsigned DEFAULT NULL,
  `last_boot` datetime DEFAULT NULL,
  PRIMARY KEY (`host_id`),
  KEY `scan_id` (`scan_id`),
  KEY `idx_status` (`status`),
  KEY `idx_hostname` (`hostname`),
  CONSTRAINT `hosts_ibfk_1` FOREIGN KEY (`scan_id`) REFERENCES `scans` (`scan_id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=2262995 DEFAULT CHARSET=utf8

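Now I want to get information from the most recent scan of one or more hosts (up to about a thousand). The most recent scan is not necessarily the same for all hosts. I also want to get the last time each host was up.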

I've been using the following query, but it is slow (it takes about six seconds to get the data for three hosts):

SELECT hosts.hostname,
       INET_NTOA(hosts.ip_address) AS ip,
       CONV(hosts.mac_address, 10, 16) AS mac,
       hosts.mac_vendor AS mac_vendor,
       hosts.status AS status,
       scans.start_time AS last_scan,
       u.last_seen AS last_seen
FROM hosts
JOIN (
       -- ID of most recent scan for each host
       SELECT MAX(hosts.scan_id) AS max_scan_id,
              hosts.hostname
       FROM hosts
       WHERE hosts.hostname IN ('foo', 'bar', 'baz')
       GROUP BY hosts.hostname
     ) AS t
  ON (hosts.scan_id = t.max_scan_id AND hosts.hostname = t.hostname)
JOIN scans
  ON scans.scan_id = t.max_scan_id
JOIN (
       -- Last time each host was up
       SELECT MAX(scans.start_time) AS last_seen,
              hosts.hostname
       FROM scans
       JOIN hosts
         ON scans.scan_id = hosts.scan_id
       WHERE hosts.status = 'up'
       GROUP BY hosts.hostname
     ) AS u
  ON hosts.hostname = u.hostname

The EXPLAIN output shows that one of the derived queries is performing a table scan:

+------+-------------+------------+--------+----------------------+--------------+---------+--------------------+---------+---------------------------------------------------------------------+
| id   | select_type | table      | type   | possible_keys        | key          | key_len | ref                | rows    | Extra                                                               |
+------+-------------+------------+--------+----------------------+--------------+---------+--------------------+---------+---------------------------------------------------------------------+
|    1 | PRIMARY     | <derived2> | ALL    | NULL                 | NULL         | NULL    | NULL               |   12855 | Using where                                                         |
|    1 | PRIMARY     | scans      | eq_ref | PRIMARY              | PRIMARY      | 4       | t.max_scan_id      |       1 |                                                                     |
|    1 | PRIMARY     | hosts      | ref    | scan_id,idx_hostname | scan_id      | 4       | t.max_scan_id      |      81 | Using where                                                         |
|    1 | PRIMARY     | <derived3> | ref    | key0                 | key0         | 767     | t.hostname         |      10 |                                                                     |
|    3 | DERIVED     | hosts      | ref    | scan_id,idx_status   | idx_status   | 62      | const              | 1136083 | Using index condition; Using where; Using temporary; Using filesort |
|    3 | DERIVED     | scans      | eq_ref | PRIMARY              | PRIMARY      | 4       | wmap.hosts.scan_id |       1 |                                                                     |
|    2 | DERIVED     | hosts      | range  | idx_hostname         | idx_hostname | 767     | NULL               |   12855 | Using index condition                                               |
+------+-------------+------------+--------+----------------------+--------------+---------+--------------------+---------+---------------------------------------------------------------------+

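I tried adding the IN clause to the outer query and to the second subquery as well (three IN clauses in total). This works well when searching for a single host, but when searching for a large number of hosts it actually performs worse than the original query.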

I also tried creating a temporary table that holds only the hostnames to match, plus one temporary table for the results of each subquery:

CREATE TEMPORARY TABLE hosts_to_match (
    hostname VARCHAR(255) NOT NULL PRIMARY KEY
);

INSERT INTO hosts_to_match
VALUES ('foo'), ('bar'), ('baz');

CREATE TEMPORARY TABLE last_scan (
    hostname VARCHAR(255) NOT NULL PRIMARY KEY, 
    scan_id INT(10) UNSIGNED NOT NULL
);

INSERT INTO last_scan (
    scan_id, 
    hostname
) 
SELECT MAX(hosts.scan_id),
       hosts.hostname 
FROM hosts_to_match 
JOIN hosts 
  ON hosts_to_match.hostname = hosts.hostname 
GROUP BY hosts.hostname;

CREATE TEMPORARY TABLE last_seen (
    hostname VARCHAR(255) NOT NULL PRIMARY KEY,
    last_seen DATETIME NOT NULL
);

INSERT INTO last_seen (
    last_seen, 
    hostname
) 
SELECT MAX(scans.start_time),
       hosts.hostname 
FROM hosts_to_match 
JOIN hosts 
  ON hosts_to_match.hostname = hosts.hostname 
JOIN scans 
  ON scans.scan_id = hosts.scan_id 
WHERE hosts.status = 'UP' 
GROUP BY hosts.hostname;

SELECT hosts.hostname, 
       INET_NTOA(hosts.ip_address) AS ip,
       CONV(hosts.mac_address, 10, 16) AS mac, 
       hosts.mac_vendor,
       hosts.status,
       scans.start_time,
       last_seen.last_seen
FROM last_scan
JOIN hosts
  ON (last_scan.hostname = hosts.hostname AND last_scan.scan_id = hosts.scan_id)
JOIN scans
  ON scans.scan_id = last_scan.scan_id
JOIN last_seen
  ON hosts.hostname = last_seen.hostname;

But for a large number of hosts, populating the temporary tables last_scan and last_seen is slow.

EXPLAIN for the SELECT that populates last_scan:

+------+-------------+----------------+-------+---------------+--------------+---------+------------------------------+------+----------------------------------------------+
| id   | select_type | table          | type  | possible_keys | key          | key_len | ref                          | rows | Extra                                        |
+------+-------------+----------------+-------+---------------+--------------+---------+------------------------------+------+----------------------------------------------+
|    1 | SIMPLE      | hosts_to_match | index | PRIMARY       | PRIMARY      | 767     | NULL                         |  166 | Using index; Using temporary; Using filesort |
|    1 | SIMPLE      | hosts          | ref   | idx_hostname  | idx_hostname | 767     | wmap.hosts_to_match.hostname | 2573 |                                              |
+------+-------------+----------------+-------+---------------+--------------+---------+------------------------------+------+----------------------------------------------+

EXPLAIN for the SELECT that populates last_seen:

+------+-------------+----------------+--------+---------------------------------+--------------+---------+------------------------------+------+----------------------------------------------+
| id   | select_type | table          | type   | possible_keys                   | key          | key_len | ref                          | rows | Extra                                        |
+------+-------------+----------------+--------+---------------------------------+--------------+---------+------------------------------+------+----------------------------------------------+
|    1 | SIMPLE      | hosts_to_match | index  | PRIMARY                         | PRIMARY      | 767     | NULL                         |  166 | Using index; Using temporary; Using filesort |
|    1 | SIMPLE      | hosts          | ref    | scan_id,idx_status,idx_hostname | idx_hostname | 767     | wmap.hosts_to_match.hostname | 2573 | Using where                                  |
|    1 | SIMPLE      | scans          | eq_ref | PRIMARY                         | PRIMARY      | 4       | wmap.hosts.scan_id           |    1 |                                              |
+------+-------------+----------------+--------+---------------------------------+--------------+---------+------------------------------+------+----------------------------------------------+

How can I speed this up? I'm using MariaDB 5.5.47.

Eze*_*nay 3

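You can really benefit from an index on hosts (hostname, scan_id) for this query, and probably from another one that also includes status (especially for the second query below). Your query may also benefit from moving some of the joins into per-row aggregates (a correlated subquery in the SELECT list):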

CREATE INDEX idx_hostname_scanid ON hosts (hostname, scan_id);
CREATE INDEX idx_hostname_status_scanid ON hosts (hostname, status, scan_id);

SELECT hosts.hostname,
       INET_NTOA(hosts.ip_address) AS ip,
       CONV(hosts.mac_address, 10, 16) AS mac,
       hosts.mac_vendor AS mac_vendor,
       hosts.status AS status,
       scans.start_time AS last_scan,
       (SELECT MAX(scans.start_time)
        FROM hosts
        JOIN scans ON (scans.scan_id = hosts.scan_id)
        WHERE hosts.hostname = t.hostname AND hosts.status = 'up') AS last_seen
FROM (
       -- ID of most recent scan for each host
       SELECT MAX(hosts.scan_id) AS max_scan_id, hosts.hostname
       FROM hosts
       WHERE hosts.hostname IN ('foo', 'bar', 'baz')
       GROUP BY hosts.hostname
     ) t
JOIN hosts ON (hosts.hostname = t.hostname AND hosts.scan_id = t.max_scan_id)
JOIN scans ON (scans.scan_id = t.max_scan_id);
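
To check whether the new index is picked up, one option is to run EXPLAIN on the derived table by itself. This is only a sketch that reuses the example hostnames from the question. The expectation (an assumption that depends on the optimizer and on table statistics) is that the key column shows idx_hostname_scanid and that Extra contains "Using index", since both hostname and scan_id can be read from the index without touching the table rows.

```sql
-- Inspect the plan for the "most recent scan per host" derived table.
-- With an index on (hostname, scan_id), the hope is that `key` shows
-- idx_hostname_scanid and `Extra` includes "Using index".
EXPLAIN
SELECT MAX(hosts.scan_id) AS max_scan_id,
       hosts.hostname
FROM hosts
WHERE hosts.hostname IN ('foo', 'bar', 'baz')
GROUP BY hosts.hostname;
```

If the plan still shows idx_hostname with "Using index condition", running ANALYZE TABLE hosts to refresh the index statistics may be worth a try before looking further.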

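Also, since you are already trusting that the most recent scan is the one with the highest id, you can speed up the query by trusting the same for last_seen, i.e. that the last time a host was up corresponds to its 'up' row with the highest scan id: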

-- Same indexes as for the first query; skip these if they already exist.
CREATE INDEX idx_hostname_scanid ON hosts (hostname, scan_id);
CREATE INDEX idx_hostname_status_scanid ON hosts (hostname, status, scan_id);

SELECT hosts.hostname,
       INET_NTOA(hosts.ip_address) AS ip,
       CONV(hosts.mac_address, 10, 16) AS mac,
       hosts.mac_vendor AS mac_vendor,
       hosts.status AS status,
       scans.start_time AS last_scan,
       lss.start_time AS last_seen
FROM (
       -- ID of most recent scan for each host
       SELECT MAX(hosts.scan_id) AS max_scan_id, hosts.hostname
       FROM hosts
       WHERE hosts.hostname IN ('foo', 'bar', 'baz')
       GROUP BY hosts.hostname
     ) t
JOIN hosts ON (hosts.hostname = t.hostname AND hosts.scan_id = t.max_scan_id)
JOIN scans ON (scans.scan_id = t.max_scan_id)
LEFT JOIN (
       SELECT MAX(hosts.scan_id) AS max_scan_id, hosts.hostname
       FROM hosts
       WHERE hosts.hostname IN ('foo', 'bar', 'baz') AND hosts.status = 'up'
       GROUP BY hosts.hostname
     ) ls ON (ls.hostname = t.hostname)
LEFT JOIN scans lss ON (lss.scan_id = ls.max_scan_id);
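
The second query is only equivalent to the first if scan_id and start_time increase together. A minimal sketch of a check for that assumption, using only the scans table from the question (the column aliases are made up for illustration), compares each scan with the scan that has the next higher id. It avoids window functions, which MariaDB 5.5 does not have.

```sql
-- List adjacent scans (by scan_id) whose start_time goes backwards.
-- If no rows are returned, the scan with the highest scan_id in any
-- group is also the one with the latest start_time.
SELECT a.scan_id    AS earlier_id,
       a.start_time AS earlier_start,
       b.scan_id    AS later_id,
       b.start_time AS later_start
FROM scans AS a
JOIN scans AS b
  ON b.scan_id = (SELECT MIN(c.scan_id)
                  FROM scans AS c
                  WHERE c.scan_id > a.scan_id)
WHERE b.start_time < a.start_time;
```

If this returns rows, the first query, which takes MAX(scans.start_time) directly, is the safer choice for last_seen.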