Thi*_*Not 5 mariadb performance query-performance
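I collect nmap data every five minutes and store it in a database. Information about each scan (such as its start and end time) is stored in the scans table: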
CREATE TABLE `scans` (
  `scan_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `start_time` datetime NOT NULL,
  `end_time` datetime NOT NULL,
  `nmap_version` varchar(20) DEFAULT NULL,
  `nmap_args` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`scan_id`)
) ENGINE=InnoDB AUTO_INCREMENT=34901 DEFAULT CHARSET=utf8
Information about the scanned hosts (such as hostname and MAC address) is stored in the hosts table:
CREATE TABLE `hosts` (
  `scan_id` int(10) unsigned NOT NULL,
  `host_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `hostname` varchar(255) NOT NULL,
  `ip_address` int(10) unsigned NOT NULL,
  `mac_address` bigint(20) unsigned DEFAULT NULL,
  `mac_vendor` varchar(255) DEFAULT NULL,
  `status` varchar(20) NOT NULL,
  `hops` int(10) unsigned DEFAULT NULL,
  `last_boot` datetime DEFAULT NULL,
  PRIMARY KEY (`host_id`),
  KEY `scan_id` (`scan_id`),
  KEY `idx_status` (`status`),
  KEY `idx_hostname` (`hostname`),
  CONSTRAINT `hosts_ibfk_1` FOREIGN KEY (`scan_id`) REFERENCES `scans` (`scan_id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=2262995 DEFAULT CHARSET=utf8
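Now I want to get the information from the most recent scan for one or more hosts (up to about a thousand). The most recent scan is not necessarily the same for every host. I also want to get the last time each host was up.
I have been using the following query, but it is slow (fetching the data for three hosts takes about six seconds):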
SELECT hosts.hostname,
       INET_NTOA(hosts.ip_address) AS ip,
       CONV(hosts.mac_address, 10, 16) AS mac,
       hosts.mac_vendor AS mac_vendor,
       hosts.status AS status,
       scans.start_time AS last_scan,
       u.last_seen AS last_seen
FROM hosts
JOIN (
    -- ID of most recent scan for each host
    SELECT MAX(hosts.scan_id) AS max_scan_id,
           hosts.hostname
    FROM hosts
    WHERE hosts.hostname IN ('foo', 'bar', 'baz')
    GROUP BY hosts.hostname
) AS t
    ON (hosts.scan_id = t.max_scan_id AND hosts.hostname = t.hostname)
JOIN scans
    ON scans.scan_id = t.max_scan_id
JOIN (
    -- Last time each host was up
    SELECT MAX(scans.start_time) AS last_seen,
           hosts.hostname
    FROM scans
    JOIN hosts
        ON scans.scan_id = hosts.scan_id
    WHERE hosts.status = 'up'
    GROUP BY hosts.hostname
) AS u
    ON hosts.hostname = u.hostname
The EXPLAIN output shows that one of the derived queries is doing a table scan:
+------+-------------+------------+--------+----------------------+--------------+---------+--------------------+---------+---------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+--------+----------------------+--------------+---------+--------------------+---------+---------------------------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 12855 | Using where |
| 1 | PRIMARY | scans | eq_ref | PRIMARY | PRIMARY | 4 | t.max_scan_id | 1 | |
| 1 | PRIMARY | hosts | ref | scan_id,idx_hostname | scan_id | 4 | t.max_scan_id | 81 | Using where |
| 1 | PRIMARY | <derived3> | ref | key0 | key0 | 767 | t.hostname | 10 | |
| 3 | DERIVED | hosts | ref | scan_id,idx_status | idx_status | 62 | const | 1136083 | Using index condition; Using where; Using temporary; Using filesort |
| 3 | DERIVED | scans | eq_ref | PRIMARY | PRIMARY | 4 | wmap.hosts.scan_id | 1 | |
| 2 | DERIVED | hosts | range | idx_hostname | idx_hostname | 767 | NULL | 12855 | Using index condition |
+------+-------------+------------+--------+----------------------+--------------+---------+--------------------+---------+---------------------------------------------------------------------+
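I tried adding the IN clause to the outer query and to the second subquery as well (three IN clauses in total). That works well when searching for a single host, but when searching for a large number of hosts it actually performs worse than the original query.
I also tried creating a temporary table that holds just the hostnames to match, plus a temporary table for the result of each subquery: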
CREATE TEMPORARY TABLE hosts_to_match (
    hostname VARCHAR(255) NOT NULL PRIMARY KEY
);

INSERT INTO hosts_to_match
VALUES ('foo'), ('bar'), ('baz');

CREATE TEMPORARY TABLE last_scan (
    hostname VARCHAR(255) NOT NULL PRIMARY KEY,
    scan_id INT(10) UNSIGNED NOT NULL
);

INSERT INTO last_scan (
    scan_id,
    hostname
)
SELECT MAX(hosts.scan_id),
       hosts.hostname
FROM hosts_to_match
JOIN hosts
    ON hosts_to_match.hostname = hosts.hostname
GROUP BY hosts.hostname;

CREATE TEMPORARY TABLE last_seen (
    hostname VARCHAR(255) NOT NULL PRIMARY KEY,
    last_seen DATETIME NOT NULL
);

INSERT INTO last_seen (
    last_seen,
    hostname
)
SELECT MAX(scans.start_time),
       hosts.hostname
FROM hosts_to_match
JOIN hosts
    ON hosts_to_match.hostname = hosts.hostname
JOIN scans
    ON scans.scan_id = hosts.scan_id
WHERE hosts.status = 'UP'
GROUP BY hosts.hostname;

SELECT hosts.hostname,
       INET_NTOA(hosts.ip_address) AS ip,
       CONV(hosts.mac_address, 10, 16) AS mac,
       hosts.mac_vendor,
       hosts.status,
       scans.start_time,
       last_seen.last_seen
FROM last_scan
JOIN hosts
    ON (last_scan.hostname = hosts.hostname AND last_scan.scan_id = hosts.scan_id)
JOIN scans
    ON scans.scan_id = last_scan.scan_id
JOIN last_seen
    ON hosts.hostname = last_seen.hostname;
But populating the temporary tables last_scan and last_seen is slow for a large number of hosts.
EXPLAIN for the SELECT that populates last_scan:
+------+-------------+----------------+-------+---------------+--------------+---------+------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------------+-------+---------------+--------------+---------+------------------------------+------+----------------------------------------------+
| 1 | SIMPLE | hosts_to_match | index | PRIMARY | PRIMARY | 767 | NULL | 166 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | hosts | ref | idx_hostname | idx_hostname | 767 | wmap.hosts_to_match.hostname | 2573 | |
+------+-------------+----------------+-------+---------------+--------------+---------+------------------------------+------+----------------------------------------------+
EXPLAIN for the SELECT that populates last_seen:
+------+-------------+----------------+--------+---------------------------------+--------------+---------+------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------------+--------+---------------------------------+--------------+---------+------------------------------+------+----------------------------------------------+
| 1 | SIMPLE | hosts_to_match | index | PRIMARY | PRIMARY | 767 | NULL | 166 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | hosts | ref | scan_id,idx_status,idx_hostname | idx_hostname | 767 | wmap.hosts_to_match.hostname | 2573 | Using where |
| 1 | SIMPLE | scans | eq_ref | PRIMARY | PRIMARY | 4 | wmap.hosts.scan_id | 1 | |
+------+-------------+----------------+--------+---------------------------------+--------------+---------+------------------------------+------+----------------------------------------------+
How can I speed this up? I am using MariaDB 5.5.47.
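You would really benefit from an index on hosts (hostname, scan_id) for this query, and possibly from another one that also includes status (especially for the second query below). Your query may also benefit from turning one of the joins into a per-row aggregate, that is, a correlated subquery in the SELECT list: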
CREATE INDEX idx_hostname_scanid ON hosts (hostname, scan_id);
CREATE INDEX idx_hostname_status_scanid ON hosts (hostname, status, scan_id);
SELECT hosts.hostname,
       INET_NTOA(hosts.ip_address) AS ip,
       CONV(hosts.mac_address, 10, 16) AS mac,
       hosts.mac_vendor AS mac_vendor,
       hosts.status AS status,
       scans.start_time AS last_scan,
       (SELECT MAX(scans.start_time)
        FROM hosts
        JOIN scans ON (scans.scan_id = hosts.scan_id)
        WHERE hosts.hostname = t.hostname AND hosts.status = 'up') AS last_seen
FROM (
    -- ID of most recent scan for each host
    SELECT MAX(hosts.scan_id) AS max_scan_id, hosts.hostname
    FROM hosts
    WHERE hosts.hostname IN ('foo', 'bar', 'baz')
    GROUP BY hosts.hostname
) t
JOIN hosts ON (hosts.hostname = t.hostname AND hosts.scan_id = t.max_scan_id)
JOIN scans ON (scans.scan_id = t.max_scan_id);
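To see whether the optimizer actually picks up the composite index, one option is to run EXPLAIN on the derived-table query by itself. This is only a sketch, not output from your server: it assumes the idx_hostname_scanid index from above has been created. With that index the key column would ideally show idx_hostname_scanid and Extra would mention an index-only access (something like "Using index" or "Using index for group-by") instead of "Using temporary; Using filesort", but the plan chosen depends on your data and server version.

```sql
-- Inspect the plan for the "most recent scan per host" subquery on its own.
-- Assumes idx_hostname_scanid ON hosts (hostname, scan_id) already exists.
EXPLAIN
SELECT MAX(hosts.scan_id) AS max_scan_id, hosts.hostname
FROM hosts
WHERE hosts.hostname IN ('foo', 'bar', 'baz')
GROUP BY hosts.hostname;
```

If the key column still shows idx_hostname, comparing the rows estimate before and after creating the index is a reasonable next step.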
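Also, given that you already trust the most recent scan to be the one with the highest ID, you can speed up the query further by trusting that the last_seen time likewise comes from the highest-ID scan in which the host was up: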
-- Same indexes as for the previous query; skip these two statements if they already exist.
CREATE INDEX idx_hostname_scanid ON hosts (hostname, scan_id);
CREATE INDEX idx_hostname_status_scanid ON hosts (hostname, status, scan_id);
SELECT hosts.hostname,
       INET_NTOA(hosts.ip_address) AS ip,
       CONV(hosts.mac_address, 10, 16) AS mac,
       hosts.mac_vendor AS mac_vendor,
       hosts.status AS status,
       scans.start_time AS last_scan,
       lss.start_time AS last_seen
FROM (
    -- ID of most recent scan for each host
    SELECT MAX(hosts.scan_id) AS max_scan_id, hosts.hostname
    FROM hosts
    WHERE hosts.hostname IN ('foo', 'bar', 'baz')
    GROUP BY hosts.hostname
) t
JOIN hosts ON (hosts.hostname = t.hostname AND hosts.scan_id = t.max_scan_id)
JOIN scans ON (scans.scan_id = t.max_scan_id)
LEFT JOIN (
    SELECT MAX(hosts.scan_id) AS max_scan_id, hosts.hostname
    FROM hosts
    WHERE hosts.hostname IN ('foo', 'bar', 'baz') AND hosts.status = 'up'
    GROUP BY hosts.hostname
) ls ON (ls.hostname = t.hostname)
LEFT JOIN scans lss ON (lss.scan_id = ls.max_scan_id);
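The second query is only correct if scan_id and start_time increase together. A rough way to sanity-check that assumption, sketched here against the scans table from the question, is to count neighbouring scans whose order disagrees. The aliases newer and older and the column name out_of_order are made up for this example. Because it only compares ids that differ by exactly one, pairs separated by a gap (for example after deleted scans) are not checked, so a result of 0 is consistent with the assumption rather than proof of it.

```sql
-- Count adjacent scan pairs where the higher scan_id has an earlier start_time.
-- Pairs separated by gaps in scan_id are not compared.
SELECT COUNT(*) AS out_of_order
FROM scans AS newer
JOIN scans AS older
    ON older.scan_id = newer.scan_id - 1
WHERE newer.start_time < older.start_time;
```

If this returns anything other than 0, the MAX(start_time) version of the query above is the safer choice.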