Tin*_*lon 1 mysql performance query-optimization large-data
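I'm trying to run what I believe is a simple query against a fairly large dataset, and it takes a very long time to execute: it sits in the "Sending data" state for 3-4 hours or more.
The table looks like this: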
CREATE TABLE `transaction` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`uuid` varchar(36) NOT NULL,
`userId` varchar(64) NOT NULL,
`protocol` int(11) NOT NULL,
-- ... a few other fields: ints and small varchars
`created` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `uuid` (`uuid`),
KEY `userId` (`userId`),
KEY `protocol` (`protocol`),
KEY `created` (`created`)
) ENGINE=InnoDB AUTO_INCREMENT=61 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4 COMMENT='Transaction audit table'
Here is the query:
select protocol, count(distinct userId) as count from transaction
where created > '2012-01-15 23:59:59' and created <= '2012-02-14 23:59:59'
group by protocol;
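The table has about 222 million rows, and the WHERE clause filters that down to about 20 million. The DISTINCT brings it down to about 700,000 distinct rows, and after the grouping (when the query finally finishes) only 4 or 5 rows are actually returned.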
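The reduction described above (filter, then distinct, then group) can be sketched in plain Python. This only illustrates what the query computes logically, not how MySQL executes it; the sample rows and the helper name `distinct_users_per_protocol` are hypothetical.

```python
from datetime import datetime

# Hypothetical sample rows: (protocol, userId, created).
# The real table has roughly 222 million rows.
ROWS = [
    (1, "alice", datetime(2012, 1, 20, 10, 0)),
    (1, "alice", datetime(2012, 1, 25, 9, 30)),        # same user, same protocol again
    (1, "bob", datetime(2012, 2, 1, 12, 0)),
    (2, "alice", datetime(2012, 2, 14, 23, 59, 59)),   # equals the upper bound: included
    (2, "carol", datetime(2012, 1, 15, 23, 59, 59)),   # equals the lower bound: excluded
    (3, "dave", datetime(2012, 3, 1, 0, 0)),           # outside the range
]

LOW = datetime(2012, 1, 15, 23, 59, 59)
HIGH = datetime(2012, 2, 14, 23, 59, 59)


def distinct_users_per_protocol(rows, low, high):
    """Logical equivalent of:
    SELECT protocol, COUNT(DISTINCT userId) ... WHERE created > low AND created <= high
    GROUP BY protocol
    """
    # Step 1: the WHERE filter (exclusive lower bound, inclusive upper bound).
    filtered = [(p, u) for p, u, created in rows if low < created <= high]
    # Step 2: keep each (protocol, userId) pair only once.
    pairs = set(filtered)
    # Step 3: GROUP BY protocol, counting the distinct users per group.
    counts = {}
    for protocol, _user in pairs:
        counts[protocol] = counts.get(protocol, 0) + 1
    return dict(sorted(counts.items()))


print(distinct_users_per_protocol(ROWS, LOW, HIGH))  # {1: 2, 2: 1}
```

The expensive part is steps 1 and 2: every one of the ~20 million filtered rows has to be read and deduplicated before the handful of result rows exists.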
I realize this is a lot of data, but 4-5 hours seems like a very long time for this query.
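To put "very long" in numbers: if the entire runtime went into the roughly 20 million rows that pass the WHERE filter, the server would be averaging only about 1,100 to 1,900 rows per second. The arithmetic below uses only the figures from the question; `rows_per_second` is a hypothetical helper, and the model is a deliberate simplification that ignores how MySQL actually reaches those rows.

```python
# Back-of-envelope throughput check using the figures from the question:
# about 20 million rows survive the WHERE filter, and the query runs 3-5 hours.
FILTERED_ROWS = 20_000_000


def rows_per_second(rows, hours):
    """Average rows per second if the whole runtime were spent on `rows` rows."""
    return rows / (hours * 3600)


for hours in (3, 4, 5):
    print(f"{hours} h -> {rows_per_second(FILTERED_ROWS, hours):,.0f} rows/s")
# 3 h -> 1,852 rows/s
# 4 h -> 1,389 rows/s
# 5 h -> 1,111 rows/s
```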
Thanks.
Edit: For reference, this is running on AWS, on a db.m2.4xlarge RDS database instance.
Mah*_*til 11
Why not profile the query and see exactly what is happening?
SET PROFILING = 1;
-- Discard any earlier profiles, then keep the last 15 statements
SET profiling_history_size = 0;
SET profiling_history_size = 15;
/* Your query should be here */
SHOW PROFILES;
-- Replace 3 with the Query_ID that SHOW PROFILES reports for your query
SELECT state, ROUND(SUM(duration), 5) AS `duration (summed) in sec`
FROM information_schema.profiling
WHERE query_id = 3
GROUP BY state
ORDER BY `duration (summed) in sec` DESC;
SET PROFILING = 0;
EXPLAIN /* Your query goes here again */;
I think this will show you exactly where the query is spending its time, and you can then optimize based on the results.
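MySQL's EXPLAIN needs a running MySQL server, so as a self-contained stand-in, the sketch below uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` to show the same habit: ask the engine how it intends to read the table before tuning anything. SQLite's planner and output format differ from MySQL's, the table is a cut-down version of the one in the question, and the index names `idx_created` and `idx_protocol` and the helper `query_plan` are made up for this example.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' text of each row from SQLite's EXPLAIN QUERY PLAN."""
    # Each plan row has four columns; the human-readable step is the last one.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
# Cut-down copy of the question's table. "transaction" is double-quoted
# because it is an SQL keyword.
conn.execute(
    'CREATE TABLE "transaction" ('
    " id INTEGER PRIMARY KEY,"
    " userId TEXT NOT NULL,"
    " protocol INTEGER NOT NULL,"
    " created TEXT NOT NULL)"
)
# Single-column indexes mirroring KEY `protocol` and KEY `created`
# (the index names here are hypothetical).
conn.execute('CREATE INDEX idx_protocol ON "transaction" (protocol)')
conn.execute('CREATE INDEX idx_created ON "transaction" (created)')

QUERY = (
    'SELECT protocol, COUNT(DISTINCT userId) AS count FROM "transaction" '
    "WHERE created > ? AND created <= ? GROUP BY protocol"
)
plan = query_plan(conn, QUERY, ("2012-01-15 23:59:59", "2012-02-14 23:59:59"))
for step in plan:
    print(step)  # shows which index, if any, SQLite chose and any temp B-trees
conn.close()
```

On the actual MySQL server the equivalent check is just EXPLAIN followed by the query, as above; the `key` and `rows` columns of its output show which index was chosen and roughly how many rows MySQL expects to examine.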