On a website, I use Django to make some requests.
The Django line:
CINodeInventory.objects.select_related().filter(ci_class__type='equipment',company__slug=self.kwargs['company'])
generates a MySQL query like this:
SELECT *
FROM `inventory_cinodeinventory`
INNER JOIN `ci_cinodeclass` ON ( `inventory_cinodeinventory`.`ci_class_id` = `ci_cinodeclass`.`class_name` )
INNER JOIN `accounts_companyprofile` ON ( `inventory_cinodeinventory`.`company_id` = `accounts_companyprofile`.`slug` )
INNER JOIN `accounts_companysite` ON ( `inventory_cinodeinventory`.`company_site_id` = `accounts_companysite`.`slug` )
INNER JOIN `accounts_companyprofile` T5 ON ( `accounts_companysite`.`company_id` = T5.`slug` )
WHERE (
`ci_cinodeclass`.`type` = 'equipment'
AND `inventory_cinodeinventory`.`company_id` = 'thecompany'
)
ORDER BY `inventory_cinodeinventory`.`name` ASC
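The problem is that the main table has only 40,000 entries, yet the query takes 0.5 seconds to process.
I checked all the indexes and created the ones needed for sorting or joining: I still have the problem.
Interestingly, if I replace the last INNER JOIN with a LEFT JOIN, the request is 10 times faster! Unfortunately, since I use Django for the requests, I have no control over the SQL it generates (and I don't want to write raw SQL myself).
With the last join as an INNER JOIN, EXPLAIN gives: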
+----+-------------+---------------------------+--------+----------------------------------------------------------------------------------------------------------+------------------------------------+---------+------------------------------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------------+--------+----------------------------------------------------------------------------------------------------------+------------------------------------+---------+------------------------------------------------+-------+---------------------------------+
| 1 | SIMPLE | accounts_companyprofile | const | PRIMARY | PRIMARY | 152 | const | 1 | Using temporary; Using filesort |
| 1 | SIMPLE | inventory_cinodeinventory | range | inventory_cinodeinventory_41ddcf59,inventory_cinodeinventory_543518c6,inventory_cinodeinventory_14fe63e9 | inventory_cinodeinventory_543518c6 | 152 | NULL | 42129 | Using where |
| 1 | SIMPLE | T5 | ALL | PRIMARY | NULL | NULL | NULL | 3 | Using join buffer |
| 1 | SIMPLE | accounts_companysite | eq_ref | PRIMARY,accounts_companysite_543518c6 | PRIMARY | 152 | cidb.inventory_cinodeinventory.company_site_id | 1 | Using where |
| 1 | SIMPLE | ci_cinodeclass | eq_ref | PRIMARY | PRIMARY | 92 | cidb.inventory_cinodeinventory.ci_class_id | 1 | Using where |
+----+-------------+---------------------------+--------+----------------------------------------------------------------------------------------------------------+------------------------------------+---------+------------------------------------------------+-------+---------------------------------+
With the last join as a LEFT JOIN, I get:
+----+-------------+---------------------------+--------+----------------------------------------------------------------------------------------------------------+---------+---------+------------------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------------+--------+----------------------------------------------------------------------------------------------------------+---------+---------+------------------------------------------------+------+-------------+
| 1 | SIMPLE | accounts_companyprofile | const | PRIMARY | PRIMARY | 152 | const | 1 | |
| 1 | SIMPLE | inventory_cinodeinventory | index | inventory_cinodeinventory_41ddcf59,inventory_cinodeinventory_543518c6,inventory_cinodeinventory_14fe63e9 | name | 194 | NULL | 173 | Using where |
| 1 | SIMPLE | accounts_companysite | eq_ref | PRIMARY | PRIMARY | 152 | cidb.inventory_cinodeinventory.company_site_id | 1 | |
| 1 | SIMPLE | T5 | eq_ref | PRIMARY | PRIMARY | 152 | cidb.accounts_companysite.company_id | 1 | |
| 1 | SIMPLE | ci_cinodeclass | eq_ref | PRIMARY | PRIMARY | 92 | cidb.inventory_cinodeinventory.ci_class_id | 1 | Using where |
+----+-------------+---------------------------+--------+----------------------------------------------------------------------------------------------------------+---------+---------+------------------------------------------------+------+-------------+
It seems that in the INNER JOIN case, MySQL does not find any index for the T5 join: why?
Profiling gives:
starting 0.000011
checking query cache for query 0.000086
Opening tables 0.000014
System lock 0.000005
Table lock 0.000052
init 0.000064
optimizing 0.000021
statistics 0.000180
preparing 0.000024
Creating tmp table 0.000308
executing 0.000003
Copying to tmp table 0.353414 !!!
Sorting result 0.037244
Sending data 0.035168
end 0.000005
removing tmp table 0.550974 !!!
end 0.000009
query end 0.000003
freeing items 0.000113
storing result in query cache 0.000009
logging slow query 0.000002
cleaning up 0.000004
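So it seems there is a step where MySQL uses a temporary table. This step does not happen with the LEFT JOIN, only with the INNER JOIN. I tried to avoid it by including a "force index for join" hint in the query, but it did not help...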
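To put a number on that observation, here is a small sketch that sums the stage timings from the profile above and reports how much of the total is spent in the temporary-table stages. The timings are copied from the profiling output (without the "!!!" markers); the function names `parse_profile` and `tmp_table_share` are made up for this illustration.

```python
# Stage timings in seconds, copied from the profiling output above.
PROFILE = """\
starting 0.000011
checking query cache for query 0.000086
Opening tables 0.000014
System lock 0.000005
Table lock 0.000052
init 0.000064
optimizing 0.000021
statistics 0.000180
preparing 0.000024
Creating tmp table 0.000308
executing 0.000003
Copying to tmp table 0.353414
Sorting result 0.037244
Sending data 0.035168
end 0.000005
removing tmp table 0.550974
end 0.000009
query end 0.000003
freeing items 0.000113
storing result in query cache 0.000009
logging slow query 0.000002
cleaning up 0.000004
"""


def parse_profile(text):
    """Return a list of (stage, seconds) pairs, one per non-empty line."""
    stages = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The duration is the last whitespace-separated token on the line.
        name, seconds = line.rsplit(None, 1)
        stages.append((name, float(seconds)))
    return stages


def tmp_table_share(stages):
    """Fraction of the total time spent in stages that mention 'tmp table'."""
    total = sum(seconds for _, seconds in stages)
    tmp = sum(seconds for name, seconds in stages if "tmp table" in name)
    return tmp / total


stages = parse_profile(PROFILE)
share = tmp_table_share(stages)
print(f"tmp-table stages: {share:.1%} of the profiled time")
```

With these numbers, creating, filling and removing the temporary table account for well over 90% of the profiled time, so avoiding the temporary table is what matters here.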
The joined tables are:
CREATE TABLE IF NOT EXISTS `accounts_companysite` (
`slug` varchar(50) NOT NULL,
`created` datetime NOT NULL,
`modified` datetime NOT NULL,
`deleted` tinyint(1) NOT NULL,
`company_id` varchar(50) NOT NULL,
`name` varchar(128) NOT NULL,
`address` longtext NOT NULL,
`city` varchar(64) NOT NULL,
`zip_code` varchar(6) NOT NULL,
`state` varchar(32) NOT NULL,
`country` varchar(2) DEFAULT NULL,
`phone` varchar(20) NOT NULL,
`fax` varchar(20) NOT NULL,
`more` longtext NOT NULL,
PRIMARY KEY (`slug`),
KEY `accounts_companysite_543518c6` (`company_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `accounts_companyprofile` (
`slug` varchar(50) NOT NULL,
`created` datetime NOT NULL,
`modified` datetime NOT NULL,
`deleted` tinyint(1) NOT NULL,
`name` varchar(128) NOT NULL,
`address` longtext NOT NULL,
`city` varchar(64) NOT NULL,
`zip_code` varchar(6) NOT NULL,
`state` varchar(32) NOT NULL,
`country` varchar(2) DEFAULT NULL,
`phone` varchar(20) NOT NULL,
`fax` varchar(20) NOT NULL,
`contract_id` varchar(32) NOT NULL,
`more` longtext NOT NULL,
PRIMARY KEY (`slug`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `inventory_cinodeinventory` (
`uuid` varchar(36) NOT NULL,
`name` varchar(64) NOT NULL,
`synopsis` varchar(64) NOT NULL,
`path` varchar(255) NOT NULL,
`created` datetime NOT NULL,
`modified` datetime NOT NULL,
`deleted` tinyint(1) NOT NULL,
`root_id` varchar(36) DEFAULT NULL,
`parent_id` varchar(36) DEFAULT NULL,
`order` int(11) NOT NULL,
`ci_class_id` varchar(30) NOT NULL,
`data` longtext NOT NULL,
`serial` varchar(64) NOT NULL,
`company_id` varchar(50) NOT NULL,
`company_site_id` varchar(50) NOT NULL,
`vendor` varchar(48) NOT NULL,
`type` varchar(64) NOT NULL,
`model` varchar(64) NOT NULL,
`room` varchar(30) NOT NULL,
`rack` varchar(30) NOT NULL,
`rack_slot` varchar(30) NOT NULL,
PRIMARY KEY (`uuid`),
KEY `inventory_cinodeinventory_1fb5ff88` (`root_id`),
KEY `inventory_cinodeinventory_63f17a16` (`parent_id`),
KEY `inventory_cinodeinventory_41ddcf59` (`ci_class_id`),
KEY `inventory_cinodeinventory_543518c6` (`company_id`),
KEY `inventory_cinodeinventory_14fe63e9` (`company_site_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
I also tried to tune MySQL by adding this to my.cnf:
join_buffer_size = 16M
tmp_table_size = 160M
max_seeks_for_key = 100
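...but it did not help.
With Django, it is easy to use PostgreSQL instead of MySQL, so I tried it: with the same query and the same data in the database, Postgres is much faster than MySQL with the INNER JOIN, about 10 times faster (the analysis shows that it uses the indexes differently from MySQL).
Do you have any idea why my MySQL INNER JOIN is so slow?
EDIT 1:
After some tests, I reduced the problem to this request: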
SELECT *
FROM `inventory_cinodeinventory`
INNER JOIN `accounts_companyprofile` ON `inventory_cinodeinventory`.`company_id` = `accounts_companyprofile`.`slug`
ORDER BY `inventory_cinodeinventory`.`name` ASC
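This request is very slow and I don't understand why. Without the ORDER BY clause it is fast; with it, it is slow, even though the name index is set: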
CREATE TABLE IF NOT EXISTS `inventory_cinodeinventory` (
`uuid` varchar(36) NOT NULL,
`name` varchar(64) NOT NULL,
`synopsis` varchar(64) NOT NULL,
`path` varchar(255) NOT NULL,
`created` datetime NOT NULL,
`modified` datetime NOT NULL,
`deleted` tinyint(1) NOT NULL,
`root_id` varchar(36) DEFAULT NULL,
`parent_id` varchar(36) DEFAULT NULL,
`order` int(11) NOT NULL,
`ci_class_id` varchar(30) NOT NULL,
`data` longtext NOT NULL,
`serial` varchar(64) NOT NULL,
`company_id` varchar(50) NOT NULL,
`company_site_id` varchar(50) NOT NULL,
`vendor` varchar(48) NOT NULL,
`type` varchar(64) NOT NULL,
`model` varchar(64) NOT NULL,
`room` varchar(30) NOT NULL,
`rack` varchar(30) NOT NULL,
`rack_slot` varchar(30) NOT NULL,
PRIMARY KEY (`uuid`),
KEY `inventory_cinodeinventory_1fb5ff88` (`root_id`),
KEY `inventory_cinodeinventory_63f17a16` (`parent_id`),
KEY `inventory_cinodeinventory_41ddcf59` (`ci_class_id`),
KEY `inventory_cinodeinventory_14fe63e9` (`company_site_id`),
KEY `inventory_cinodeinventory_543518c6` (`company_id`,`name`),
KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
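EDIT 2:
The previous request can be fixed with 'FORCE INDEX FOR ORDER BY (name)'. Unfortunately, this hint does not work for the initial request in my question...
EDIT 3:
I rebuilt the database, changing the 'uuid' primary keys from varchar to integer: it does not help at all... bad news.
EDIT 4:
I tried MySQL 5.5.20: no better. PostgreSQL 8.4 is 10 times faster for this particular request.
I modified the request a little (removed the T5 join):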
SELECT *
FROM `inventory_cinodeinventory`
INNER JOIN `ci_cinodeclass` ON ( `inventory_cinodeinventory`.`ci_class_id` = `ci_cinodeclass`.`class_name` )
INNER JOIN `accounts_companyprofile` ON ( `inventory_cinodeinventory`.`company_id` = `accounts_companyprofile`.`slug` )
INNER JOIN `accounts_companysite` ON ( `inventory_cinodeinventory`.`company_site_id` = `accounts_companysite`.`slug` )
WHERE (
`ci_cinodeclass`.`type` = 'equipment'
AND `inventory_cinodeinventory`.`company_id` = 'thecompany'
)
ORDER BY `inventory_cinodeinventory`.`name` ASC
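This works fine, but I have some other requests, only slightly different, where this trick does not work.
In fact, after some searching, it seems that as soon as you join two tables that have a lot in common, that is, when half of the rows of the right table can be joined to rows of the left table (which is my case), MySQL prefers a table scan over an index: it is faster, I read somewhere (!!)
Your real problem is the second row of your first EXPLAIN: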
+----+-------------+---------------------------+--------+----------------------------------------------------------------------------------------------------------+------------------------------------+---------+------------------------------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------------+--------+----------------------------------------------------------------------------------------------------------+------------------------------------+---------+------------------------------------------------+-------+---------------------------------+
| 1 | SIMPLE | inventory_cinodeinventory | range | inventory_cinodeinventory_41ddcf59,inventory_cinodeinventory_543518c6,inventory_cinodeinventory_14fe63e9 | inventory_cinodeinventory_543518c6 | 152 | NULL | 42129 | Using where |
You are examining 42129 rows with this WHERE clause:
AND `inventory_cinodeinventory`.`company_id` = 'thecompany'
If you don't already have one, you should have an index on inventory_cinodeinventory covering (company_id, name),
i.e.
ALTER TABLE `inventory_cinodeinventory`
ADD INDEX `inventory_cinodeinventory__company_id__name` (`company_id`, `name`);
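That way your WHERE and ORDER BY clauses won't conflict with each other and lead to a bad index choice, which is what appears to be happening.
If you do already have an index on those columns, in that order, I would suggest running OPTIMIZE TABLE inventory_cinodeinventory; and seeing whether that gets MySQL to use the correct index.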
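To illustrate why the column order in that index matters, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. Note the assumptions: SQLite is not MySQL, its planner and EXPLAIN output are different, and the table is a cut-down, hypothetical version of inventory_cinodeinventory. It only demonstrates the general principle that an index on (company_id, name) can serve both the equality filter and the ORDER BY, while an index on company_id alone leaves a separate sort step.

```python
import sqlite3


def order_by_needs_sort(index_columns):
    """Return True if SQLite plans a separate sort step for the query.

    A fresh in-memory database is used for each call so that only the
    index under test exists.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory ("
        " uuid TEXT PRIMARY KEY,"
        " company_id TEXT NOT NULL,"
        " name TEXT NOT NULL)"
    )
    conn.execute(f"CREATE INDEX idx_demo ON inventory ({index_columns})")
    plan = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT * FROM inventory "
        "WHERE company_id = 'thecompany' ORDER BY name"
    ).fetchall()
    conn.close()
    # The human-readable plan detail is the last column of each row;
    # SQLite reports 'USE TEMP B-TREE FOR ORDER BY' when it has to sort.
    return any("TEMP B-TREE" in row[-1] for row in plan)


single = order_by_needs_sort("company_id")
composite = order_by_needs_sort("company_id, name")
print("index on (company_id):       separate sort step =", single)
print("index on (company_id, name): separate sort step =", composite)
```

On MySQL itself, the equivalent check is to rerun EXPLAIN after adding the index and look at whether "Using temporary; Using filesort" is still reported.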
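More generally, you have a bigger problem (which I assume comes from Django's design, but I lack experience with that framework) in that you have these huge keys. The keys in your EXPLAIN are 152 and 92 bytes long. That makes for much larger indexes, which means more disk accesses, which means slower queries. Ideally, primary and foreign keys are ints or very short varchar columns (e.g. varchar(10)). Using varchar(50) for these keys puts a significant constant multiplier on your database response time.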
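As a quick check on where those key lengths come from: for a NOT NULL VARCHAR column in MySQL's utf8 character set, the key_len shown by EXPLAIN is 3 bytes per declared character plus a 2-byte length prefix (a nullable column adds 1 more byte). The helper below is only an illustration of that arithmetic; the name `varchar_key_len` is made up here.

```python
def varchar_key_len(chars, bytes_per_char=3, nullable=False):
    """key_len that EXPLAIN reports for an indexed VARCHAR(chars) column.

    Assumes MySQL's 3-byte utf8 charset by default: up to 3 bytes per
    character, a 2-byte length prefix, and 1 extra byte if nullable.
    """
    return chars * bytes_per_char + 2 + (1 if nullable else 0)


INT_KEY_LEN = 4  # a NOT NULL INT key is 4 bytes

for label, chars in [("slug / company_id varchar(50)", 50),
                     ("ci_class_id varchar(30)", 30),
                     ("name varchar(64)", 64),
                     ("suggested varchar(10)", 10)]:
    key_len = varchar_key_len(chars)
    print(f"{label}: key_len {key_len}, "
          f"{key_len / INT_KEY_LEN:.0f}x an INT key")
```

This reproduces the 152, 92 and 194 values in the EXPLAIN output above, and shows that a varchar(50) key is 38 times the width of an INT key.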
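As Conspicuous Compiler pointed out, I would definitely have an index on your first table on company ID and name (so that the name part is optimized for the ORDER BY clause).
Although I haven't done anything with Django either, another optimization is the MySQL keyword "STRAIGHT_JOIN", which tells the optimizer to join the tables in the order you list them. For example: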
SELECT STRAIGHT_JOIN * FROM ...
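In both of your EXPLAIN outputs, the optimizer is for some reason still hung up on the fact that companyprofile resolves to a single record, and is probably trying to use THAT as the basis of the join and working through the stack the other way around. By using STRAIGHT_JOIN, you tell MySQL that you know the primary table is "Inventory_CINodeInventory" and that it should be used first... the other tables are more like "lookup" or "reference" tables for the other simple elements you want. I have seen this one keyword alone take a query against gov't contract data of more than 14 million records from not running to completion at all (task killed after 30 hours) down to less than 2 hours... Nothing ELSE changed in the query, just this one KEYWORD. (But definitely add the other indexes too if you have not already done so.)
Following up on your latest edits and comments...
You mention that the query is SLOW with the ORDER BY, but FAST without it. How many entries are actually returned in the result set? Another tactic I have used before is to wrap the query in an outer SELECT to get the answer, and then apply the ORDER BY to the OUTER result... something like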
SELECT *
FROM
    ( SELECT ...   -- your entire query
      FROM ...     -- without the ORDER BY clause
    ) AS FastResults
ORDER BY
    FastResults.name
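This will probably break Django's automatic construction of your SQL statement, but it is worth trying as a proof of concept. You already have a working statement that runs, so I would give it a shot.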
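For anyone who wants to try the shape of that wrapped query before touching the real database, here is a small runnable sketch using Python's built-in sqlite3 module. Everything in it is an assumption made for illustration: the two cut-down tables, the sample rows and the column aliases are hypothetical, and SQLite will not reproduce MySQL's timings or plan. It only shows the pattern of running the join without ORDER BY inside a derived table and sorting the outer result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down stand-ins for the real tables.
conn.executescript("""
    CREATE TABLE company (slug TEXT PRIMARY KEY);
    CREATE TABLE inventory (
        uuid TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        company_id TEXT NOT NULL
    );
    INSERT INTO company VALUES ('thecompany');
    INSERT INTO inventory VALUES ('u1', 'router', 'thecompany');
    INSERT INTO inventory VALUES ('u2', 'blade', 'thecompany');
    INSERT INTO inventory VALUES ('u3', 'switch', 'thecompany');
""")

# Inner query: the join, with no ORDER BY.
# Outer query: sorts the already-joined rows.
wrapped = """
    SELECT *
    FROM (
        SELECT i.uuid AS uuid, i.name AS name, c.slug AS company
        FROM inventory AS i
        INNER JOIN company AS c ON i.company_id = c.slug
        WHERE i.company_id = 'thecompany'
    ) AS FastResults
    ORDER BY FastResults.name
"""
names = [row[1] for row in conn.execute(wrapped)]
print(names)  # ['blade', 'router', 'switch']
conn.close()
```

The inner SELECT names its columns explicitly instead of using SELECT *. That is deliberate: the joined tables in the question share column names such as slug, name and created, and MySQL rejects a derived table that contains duplicate column names.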