Tags: mysql, optimization, group-by, left-join
I'm using InnoDB.
QUERY, EXPLAIN AND INDEXES
SELECT
stories.*,
count(comments.id) AS comments,
GROUP_CONCAT(
DISTINCT classifications2.name SEPARATOR ';'
) AS classifications_name,
GROUP_CONCAT(
DISTINCT images.id
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_id,
GROUP_CONCAT(
DISTINCT images.caption
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_caption,
GROUP_CONCAT(
DISTINCT images.thumbnail
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_thumbnail,
GROUP_CONCAT(
DISTINCT images.medium
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_medium,
GROUP_CONCAT(
DISTINCT images.large
ORDER BY images.position,
images.id SEPARATOR ';'
) AS images_large,
GROUP_CONCAT(
DISTINCT users.id
ORDER BY users.id SEPARATOR ';'
) AS authors_id,
GROUP_CONCAT(
DISTINCT users.display_name
ORDER BY users.id SEPARATOR ';'
) AS authors_display_name,
GROUP_CONCAT(
DISTINCT users.url
ORDER BY users.id SEPARATOR ';'
) AS authors_url
FROM
stories
LEFT JOIN classifications
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
LEFT JOIN image_story
ON stories.id = image_story.story_id
LEFT JOIN images
ON images.id = image_story.`image_id`
LEFT JOIN author_story
ON stories.id = author_story.story_id
LEFT JOIN users
ON users.id = author_story.author_id
WHERE classifications.`name` LIKE 'Home:Top%'
AND stories.status = 1
GROUP BY stories.id
ORDER BY classifications.`name`, classifications.`position`
+----+-------------+------------------+--------+---------------+----------+---------+------------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+--------+---------------+----------+---------+------------------------+--------+----------------------------------------------+
| 1 | SIMPLE | stories | ref | status | status | 1 | const | 434792 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | classifications | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | classifications2 | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | comments | ref | story_id | story_id | 8 | stories.id | 6 | Using where; Using index |
| 1 | SIMPLE | image_story | ref | story_id | story_id | 4 | stories.id | 1 | NULL |
| 1 | SIMPLE | images | eq_ref | PRIMARY | PRIMARY | 4 | image_story.image_id | 1 | NULL |
| 1 | SIMPLE | author_story | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | users | eq_ref | PRIMARY | PRIMARY | 4 | author_story.author_id | 1 | Using where |
+----+-------------+------------------+--------+---------------+----------+---------+------------------------+--------+----------------------------------------------+
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type |
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
| stories | 0 | PRIMARY | 1 | id | A | 869584 | NULL | NULL | | BTREE |
| stories | 1 | created_at | 1 | created_at | A | 434792 | NULL | NULL | | BTREE |
| stories | 1 | source | 1 | source | A | 2 | NULL | NULL | YES | BTREE |
| stories | 1 | source_id | 1 | source_id | A | 869584 | NULL | NULL | YES | BTREE |
| stories | 1 | type | 1 | type | A | 2 | NULL | NULL | | BTREE |
| stories | 1 | status | 1 | status | A | 2 | NULL | NULL | | BTREE |
| stories | 1 | type_status | 1 | type | A | 2 | NULL | NULL | | BTREE |
| stories | 1 | type_status | 2 | status | A | 2 | NULL | NULL | | BTREE |
| classifications | 0 | PRIMARY | 1 | id | A | 207 | NULL | NULL | | BTREE |
| classifications | 1 | story_id | 1 | story_id | A | 207 | NULL | NULL | | BTREE |
| classifications | 1 | name | 1 | name | A | 103 | NULL | NULL | | BTREE |
| classifications | 1 | name | 2 | position | A | 207 | NULL | NULL | YES | BTREE |
| comments | 0 | PRIMARY | 1 | id | A | 239336 | NULL | NULL | | BTREE |
| comments | 1 | status | 1 | status | A | 2 | NULL | NULL | | BTREE |
| comments | 1 | date | 1 | date | A | 239336 | NULL | NULL | | BTREE |
| comments | 1 | story_id | 1 | story_id | A | 39889 | NULL | NULL | | BTREE |
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+
QUERY TIMES
On average the query takes 0.035 seconds to run.
If I remove only the GROUP BY, the average time drops to 0.007.
If I remove only the stories.status = 1 filter, the average time drops to 0.025. That one seems easy to optimize.
If I remove only the LIKE filter and the ORDER BY clause, the average time drops to 0.006.
UPDATE (2013-04-13): My understanding has improved thanks to the various answers.
I added indexes to author_story and image_story, which seems to have improved the query to 0.025 seconds, and for some odd reason the EXPLAIN plan looks much better. At this point, removing the ORDER BY drops the query to 0.015 seconds, and removing both the ORDER BY and the GROUP BY improves performance to 0.006. Are those the two things I should focus on now? I can move the ORDER BY into application logic if needed.
Here are the revised EXPLAIN and INDEXES:
+----+-------------+------------------+--------+---------------------------------+----------+---------+--------------------------+------+--------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+--------+---------------------------------+----------+---------+--------------------------+------+--------------------------------------------------------+
| 1 | SIMPLE | classifications | range | story_id,name | name | 102 | NULL | 14 | Using index condition; Using temporary; Using filesort |
| 1 | SIMPLE | stories | eq_ref | PRIMARY,status | PRIMARY | 4 | classifications.story_id | 1 | Using where |
| 1 | SIMPLE | classifications2 | ref | story_id | story_id | 4 | stories.id | 1 | Using where |
| 1 | SIMPLE | author_story | ref | author_id,story_id,author_story | story_id | 4 | stories.id | 1 | Using index condition |
| 1 | SIMPLE | users | eq_ref | PRIMARY | PRIMARY | 4 | author_story.author_id | 1 | Using where |
| 1 | SIMPLE | comments | ref | story_id | story_id | 8 | stories.id | 8 | Using where; Using index |
| 1 | SIMPLE | image_story | ref | story_id,story_id_2 | story_id | 4 | stories.id | 1 | NULL |
| 1 | SIMPLE | images | eq_ref | PRIMARY,position_id | PRIMARY | 4 | image_story.image_id | 1 | NULL |
+----+-------------+------------------+--------+---------------------------------+----------+---------+--------------------------+------+--------------------------------------------------------+
+-----------------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| author_story | 0 | PRIMARY | 1 | id | A | 220116 | NULL | NULL | | BTREE | | |
| author_story | 0 | story_author | 1 | story_id | A | 220116 | NULL | NULL | | BTREE | | |
| author_story | 0 | story_author | 2 | author_id | A | 220116 | NULL | NULL | | BTREE | | |
| author_story | 1 | author_id | 1 | author_id | A | 2179 | NULL | NULL | | BTREE | | |
| author_story | 1 | story_id | 1 | story_id | A | 220116 | NULL | NULL | | BTREE | | |
| image_story | 0 | PRIMARY | 1 | id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 0 | story_image | 1 | story_id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 0 | story_image | 2 | image_id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 1 | story_id | 1 | story_id | A | 148902 | NULL | NULL | | BTREE | | |
| image_story | 1 | image_id | 1 | image_id | A | 148902 | NULL | NULL | | BTREE | | |
| classifications | 0 | PRIMARY | 1 | id | A | 257 | NULL | NULL | | BTREE | | |
| classifications | 1 | story_id | 1 | story_id | A | 257 | NULL | NULL | | BTREE | | |
| classifications | 1 | name | 1 | name | A | 128 | NULL | NULL | | BTREE | | |
| classifications | 1 | name | 2 | position | A | 257 | NULL | NULL | YES | BTREE | | |
| stories | 0 | PRIMARY | 1 | id | A | 962570 | NULL | NULL | | BTREE | | |
| stories | 1 | created_at | 1 | created_at | A | 481285 | NULL | NULL | | BTREE | | |
| stories | 1 | source | 1 | source | A | 4 | NULL | NULL | YES | BTREE | | |
| stories | 1 | source_id | 1 | source_id | A | 962570 | NULL | NULL | YES | BTREE | | |
| stories | 1 | type | 1 | type | A | 2 | NULL | NULL | | BTREE | | |
| stories | 1 | status | 1 | status | A | 4 | NULL | NULL | | BTREE | | |
| stories | 1 | type_status | 1 | type | A | 2 | NULL | NULL | | BTREE | | |
| stories | 1 | type_status | 2 | status | A | 6 | NULL | NULL | | BTREE | | |
| comments | 0 | PRIMARY | 1 | id | A | 232559 | NULL | NULL | | BTREE | | |
| comments | 1 | status | 1 | status | A | 6 | NULL | NULL | | BTREE | | |
| comments | 1 | date | 1 | date | A | 232559 | NULL | NULL | | BTREE | | |
| comments | 1 | story_id | 1 | story_id | A | 29069 | NULL | NULL | | BTREE | | |
| images | 0 | PRIMARY | 1 | id | A | 147206 | NULL | NULL | | BTREE | | |
| images | 0 | source_id | 1 | source_id | A | 147206 | NULL | NULL | YES | BTREE | | |
| images | 1 | position | 1 | position | A | 4 | NULL | NULL | | BTREE | | |
| images | 1 | position_id | 1 | id | A | 147206 | NULL | NULL | | BTREE | | |
| images | 1 | position_id | 2 | position | A | 147206 | NULL | NULL | | BTREE | | |
| users | 0 | PRIMARY | 1 | id | A | 981 | NULL | NULL | | BTREE | | |
| users | 0 | users_email_unique | 1 | email | A | 981 | NULL | NULL | | BTREE | | |
+-----------------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
SELECT
stories.*,
count(comments.id) AS comments,
GROUP_CONCAT(DISTINCT users.id ORDER BY users.id SEPARATOR ';') AS authors_id,
GROUP_CONCAT(DISTINCT users.display_name ORDER BY users.id SEPARATOR ';') AS authors_display_name,
GROUP_CONCAT(DISTINCT users.url ORDER BY users.id SEPARATOR ';') AS authors_url,
GROUP_CONCAT(DISTINCT classifications2.name SEPARATOR ';') AS classifications_name,
GROUP_CONCAT(DISTINCT images.id ORDER BY images.position,images.id SEPARATOR ';') AS images_id,
GROUP_CONCAT(DISTINCT images.caption ORDER BY images.position,images.id SEPARATOR ';') AS images_caption,
GROUP_CONCAT(DISTINCT images.thumbnail ORDER BY images.position,images.id SEPARATOR ';') AS images_thumbnail,
GROUP_CONCAT(DISTINCT images.medium ORDER BY images.position,images.id SEPARATOR ';') AS images_medium,
GROUP_CONCAT(DISTINCT images.large ORDER BY images.position,images.id SEPARATOR ';') AS images_large
FROM
classifications
INNER JOIN stories
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
LEFT JOIN image_story
ON stories.id = image_story.story_id
LEFT JOIN images
ON images.id = image_story.`image_id`
INNER JOIN author_story
ON stories.id = author_story.story_id
INNER JOIN users
ON users.id = author_story.author_id
WHERE classifications.`name` LIKE 'Home:Top%'
AND stories.status = 1
GROUP BY stories.id
ORDER BY NULL
I noticed one more thing. If I don't SELECT stories.content (LONGTEXT) and stories.content_html (LONGTEXT), the query drops from 0.015 seconds to 0.006 seconds. I'm now considering whether I can do without content and content_html altogether, or replace them with something else.
I have updated the query, indexes, and EXPLAIN in the 2013-04-13 update above rather than re-posting them here, since the changes are minor and incremental. The query is still using filesort. I haven't been able to get rid of the GROUP BY, but I have gotten rid of the ORDER BY.
As requested, I dropped the story_id indexes from both image_story and author_story since they were redundant. The result is that the EXPLAIN output only changed to show that possible_keys changed. Unfortunately it still does not show the 'Using index' optimization.
I also changed the LONGTEXT columns to TEXT, and I now fetch LEFT(stories.content, 500) instead of stories.content, which makes a very significant difference in query execution time.
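For illustration, a minimal sketch of what the truncated fetch looks like (the column list is cut down, the alias names are made up, and content_html is truncated the same way only as an example):
SELECT
    stories.id,
    LEFT(stories.content, 500)      AS content_preview,
    LEFT(stories.content_html, 500) AS content_html_preview
FROM stories
WHERE stories.status = 1;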
+----+-------------+------------------+--------+-----------------------------+--------------+---------+--------------------------+------+---------------------------------------------------------------------+
| id | select_type | table            | type   | possible_keys               | key          | key_len | ref                      | rows | Extra                                                               |
+----+-------------+------------------+--------+-----------------------------+--------------+---------+--------------------------+------+---------------------------------------------------------------------+
|  1 | SIMPLE      | classifications  | ref    | story_id,name,name_position | name         | 102     | const                    |   10 | Using index condition; Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | stories          | eq_ref | PRIMARY,status              | PRIMARY      | 4       | classifications.story_id |    1 | Using where                                                         |
|  1 | SIMPLE      | classifications2 | ref    | story_id                    | story_id     | 4       | stories.id               |    1 | Using where                                                         |
|  1 | SIMPLE      | author_story     | ref    | story_author                | story_author | 4       | stories.id               |    1 | Using where; Using index                                            |
|  1 | SIMPLE      | users            | eq_ref | PRIMARY                     | PRIMARY      | 4       | author_story.author_id   |    1 | Using where                                                         |
|  1 | SIMPLE      | comments         | ref    | story_id                    | story_id     | 8       | stories.id               |    8 | Using where; Using index                                            |
|  1 | SIMPLE      | image_story      | ref    | story_image                 | story_image  | 4       | stories.id               |    1 | Using index                                                         |
|  1 | SIMPLE      | images           | eq_ref | PRIMARY,position_id         | PRIMARY      | 4       | image_story.image_id     |    1 | NULL                                                                |
+----+-------------+------------------+--------+-----------------------------+--------------+---------+--------------------------+------+---------------------------------------------------------------------+
innodb_buffer_pool_size  134217728

TABLE_NAME    INDEX_LENGTH
image_story   10010624
image_story   4556800
image_story   0

TABLE_NAME         INDEX_NAMES    SIZE
dawn/image_story   story_image    13921
I can see two optimization opportunities right away.
Change the OUTER JOIN to an INNER JOIN
Your query is currently scanning 434,792 stories, and you should be able to narrow that down better, assuming not every story has a classification matching 'Home:Top%'. It would be preferable to use an index to find the classifications you're looking for, and then look up the matching stories.
But you're using a LEFT OUTER JOIN for classifications, which means all stories will be scanned whether they have a matching classification or not. Then you effectively defeat that by putting a condition on classifications in the WHERE clause, requiring it to match your LIKE pattern. So it's no longer an outer join, it's an inner join.
If you put the classifications table first and make it an inner join, the optimizer will use it to narrow down the search to just the stories with matching classifications.
. . .
FROM
classifications
INNER JOIN stories
ON stories.id = classifications.story_id
. . .
The optimizer should be able to figure out when it's advantageous to reorder the tables, so you may not have to change the order in your query. But you do need to use an INNER JOIN in this case.
Add compound indexes
Your intersection tables image_story and author_story don't have compound indexes. It's often a big advantage to add compound indexes to the intersection tables in a many-to-many relationship, so they can perform the join and get the 'Using index' optimization.
ALTER TABLE image_story ADD UNIQUE KEY imst_st_im (story_id, image_id);
ALTER TABLE author_story ADD UNIQUE KEY aust_st_au (story_id, author_id);
Re your comments and update:
I'm not sure you created the new indexes correctly. Your index dump doesn't show the columns, and according to the updated EXPLAIN the new indexes are not being used, which is what I would have expected to see. Using the new indexes would result in 'Using index' in the Extra field of the EXPLAIN, which should help performance.
The output of SHOW CREATE TABLE for each table would be more complete information than an index dump (which lacks the column names).
You may have to run ANALYZE TABLE once on each table after creating the indexes. Also, run the query more than once to make sure the indexes are in the buffer pool. Is this table InnoDB or MyISAM?
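For example, refreshing the statistics on the two intersection tables would look like this (table names taken from the question):
ANALYZE TABLE image_story;
ANALYZE TABLE author_story;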
I also notice that the rows column in the EXPLAIN output shows far fewer rows being touched. That's an improvement.
Do you really need the ORDER BY? If you use ORDER BY NULL, you should be able to get rid of 'Using filesort', which may improve performance.
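As a sketch, here is a cut-down grouped query (not the full statement above) with the implicit sort suppressed:
SELECT stories.id, COUNT(comments.id) AS comments
FROM stories
LEFT JOIN comments ON comments.story_id = stories.id
WHERE stories.status = 1
GROUP BY stories.id
ORDER BY NULL;  -- suppresses the sort that GROUP BY otherwise implies (MySQL 5.x behavior)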
Re your update:
You're still not getting the 'Using index' optimization from your image_story and author_story tables. One suggestion I have is to eliminate the redundant indexes:
ALTER TABLE image_story DROP KEY story_id;
ALTER TABLE author_story DROP KEY story_id;
The reason is that any query that can benefit from the single-column index on story_id can also benefit from the two-column index on (story_id, image_id). Eliminating redundant indexes helps the optimizer make better decisions (and saves some space). This is the theory behind tools like pt-duplicate-key-checker.
I would also check to make sure your buffer pool is large enough to hold the indexes. You don't want indexes to be paging in and out of the buffer pool during a query.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size'
Check the size of the indexes of the image_story table:
SELECT TABLE_NAME, INDEX_LENGTH FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = 'image_story';
And compare that to how much of those indexes currently resides in the buffer pool:
SELECT TABLE_NAME, GROUP_CONCAT(DISTINCT INDEX_NAME) AS INDEX_NAMES, SUM(DATA_SIZE) AS SIZE
FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE_LRU
WHERE TABLE_NAME = '`test`.`image_story`' AND INDEX_NAME <> 'PRIMARY'
Of course, change `test` above to the name of the database your table belongs to.
That information_schema table is new in MySQL 5.6. I'm assuming you're using MySQL 5.6, because your EXPLAIN shows 'Using index condition', which is also new in MySQL 5.6.
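If in doubt, the server version can be checked directly:
SELECT VERSION();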
I wouldn't use LONGTEXT at all unless I really needed to store very long strings for the stories.
Since you are using MySQL, you can take advantage of STRAIGHT_JOIN:
STRAIGHT_JOIN forces the optimizer to join the tables in the order in which they are listed in the FROM clause. You can use this to speed up a query if the optimizer joins the tables in a nonoptimal order.
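A minimal sketch of how that could be applied here, assuming you want classifications to drive the join (cut down to a few columns; compare against the optimizer's default plan before forcing the order):
SELECT STRAIGHT_JOIN
    stories.*,
    COUNT(comments.id) AS comments
FROM
    classifications
    INNER JOIN stories  ON stories.id = classifications.story_id
    LEFT JOIN comments  ON comments.story_id = stories.id
WHERE classifications.`name` LIKE 'Home:Top%'
  AND stories.status = 1
GROUP BY stories.id;
-- STRAIGHT_JOIN right after SELECT makes the tables join in FROM-clause order.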
Another area for improvement is to filter the data of the stories table, since you only need the rows whose status is 1.
So, in the FROM clause, instead of joining the whole stories table, add only the records you need, because your query plan shows 434,792 rows for that table, and do the same for the classifications table:
FROM
(SELECT
*
FROM
stories
WHERE
stories.status = 1) stories
LEFT JOIN
(SELECT
*
FROM
classifications
WHERE
classifications.`name` LIKE 'Home:Top%') classifications
ON stories.id = classifications.story_id
One more suggestion: you can increase sort_buffer_size, since your query has both ORDER BY and GROUP BY, but be careful when increasing the buffer size, because it is allocated per session.
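For example (the 4 MB value is only a placeholder; tune it against your memory budget):
SET SESSION sort_buffer_size = 4 * 1024 * 1024;  -- applies to the current session only
SET GLOBAL  sort_buffer_size = 4 * 1024 * 1024;  -- default for new sessions; needs the SUPER privilege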
If possible, order your records in your application instead, since you already mentioned that removing the ORDER BY clause brought the query down to about 1/6 of the original time...
Add an index on image_story.image_id in the image_story table and on author_story.story_id in the author_story table, since those columns are used in the joins.
An index covering images.position and images.id should also be created, since those columns are used.
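A sketch of those suggestions as DDL (the index names are made up, and per the index dumps above some equivalent indexes may already exist, so check SHOW CREATE TABLE first):
ALTER TABLE image_story  ADD INDEX idx_image_id (image_id);
ALTER TABLE author_story ADD INDEX idx_story_id (story_id);
ALTER TABLE images       ADD INDEX idx_position_id (position, id);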
I think you have pretty much optimized your query already, judging by your updates...
One more place you can improve is by using appropriate data types, as Bill Karwin mentioned... You can use ENUM or TINYINT for status and other columns that have no scope for growth; that will help both query performance and the storage footprint of your tables...
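For instance, if stories.status currently uses a wider integer or string type, something like the following would shrink it (a sketch only; the default value and the ENUM labels are hypothetical, so confirm the real definition and value range with SHOW CREATE TABLE before altering):
ALTER TABLE stories MODIFY status TINYINT UNSIGNED NOT NULL DEFAULT 0;
-- or, for a small fixed set of states (labels are examples only):
-- ALTER TABLE stories MODIFY status ENUM('draft', 'published') NOT NULL;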
Hope this helps....
Computing
GROUP_CONCAT(DISTINCT classifications2.name SEPARATOR ';')
is probably the most time-consuming operation, because classifications is a big table and the number of rows to work with is multiplied by all the joins.
So I would suggest using a temporary table for that information. Also, to avoid computing the LIKE condition twice (once for the temporary table and once for the "real" query), I would create a temporary table for that as well.
Your original query, in a very simplified version (without the images and users tables, so that it is easier to read), is:
SELECT
stories.*,
count(DISTINCT comments.id) AS comments,
GROUP_CONCAT(DISTINCT classifications2.name ORDER BY 1 SEPARATOR ';' )
AS classifications_name
FROM
stories
LEFT JOIN classifications
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
WHERE
classifications.`name` LIKE 'Home:Top%'
AND stories.status = 1
GROUP BY stories.id
ORDER BY stories.id, classifications.`name`, classifications.`position`;
I would replace it with the following queries, using the temporary tables _tmp_filtered_classifications (the IDs of the classifications whose name is LIKE 'Home:Top%') and _tmp_classifications_of_story (for each story ID "contained" in _tmp_filtered_classifications, all of its classification names):
DROP TABLE IF EXISTS `_tmp_filtered_classifications`;
CREATE TEMPORARY TABLE _tmp_filtered_classifications
SELECT id FROM classifications WHERE name LIKE 'Home:Top%';
DROP TABLE IF EXISTS `_tmp_classifications_of_story`;
CREATE TEMPORARY TABLE _tmp_classifications_of_story ENGINE=MEMORY
SELECT stories.id AS story_id, classifications2.name
FROM
_tmp_filtered_classifications
INNER JOIN classifications
ON _tmp_filtered_classifications.id=classifications.id
INNER JOIN stories
ON stories.id = classifications.story_id
LEFT JOIN classifications AS classifications2
ON stories.id = classifications2.story_id
GROUP BY 1,2;
SELECT
stories.*,
count(DISTINCT comments.id) AS comments,
GROUP_CONCAT(DISTINCT classifications2.name ORDER BY 1 SEPARATOR ';')
AS classifications_name
FROM
_tmp_filtered_classifications
INNER JOIN classifications
ON _tmp_filtered_classifications.id=classifications.id
INNER JOIN stories
ON stories.id = classifications.story_id
LEFT JOIN _tmp_classifications_of_story AS classifications2
ON stories.id = classifications2.story_id
LEFT JOIN comments
ON stories.id = comments.story_id
WHERE
stories.status = 1
GROUP BY stories.id
ORDER BY stories.id, classifications.`name`, classifications.`position`;
Note that I added some ORDER BY clauses to the queries in order to check that both queries give the same results (using diff). I also changed count(comments.id) to count(DISTINCT comments.id), because otherwise the comment count computed by the query is wrong (again, because the joins multiply the number of rows).
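To illustrate why that matters, here is a toy check against a single story (the id 123 is hypothetical; pick a story with, say, two matching classifications and three comments):
SELECT
    COUNT(comments.id)          AS multiplied_count,    -- 2 x 3 = 6 joined rows
    COUNT(DISTINCT comments.id) AS real_comment_count   -- 3 actual comments
FROM stories
LEFT JOIN classifications ON classifications.story_id = stories.id
LEFT JOIN comments        ON comments.story_id = stories.id
WHERE stories.id = 123
GROUP BY stories.id;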