如何进一步优化此 MySQL 查询?

con*_*are 9 mysql performance index optimization

我有一个查询运行时间特别长(15 秒以上),而且随着我的数据集的增长,它只会随着时间的推移而变得更糟。我过去对此进行了优化,并添加了索引、代码级排序和其他优化,但还需要进一步完善。

SELECT sounds.*, avg(ratings.rating) AS avg_rating, count(ratings.rating) AS votes FROM `sounds` 
INNER JOIN ratings ON sounds.id = ratings.rateable_id 
WHERE (ratings.rateable_type = 'Sound' 
   AND sounds.blacklisted = false 
   AND sounds.ready_for_deployment = true 
   AND sounds.deployed = true 
   AND sounds.type = "Sound" 
   AND sounds.created_at > "2011-03-26 21:25:49") 
GROUP BY ratings.rateable_id
Run Code Online (Sandbox Code Playgroud)

该查询的目的是让我获得sound id最近发布的声音的's 和平均评分。大约有 1500 个声音和 200 万个评分。

我有几个索引 sounds

mysql> show index from sounds;
+--------+------------+------------------------------------------+--------------+----------------------+-----------+-------------+----------+--------+------+------------+————+
| Table  | Non_unique | Key_name                                 | Seq_in_index | Column_name          | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+------------------------------------------+--------------+----------------------+-----------+-------------+----------+--------+------+------------+————+
| sounds |          0 | PRIMARY                                  |            1 | id                   | A         |        1388 |     NULL | NULL   |      | BTREE      |         | 
| sounds |          1 | sounds_ready_for_deployment_and_deployed |            1 | deployed             | A         |           5 |     NULL | NULL   | YES  | BTREE      |         | 
| sounds |          1 | sounds_ready_for_deployment_and_deployed |            2 | ready_for_deployment | A         |          12 |     NULL | NULL   | YES  | BTREE      |         | 
| sounds |          1 | sounds_name                              |            1 | name                 | A         |        1388 |     NULL | NULL   |      | BTREE      |         | 
| sounds |          1 | sounds_description                       |            1 | description          | A         |        1388 |      128 | NULL   | YES  | BTREE      |         | 
+--------+------------+------------------------------------------+--------------+----------------------+-----------+-------------+----------+--------+------+------------+---------+
Run Code Online (Sandbox Code Playgroud)

和几个 ratings

mysql> show index from ratings;
+---------+------------+-----------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+————+
| Table   | Non_unique | Key_name                                | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+-----------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+————+
| ratings |          0 | PRIMARY                                 |            1 | id          | A         |     2008251 |     NULL | NULL   |      | BTREE      |         | 
| ratings |          1 | index_ratings_on_rateable_id_and_rating |            1 | rateable_id | A         |          18 |     NULL | NULL   |      | BTREE      |         | 
| ratings |          1 | index_ratings_on_rateable_id_and_rating |            2 | rating      | A         |        9297 |     NULL | NULL   | YES  | BTREE      |         | 
+---------+------------+-----------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
Run Code Online (Sandbox Code Playgroud)

这里是 EXPLAIN

mysql> EXPLAIN SELECT sounds.*, avg(ratings.rating) AS avg_rating, count(ratings.rating) AS votes FROM sounds INNER JOIN ratings ON sounds.id = ratings.rateable_id WHERE (ratings.rateable_type = 'Sound' AND sounds.blacklisted = false AND sounds.ready_for_deployment = true AND sounds.deployed = true AND sounds.type = "Sound" AND sounds.created_at > "2011-03-26 21:25:49") GROUP BY ratings.rateable_id;
+----+-------------+---------+--------+--------------------------------------------------+-----------------------------------------+---------+-----------------------------------------+---------+——————+
| id | select_type | table   | type   | possible_keys                                    | key                                     | key_len | ref                                     | rows    | Extra       |
+----+-------------+---------+--------+--------------------------------------------------+-----------------------------------------+---------+-----------------------------------------+---------+——————+
|  1 | SIMPLE      | ratings | index  | index_ratings_on_rateable_id_and_rating          | index_ratings_on_rateable_id_and_rating | 9       | NULL                                    | 2008306 | Using where | 
|  1 | SIMPLE      | sounds  | eq_ref | PRIMARY,sounds_ready_for_deployment_and_deployed | PRIMARY                                 | 4       | redacted_production.ratings.rateable_id |       1 | Using where | 
+----+-------------+---------+--------+--------------------------------------------------+-----------------------------------------+---------+-----------------------------------------+---------+-------------+
Run Code Online (Sandbox Code Playgroud)

我确实缓存了获得的结果,所以站点性能不是什么大问题,但是由于这个调用需要很长时间,我的缓存预热器运行的时间越来越长,这开始成为一个问题。这似乎不是一个查询需要处理的大量数字……

我还能做些什么来使这个表现更好

Rol*_*DBA 7

在查看了查询、表和 WHERE AND GROUP BY 子句后,我推荐以下内容:

建议 #1) 重构查询

我重新组织了查询来做三(3)件事:

  1. 创建较小的临时表
  2. 处理这些临时表上的 WHERE 子句
  3. 延迟加入到最后

这是我提出的查询:

SELECT
  sounds.*,srkeys.avg_rating,srkeys.votes
FROM
(
  SELECT AA.id,avg(BB.rating) AS avg_rating, count(BB.rating) AS votes
  (
    SELECT id FROM sounds
    WHERE blacklisted = false 
    AND   ready_for_deployment = true 
    AND   deployed = true 
    AND   type = "Sound" 
    AND   created_at > '2011-03-26 21:25:49'
  ) AA INNER JOIN
  (
    SELECT AAA.ratings,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
  ) BB
  ON AA.id = BB.rateable_id
  GROUP BY BB.rateable_id
) srkeys INNER JOIN sounds USING (id);
Run Code Online (Sandbox Code Playgroud)

建议#2) 用一个索引来索引sounds 表,该索引将容纳WHERE 子句

该索引的列包括来自 WHERE 子句的所有列,首先是静态值,最后是移动目标

ALTER TABLE sounds ADD INDEX support_index
(blacklisted,ready_for_deployment,deployed,type,created_at);
Run Code Online (Sandbox Code Playgroud)

我真诚地相信你会惊喜的。试一试 !!!

更新 2011-05-21 19:04

我刚刚看到了基数。哎哟!!!rateable_id 的基数为 1。男孩,我觉得自己很傻!!!

更新 2011-05-21 19:20

也许制作索引就足以改善事情。

更新 2011-05-21 22:56

请运行这个:

EXPLAIN SELECT
  sounds.*,srkeys.avg_rating,srkeys.votes
FROM
(
  SELECT AA.id,avg(BB.rating) AS avg_rating, count(BB.rating) AS votes FROM
  (
    SELECT id FROM sounds
    WHERE blacklisted = false 
    AND   ready_for_deployment = true 
    AND   deployed = true 
    AND   type = "Sound" 
    AND   created_at > '2011-03-26 21:25:49'
  ) AA INNER JOIN
  (
    SELECT AAA.ratings,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
  ) BB
  ON AA.id = BB.rateable_id
  GROUP BY BB.rateable_id
) srkeys INNER JOIN sounds USING (id);
Run Code Online (Sandbox Code Playgroud)

更新 2011-05-21 23:34

我再次重构它。请试试这个:

EXPLAIN
  SELECT AA.id,avg(BB.rating) AS avg_rating, count(BB.rating) AS votes FROM
  (
    SELECT id FROM sounds
    WHERE blacklisted = false 
    AND   ready_for_deployment = true 
    AND   deployed = true 
    AND   type = "Sound" 
    AND   created_at > '2011-03-26 21:25:49'
  ) AA INNER JOIN
  (
    SELECT AAA.ratings,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
  ) BB
  ON AA.id = BB.rateable_id
  GROUP BY BB.rateable_id
;
Run Code Online (Sandbox Code Playgroud)

更新 2011-05-21 23:55

我再次重构它。请试试这个(上次):

EXPLAIN
  SELECT A.id,avg(B.rating) AS avg_rating, count(B.rating) AS votes FROM
  (
    SELECT BB.* FROM
    (
      SELECT id FROM sounds
      WHERE blacklisted = false 
      AND   ready_for_deployment = true 
      AND   deployed = true 
      AND   type = "Sound" 
      AND   created_at > '2011-03-26 21:25:49'
    ) AA INNER JOIN sounds BB USING (id)
  ) A INNER JOIN
  (
    SELECT AAA.ratings,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
  ) B
  ON A.id = B.rateable_id
  GROUP BY B.rateable_id;
Run Code Online (Sandbox Code Playgroud)

更新 2011-05-22 00:12

我讨厌放弃!!!!

EXPLAIN
  SELECT A.*,avg(B.rating) AS avg_rating, count(B.rating) AS votes FROM
  (
    SELECT BB.* FROM
    (
      SELECT id FROM sounds
      WHERE blacklisted = false 
      AND   ready_for_deployment = true 
      AND   deployed = true 
      AND   type = "Sound" 
      AND   created_at > '2011-03-26 21:25:49'
    ) AA INNER JOIN sounds BB USING (id)
  ) A,
  (
    SELECT AAA.ratings,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
    AND AAA.rateable_id = A.id
  ) B
  GROUP BY B.rateable_id;
Run Code Online (Sandbox Code Playgroud)

更新 2011-05-22 07:51

令我感到困扰的是,EXPLAIN 中的评级又出现了 200 万行。然后,它击中了我。您可能需要另一个以 rateable_type 开头的 ratings 表索引:

ALTER TABLE ratings ADD INDEX
rateable_type_rateable_id_ndx (rateable_type,rateable_id);
Run Code Online (Sandbox Code Playgroud)

此索引的目标是减少操作评级的临时表,使其少于 200 万。如果我们可以让临时表显着变小(至少一半),那么我们可以对您的查询有更好的希望,我的工作也会更快。

创建该索引后,请重试我最初提出的查询并尝试您的查询:

SELECT
  sounds.*,srkeys.avg_rating,srkeys.votes
FROM
(
  SELECT AA.id,avg(BB.rating) AS avg_rating, count(BB.rating) AS votes
  (
    SELECT id FROM sounds
    WHERE blacklisted = false 
    AND   ready_for_deployment = true 
    AND   deployed = true 
    AND   type = "Sound" 
    AND   created_at > '2011-03-26 21:25:49'
  ) AA INNER JOIN
  (
    SELECT AAA.ratings,AAA.rateable_id
    FROM ratings AAA
    WHERE rateable_type = 'Sound'
  ) BB
  ON AA.id = BB.rateable_id
  GROUP BY BB.rateable_id
) srkeys INNER JOIN sounds USING (id);
Run Code Online (Sandbox Code Playgroud)

更新 2011-05-22 18:39:最后的话

我在存储过程中重构了一个查询并添加了一个索引来帮助回答有关加快速度的问题。我得到了 6 个赞成票,答案被接受了,并获得了 200 的赏金。

我还重构了另一个查询(边缘结果)并添加了一个索引(戏剧性结果)。我得到了 2 个赞成票并接受了答案。

我为另一个查询挑战添加了一个索引,并获得了一次投票

现在你的问题

想要回答所有这样的问题(包括你的)是受到我在重构查询上观看的 YouTube 视频的启发。

再次感谢您,@coneybeare !!!我想尽可能全面地回答这个问题,而不仅仅是接受分数或荣誉。现在,我能感觉到我赚到了积分!!!