How to optimize a slow "select distinct" query across three tables (40k rows) that only returns 22 results

Kzq*_*qai 14 mysql sql distinct query-optimization

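So I have this query, written by someone else, that I'm trying to refactor. It pulls the features/materials for an item (usually shoes).

There are a lot of products, and therefore a lot of joining-table entries, but only a few features available for them. I think there has to be a way to reduce the need to touch the "big" list of items in order to get those features. I've heard that DISTINCT is to be avoided, but I don't have a statement that can replace the DISTINCT here.

According to my slow log, the result times are poor: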

Query_time:7 Lock_time:0 Rows_sent:32 Rows_examined:5362862

Query_time:8 Lock_time:0 Rows_sent:22 Rows_examined:6581994

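As the log shows, the query sometimes takes 7 or 8 seconds, and it examines more than 5 million rows each time.

That may be due to other load happening at the same time, because here is the same select run directly against the database from the mysql command line: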

mysql> SELECT DISTINCT features.FeatureId, features.Name
       FROM features, itemsfeatures, items
       WHERE items.FlagStatus != 'U'
         AND items.TypeId = '13'
         AND features.Type = 'Material'
         AND features.FeatureId = itemsfeatures.FeatureId
       ORDER BY features.Name;
+-----------+--------------------+
| FeatureId | Name               |
+-----------+--------------------+
|        40 | Alligator          |
|        41 | Burnished Calfskin |
|        42 | Calfskin           |
|        59 | Canvas             |
|        43 | Chromexcel         |
|        44 | Cordovan           |
|        57 | Cotton             |
|        45 | Crocodile          |
|        58 | Deerskin           |
|        61 | Eel                |
|        46 | Italian Leather    |
|        47 | Lizard             |
|        48 | Nappa              |
|        49 | NuBuck             |
|        50 | Ostrich            |
|        51 | Patent Leather     |
|        60 | Rubber             |
|        52 | Sharkskin          |
|        53 | Silk               |
|        54 | Suede              |
|        56 | Veal               |
|        55 | Woven              |
+-----------+--------------------+
22 rows in set (0.00 sec)

mysql> select count(*) from features;
+----------+
| count(*) |
+----------+
|      122 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from itemsfeatures;
+----------+
| count(*) |
+----------+
|    38569 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from items;
+----------+
| count(*) |
+----------+
|     8656 |
+----------+
1 row in set (0.00 sec)

explain SELECT DISTINCT features.FeatureId, features.Name
        FROM features, itemsfeatures, items
        WHERE items.FlagStatus != 'U'
          AND items.TypeId = '13'
          AND features.Type = 'Material'
          AND features.FeatureId = itemsfeatures.FeatureId
        ORDER BY features.Name;
+----+-------------+---------------+------+-------------------+-----------+---------+---------------------------------+------+----------------------------------------------+
| id | select_type | table         | type | possible_keys     | key       | key_len | ref                             | rows | Extra                                        |
+----+-------------+---------------+------+-------------------+-----------+---------+---------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | features      | ref  | PRIMARY,Type      | Type      | 33      | const                           |   21 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | itemsfeatures | ref  | FeatureId         | FeatureId | 4       | sherman_live.features.FeatureId |  324 | Using index; Distinct                        |
|  1 | SIMPLE      | items         | ALL  | TypeId,FlagStatus | NULL      | NULL    | NULL                            | 8656 | Using where; Distinct; Using join buffer     |
+----+-------------+---------------+------+-------------------+-----------+---------+---------------------------------+------+----------------------------------------------+
3 rows in set (0.04 sec)

Edit:

For comparison, here are sample results without the DISTINCT (but with a LIMIT, because otherwise it just hangs):

SELECT features.FeatureId, features.Name
       FROM features, itemsfeatures, items
       WHERE items.FlagStatus != 'U'
         AND items.TypeId = '13'
         AND features.Type = 'Material'
         AND features.FeatureId = itemsfeatures.FeatureId
       ORDER BY features.Name limit 10;
+-----------+-----------+
| FeatureId | Name      |
+-----------+-----------+
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
+-----------+-----------+
10 rows in set (23.30 sec)

And here it is using GROUP BY instead of SELECT DISTINCT:

SELECT features.FeatureId, features.Name
       FROM features, itemsfeatures, items
       WHERE items.FlagStatus != 'U'
         AND items.TypeId = '13'
         AND features.Type = 'Material'
         AND features.FeatureId = itemsfeatures.FeatureId
       group by features.name ORDER BY features.Name;
+-----------+--------------------+
| FeatureId | Name               |
+-----------+--------------------+
|        40 | Alligator          |
|        41 | Burnished Calfskin |
|        42 | Calfskin           |
|        59 | Canvas             |
|        43 | Chromexcel         |
|        44 | Cordovan           |
|        57 | Cotton             |
|        45 | Crocodile          |
|        58 | Deerskin           |
|        61 | Eel                |
|        46 | Italian Leather    |
|        47 | Lizard             |
|        48 | Nappa              |
|        49 | NuBuck             |
|        50 | Ostrich            |
|        51 | Patent Leather     |
|        60 | Rubber             |
|        52 | Sharkskin          |
|        53 | Silk               |
|        54 | Suede              |
|        56 | Veal               |
|        55 | Woven              |
+-----------+--------------------+
22 rows in set (13.28 sec)

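Edit: added a bounty

...because I'm trying to understand the general problem, beyond the slowness of this particular query: how to replace poorly performing SELECT DISTINCT queries in general.

I wonder whether the usual replacement for a SELECT DISTINCT is a GROUP BY (although in this case that isn't a complete solution, since it's still slow)?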

Joe*_*lli 9

It looks like you're missing the JOIN condition that links itemsfeatures to items. This is more obvious if you write the query with explicit JOIN operations.

SELECT DISTINCT f.FeatureId, f.Name  
    FROM features f
        INNER JOIN itemsfeatures ifx
            ON f.FeatureID = ifx.FeatureID
        INNER JOIN items i
            ON ifx.ItemID = i.ItemID /* This is the part you're missing */
    WHERE i.FlagStatus != 'U'  
        AND i.TypeId = '13'  
        AND f.Type = 'Material' 
    ORDER BY f.Name;

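  • Tchalvak: Look at *why* the results differ. Which results are "missing", and why are they missing? Joe's query makes perfect sense; your query is (at best) confusing. (4 upvotes)
  • @Tchalvak: We discussed this on May 27. This query should not return the same results as the original, because I added what I believe was the missing join condition. (2 upvotes)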

小智 6

As Joe said, there does seem to be a join condition missing.

Here is your current query:

SELECT DISTINCT 
        features.FeatureId, 
        features.Name
FROM    features, 
        itemsfeatures, 
        items
WHERE   items.FlagStatus != 'U'
        AND items.TypeId = '13'
        AND features.Type = 'Material'
        AND features.FeatureId = itemsfeatures.FeatureId
ORDER BY features.Name

Here is your query written with explicit joins:

SELECT DISTINCT 
        features.FeatureId, 
        features.Name
FROM    features INNER JOIN
        itemsfeatures on features.FeatureId = itemsfeatures.FeatureId CROSS JOIN
        items
WHERE   items.FlagStatus != 'U'
        AND items.TypeId = '13'
        AND features.Type = 'Material'
ORDER BY features.Name
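
For a sense of scale, the cost of that CROSS JOIN can be roughed out from the EXPLAIN output in the question: multiplying the values in the `rows` column gives an approximate upper bound on the row combinations MySQL may have to consider. These are optimizer estimates, not measured counts, so treat the figure as an order-of-magnitude sketch only.

```sql
-- Rough upper bound on row combinations, using the `rows` estimates from the
-- EXPLAIN output in the question:
--   features (21) x itemsfeatures (324) x items (8656)
SELECT 21 * 324 * 8656 AS estimated_row_combinations;  -- 58895424
```

The slow log reports roughly 5 to 6.5 million rows examined, well below this bound. That is likely down to the `Distinct` entries in the `Extra` column, which let MySQL stop scanning a table for the current row combination once a first match is found.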

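I'm not 100% sure, but it looks like removing every reference to the items table should give you exactly the same results. Because items is not joined to anything, its conditions only determine whether any rows come back at all, and your 22-row result shows that at least one item passes them: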

SELECT DISTINCT 
        features.FeatureId, 
        features.Name
FROM    features, 
        itemsfeatures
WHERE   features.Type = 'Material'
        AND features.FeatureId = itemsfeatures.FeatureId
ORDER BY features.Name
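
A small throwaway script can make this concrete. The schema and rows below are hypothetical, invented only for illustration (the `ItemID` columns are assumed from Joe's answer); the point is that a material used solely by an item of another type still shows up, and that dropping items changes nothing.

```sql
-- Hypothetical miniature schema, not the real tables.
CREATE TABLE features      (FeatureId INT PRIMARY KEY, Name VARCHAR(32), Type VARCHAR(32));
CREATE TABLE items         (ItemID INT PRIMARY KEY, TypeId VARCHAR(8), FlagStatus CHAR(1));
CREATE TABLE itemsfeatures (ItemID INT, FeatureId INT);

INSERT INTO features VALUES (40, 'Alligator', 'Material'), (54, 'Suede', 'Material');
INSERT INTO items    VALUES (1, '13', 'A'), (2, '13', 'A'), (3, '99', 'A');
-- Alligator is used only by item 3, which is NOT of type 13.
INSERT INTO itemsfeatures VALUES (1, 54), (2, 54), (3, 40);

-- Original shape: items is not joined to anything, so every feature row is
-- paired with every item that passes the filter.
-- Returns Alligator and Suede, although no type-13 item uses Alligator.
SELECT DISTINCT f.FeatureId, f.Name
FROM features f, itemsfeatures ifx, items i
WHERE i.FlagStatus != 'U'
  AND i.TypeId = '13'
  AND f.Type = 'Material'
  AND f.FeatureId = ifx.FeatureId
ORDER BY f.Name;

-- Same query with items removed: the same two rows come back.
SELECT DISTINCT f.FeatureId, f.Name
FROM features f, itemsfeatures ifx
WHERE f.Type = 'Material'
  AND f.FeatureId = ifx.FeatureId
ORDER BY f.Name;
```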

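From the way the query is written, it appears you want the list of materials for items with a TypeId of 13 and a FlagStatus <> 'U'. If that is the case, the results returned by the original query are wrong: it simply returns all materials for all items.

So, as Joe said, add an inner join to items, and use explicit joins because they make the meaning clearer. I prefer GROUP BY, but DISTINCT will do the same thing.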

SELECT  features.FeatureId, 
        features.Name
FROM    features INNER JOIN
        itemsfeatures on features.FeatureId = itemsfeatures.FeatureId INNER JOIN
        items on itemsfeatures.ItemID = items.ItemID
WHERE   items.FlagStatus != 'U'
        AND items.TypeId = '13'
        AND features.Type = 'Material'
GROUP BY features.FeatureId, 
        features.Name
ORDER BY features.Name
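
On the more general question of replacing DISTINCT: a third option, beyond DISTINCT and the GROUP BY version above, is to express the requirement as a semi-join with EXISTS. The following is only a sketch; it assumes the `ItemID` join columns used above and that `FeatureId` is unique in features (the EXPLAIN output lists a PRIMARY key there, but the question does not show its definition).

```sql
-- Semi-join sketch: each material appears at most once because the outer
-- query reads features only, so no DISTINCT or GROUP BY is needed.
-- Assumes itemsfeatures.ItemID and items.ItemID exist, as in the join above.
SELECT f.FeatureId, f.Name
FROM features f
WHERE f.Type = 'Material'
  AND EXISTS (
        SELECT 1
        FROM itemsfeatures ifx
            INNER JOIN items i
                ON i.ItemID = ifx.ItemID
        WHERE ifx.FeatureId = f.FeatureId
          AND i.TypeId = '13'
          AND i.FlagStatus != 'U'
      )
ORDER BY f.Name;
```

Whether this beats the GROUP BY form depends on the MySQL version and the available indexes, so it is worth comparing the two with EXPLAIN before committing to either.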

With that sorted out, now for speed. Create the following three indexes.

FeaturesIndex(Type,FeatureID,Name)
ItemsFeaturesIndex(FeatureId)
ItemsIndex(TypeId,FlagStatus,ItemID)
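
Written out as MySQL statements, the shorthand above might look like the following. The index names and column lists are taken from the list; `ItemID` is assumed to be the join column, as in the query.

```sql
-- Covers the Type filter plus the two selected columns.
CREATE INDEX FeaturesIndex      ON features      (Type, FeatureId, Name);
-- Supports the join from features to itemsfeatures.
CREATE INDEX ItemsFeaturesIndex ON itemsfeatures (FeatureId);
-- Covers the items filter and the join column.
CREATE INDEX ItemsIndex         ON items         (TypeId, FlagStatus, ItemID);
```

The EXPLAIN output in the question already shows a key named `FeatureId` on itemsfeatures, so the second index may be redundant. Running `SHOW INDEX FROM itemsfeatures;` first would confirm that.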

This should speed up both your current query and the query I've listed.