为什么Mysql的“哪里存在”比“加入”要慢得多？

Question

为什么Mysql的“哪里存在”比“加入”要慢得多？

我有两个mysql脚本：

select
  p.*
from
    `products` p
    JOIN dispensaries d ON (p.dispensary_id = d.id)
    JOIN dispensary_locations dl ON(dl.dispensary_id = d.id AND dl.is_primary = 1)
    JOIN dispensary_location_zip_codes dlzc ON(dlzc.dispensary_location_id = dl.id AND dlzc.zip_code = '941033')
    and p.`is_hidden` = 0
    and p.`deleted_at` is null
GROUP BY p.id
order by p.`points` desc
limit 12

Run Code Online (Sandbox Code Playgroud)

和

select
  *
from
  `products`
where
  exists (
    select
      *
    from
      `dispensaries`
    where
      `products`.`dispensary_id` = `dispensaries`.`id`
      and exists (
        select
          *
        from
          `dispensary_locations`
        where
          `dispensary_locations`.`dispensary_id` = `dispensaries`.`id`
          and exists (
            select *
            from
              `dispensary_location_zip_codes`
            where
              `dispensary_location_zip_codes`.`dispensary_location_id` = `dispensary_locations`.`id` and `zip_code` = '941033'
          )
          AND is_primary = 1
      )
  )
order by `points` desc
limit 10;

Run Code Online (Sandbox Code Playgroud)

从逻辑上讲，它们应该是相同的，但是在我的数据库上，当zip_code存在时，第一个需要60毫秒，而在不存在时需要30毫秒。第二个在zip_code存在时花费5毫秒，在zip_code不存在时花费9500毫秒，是否知道这里发生了什么？

我的数据库中有大约10000种产品。第一个脚本的时间是60毫秒和30毫秒，第二个脚本的时间是5毫秒和9500毫秒（9.5秒）...

Answer 1

Pra*_*nth 5

为什么Mysql的“哪里存在”比“加入”要慢得多？

并非在所有情况下都较慢。它取决于许多因素，例如每个表的大小，在连接的列上的索引，值的存在（尤其是存在语句）等。

例如，设p和q为每个表中的条目数。

默认情况下Exists会执行嵌套循环，并且一旦发现某些内容就会立即停止执行。因此，最坏情况（如果不存在值）的复杂度为O(p*q)。

DB Engine如何联接表的示意图

嵌套连接

将表中的每个记录与其他中的每个记录进行比较
如果其中一个表非常小，则此连接有效
如果未对连接列建立索引，则非常复杂- O(p*q)
如果索引连接列很复杂- O(p*logq)

哈希联接

准备具有联接属性和行的较小关系的哈希表。扫描更大的关系。
如果表足够小以适合内存
复杂性，符合预期- O(p+q)

合并加入

如果两个表的排序顺序相同。两者都按顺序进行，并在对应的位置进行匹配。
如果两个表在连接列上都有索引，则索引已经按顺序维护它们。因此，复杂度-O(p+q)
如果一个表在连接列上有索引，则在合并步骤发生之前只需要对一个表进行排序。因此，复杂度-O(p+qlogq)
如果两个表的联接列都没有索引，则必须对两个表进行排序，然后才能进行合并步骤。因此，复杂度-O(plogq+qlogq)

在上述情况下，当zip_codes不存在时，数据库引擎必须对O(p*d*dl*dlz)查询2中的所有条目（存在）进行嵌套循环（存在），就像在查询1中一样，数据库引擎应用优化的联接来获取结果。

当存在zip_codes时，在查询2（存在）场景中，它找到匹配的条目，而没有对所有条目进行嵌套循环。

归档时间：	8 年，10 月前
查看次数：	471 次
最近记录：	8 年，10 月前