Single query runs in 10 ms, with UNION ALL it takes 290 ms+ (7.7M-record MySQL DB). How to optimize?

Alf*_*sch 9 mysql

I have a table that stores teachers' available appointments and allows two kinds of inserts:

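  1. Hourly: teachers are completely free to add unlimited slots per day (as long as the slots don't overlap). On April 15 a teacher may have slots at 10:00, 11:00, 12:00 and 16:00. A person is served after choosing a specific teacher time/slot.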

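  2. Time period/range: on April 15 another teacher may work from 10:00 to 12:00 and from 14:00 to 18:00. People are served by order of arrival. If a teacher works from 10:00 to 12:00, everyone who arrives in that period is attended in order of arrival (local queue).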

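Since I have to return all available teachers in a search, I need to store all the slots in the same table as the order-of-arrival ranges. That way I can sort by date_from ASC and show the first available slots first in the search results.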
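Here is a minimal sketch of that design. It uses Python's built-in sqlite3 module as a stand-in for MySQL and a trimmed-down, hypothetical version of the table. Both kinds of availability live in one table, so a single ORDER BY date_from interleaves them chronologically. The id column is added to the ORDER BY only to make ties deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down, hypothetical version of teacher_slots.
conn.execute("""CREATE TABLE teacher_slots (
    id INTEGER PRIMARY KEY, teacher_id INTEGER NOT NULL,
    date_from TEXT NOT NULL, date_to TEXT NOT NULL,
    order_of_arrival INTEGER NOT NULL DEFAULT 0)""")
conn.executemany("INSERT INTO teacher_slots VALUES (?, ?, ?, ?, ?)", [
    # Teacher 1: hourly slots (order_of_arrival = 0).
    (1, 1, "2014-04-15 10:00:00", "2014-04-15 11:00:00", 0),
    (2, 1, "2014-04-15 16:00:00", "2014-04-15 17:00:00", 0),
    # Teacher 2: order-of-arrival ranges (order_of_arrival = 1).
    (3, 2, "2014-04-15 10:00:00", "2014-04-15 12:00:00", 1),
    (4, 2, "2014-04-15 14:00:00", "2014-04-15 18:00:00", 1),
])

# One ORDER BY interleaves both kinds of availability; id only breaks ties.
ordered_ids = [row[0] for row in conn.execute(
    "SELECT id FROM teacher_slots ORDER BY date_from ASC, id ASC")]
print(ordered_ids)  # [1, 3, 4, 2]
```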

Current table structure

CREATE TABLE `teacher_slots` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `teacher_id` mediumint(8) unsigned NOT NULL,
  `city_id` smallint(5) unsigned NOT NULL,
  `subject_id` smallint(5) unsigned NOT NULL,
  `date_from` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `date_to` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `status` tinyint(4) NOT NULL DEFAULT '0',
  `order_of_arrival` tinyint(1) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `by_hour_idx` (`teacher_id`,`order_of_arrival`,`status`,`city_id`,`subject_id`,`date_from`),
  KEY `order_arrival_idx` (`order_of_arrival`,`status`,`city_id`,`subject_id`,`date_from`,`date_to`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Search query

I need to filter by the current datetime, city_id, subject_id, and whether the slot is available (status = 0).

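For hourly slots, I have to show all available slots of each teacher's first (closest) available day. That means all slots of that day, and never more than one day for the same teacher. (I got the query with the help of mattedgod.)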
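To pin down the hourly rule, here is a plain-Python sketch of what the first half of the query is meant to return. The function name closest_day_slots and the sample data are hypothetical. The input is assumed to be already filtered by status, city_id and subject_id.

```python
from datetime import datetime

def closest_day_slots(slots, now):
    """Hourly rule: for each teacher, keep every future slot that falls on
    that teacher's earliest future day, then sort everything by start time.

    slots: iterable of (slot_id, teacher_id, date_from) tuples, assumed to be
    already filtered to status = 0 and the requested city/subject.
    """
    future = [s for s in slots if s[2] >= now]
    first_day = {}
    for _, teacher_id, start in future:
        day = start.date()
        if teacher_id not in first_day or day < first_day[teacher_id]:
            first_day[teacher_id] = day
    picked = [s for s in future if s[2].date() == first_day[s[1]]]
    return sorted(picked, key=lambda s: s[2])

now = datetime(2014, 4, 10, 8, 0)
slots = [
    (1, 7, datetime(2014, 4, 15, 10, 0)),
    (2, 7, datetime(2014, 4, 15, 11, 0)),
    (3, 7, datetime(2014, 4, 16, 10, 0)),  # later day for teacher 7: dropped
    (4, 8, datetime(2014, 4, 12, 16, 0)),
    (5, 8, datetime(2014, 4, 9, 10, 0)),   # in the past: dropped
]
print([s[0] for s in closest_day_slots(slots, now)])  # [4, 1, 2]
```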

For range-based slots (order_of_arrival = 1), I have to show the closest available range, only once per teacher.
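Here is a matching sketch for the range rule. It assumes "closest" means the usable range with the earliest date_from. A range counts as usable if it is in progress at the search time or starts later, which is the same condition the range half of the query uses. The function name closest_range_per_teacher and the data are hypothetical. Unlike a bare GROUP BY teacher_id, this sketch makes the choice of range explicit.

```python
from datetime import datetime

def closest_range_per_teacher(ranges, now):
    """Range rule: keep ranges that are in progress or start in the future,
    then keep one range per teacher, the one that starts earliest.

    ranges: iterable of (range_id, teacher_id, date_from, date_to) tuples.
    """
    usable = [r for r in ranges if (r[2] <= now <= r[3]) or r[2] >= now]
    best = {}
    for r in usable:
        if r[1] not in best or r[2] < best[r[1]][2]:
            best[r[1]] = r
    return sorted(best.values(), key=lambda r: r[2])

now = datetime(2014, 4, 10, 8, 0)
ranges = [
    (10, 7, datetime(2014, 4, 10, 7, 0), datetime(2014, 4, 10, 12, 0)),  # in progress
    (11, 7, datetime(2014, 4, 11, 10, 0), datetime(2014, 4, 11, 12, 0)),
    (12, 8, datetime(2014, 4, 9, 10, 0), datetime(2014, 4, 9, 12, 0)),   # already over
    (13, 8, datetime(2014, 4, 12, 14, 0), datetime(2014, 4, 12, 18, 0)),
]
print([r[0] for r in closest_range_per_teacher(ranges, now)])  # [10, 13]
```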

The first query runs in around 0.10 ms on its own, the second in 0.08 ms, and the UNION ALL averages 300 ms.

(
    SELECT id, teacher_slots.teacher_id, date_from, date_to, order_of_arrival
    FROM teacher_slots
    JOIN (
        SELECT DATE(MIN(date_from)) as closestDay, teacher_id
        FROM teacher_slots
        WHERE   date_from >= '2014-04-10 08:00:00' AND order_of_arrival = 0
                AND status = 0 AND city_id = 6015 AND subject_id = 1
        GROUP BY teacher_id
    ) a ON a.teacher_id = teacher_slots.teacher_id
    AND DATE(teacher_slots.date_from) = closestDay
    WHERE teacher_slots.date_from >= '2014-04-10 08:00:00'
        AND teacher_slots.order_of_arrival = 0
        AND teacher_slots.status = 0
        AND teacher_slots.city_id = 6015
        AND teacher_slots.subject_id = 1
)

UNION ALL

(
    SELECT id, teacher_id, date_from, date_to, order_of_arrival
    FROM teacher_slots
    WHERE order_of_arrival = 1 AND status = 0 AND city_id = 6015 AND subject_id = 1
        AND (
            (date_from <= '2014-04-10 08:00:00' AND  date_to >= '2014-04-10 08:00:00')
            OR (date_from >= '2014-04-10 08:00:00')
        )
    GROUP BY teacher_id
)

ORDER BY date_from ASC;

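Is there a way to optimize the UNION so that I get a reasonable response of at most ~20 ms? Or could I even return both range-based and hourly results in a single query (using IF, etc.)?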

SQL Fiddle: http://www.sqlfiddle.com/#!2/59420/1/0

EDIT:

I tried some denormalization by creating a field "only_date_from" that stores only the date, so I could change this...

DATE(MIN(date_from)) as closestDay / DATE(teacher_slots.date_from) = closestDay

...to this

MIN(only_date_from) as closestDay / teacher_slots.only_date_from = closestDay

That alone saved me 100 ms! It still averages 200 ms.
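Here is a small check that the denormalized column gives the same result as calling DATE() at query time. It again uses sqlite3 as a stand-in for MySQL, with hypothetical rows. The gain comes from filling only_date_from once at write time. The join then compares a stored column, which an index can cover, instead of evaluating DATE() on every candidate row. This sketch only verifies the results, not the timing.

```python
import sqlite3

NOW = "2014-04-10 08:00:00"

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE teacher_slots (
    id INTEGER PRIMARY KEY, teacher_id INTEGER, date_from TEXT,
    only_date_from TEXT, status INTEGER, order_of_arrival INTEGER)""")
rows = [  # hypothetical (id, teacher_id, date_from, status, order_of_arrival)
    (1, 7, "2014-04-15 10:00:00", 0, 0),
    (2, 7, "2014-04-15 11:00:00", 0, 0),
    (3, 7, "2014-04-16 10:00:00", 0, 0),
    (4, 8, "2014-04-12 16:00:00", 0, 0),
]
# only_date_from is computed once, at insert time.
conn.executemany(
    "INSERT INTO teacher_slots VALUES (?, ?, ?, DATE(?), ?, ?)",
    [(i, t, d, d, s, o) for i, t, d, s, o in rows])

TEMPLATE = """
SELECT ts.id FROM teacher_slots ts
JOIN (SELECT teacher_id, {closest} AS closestDay
      FROM teacher_slots
      WHERE date_from >= :now AND order_of_arrival = 0 AND status = 0
      GROUP BY teacher_id) a
  ON a.teacher_id = ts.teacher_id AND {day_of_row} = a.closestDay
WHERE ts.date_from >= :now AND ts.order_of_arrival = 0 AND ts.status = 0
ORDER BY ts.date_from"""

def run(closest, day_of_row):
    sql = TEMPLATE.format(closest=closest, day_of_row=day_of_row)
    return [row[0] for row in conn.execute(sql, {"now": NOW})]

ids_fn = run("DATE(MIN(date_from))", "DATE(ts.date_from)")  # DATE() per row
ids_col = run("MIN(only_date_from)", "ts.only_date_from")   # stored column
print(ids_fn, ids_col)  # [4, 1, 2] [4, 1, 2]
```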

Sep*_*ter 1

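Firstly, I think your original query may not be "correct". Looking at your SQLFiddle, I think you should be returning the rows with ID = 2, 3 and 4, in addition to the row with ID = 1 that you get from this half. Your existing logic reads as though you intended these other rows to be included, because they explicitly meet the OR (date_from >= '2014-04-10 08:00:00') part of the second WHERE clause.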

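The GROUP BY teacher_id clause in the second part of your UNION is what makes you lose these rows. You aren't actually aggregating any column in your select list, so MySQL collapses each teacher's matching rows into one row. The values of the non-aggregated columns are taken from any row in the group, and which row is not defined. In other words, this GROUP BY leads to "difficult to define" behaviour. With ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5), the statement is rejected instead.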
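Here is a quick way to see the row loss, sketched with sqlite3 and hypothetical data: one teacher with four usable ranges. Like MySQL without ONLY_FULL_GROUP_BY, SQLite accepts bare non-aggregated columns. Which row SQLite keeps is not something to rely on, so the sketch only counts rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE teacher_slots (
    id INTEGER PRIMARY KEY, teacher_id INTEGER,
    date_from TEXT, date_to TEXT, order_of_arrival INTEGER)""")
# Hypothetical data: teacher 5 has four order-of-arrival ranges.
conn.executemany("INSERT INTO teacher_slots VALUES (?, 5, ?, ?, 1)", [
    (1, "2014-04-10 07:00:00", "2014-04-10 12:00:00"),  # in progress at search time
    (2, "2014-04-11 10:00:00", "2014-04-11 12:00:00"),
    (3, "2014-04-12 10:00:00", "2014-04-12 12:00:00"),
    (4, "2014-04-13 10:00:00", "2014-04-13 12:00:00"),
])

WHERE = """WHERE order_of_arrival = 1 AND (
    (date_from <= :now AND date_to >= :now) OR date_from >= :now)"""
params = {"now": "2014-04-10 08:00:00"}

without_group = conn.execute(f"SELECT id FROM teacher_slots {WHERE}", params).fetchall()
# Bare (non-aggregated) id: only one row per teacher survives the GROUP BY.
with_group = conn.execute(
    f"SELECT id FROM teacher_slots {WHERE} GROUP BY teacher_id", params).fetchall()
print(len(without_group), len(with_group))  # 4 1
```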

Also, although I can't explain the poor performance of your UNION, I can work around it by removing the UNION from your query altogether:

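You currently use two separate (and partly duplicated) sets of logic to get rows from the same table. Instead, I have merged your logic into a single query and ORed together the parts where the two halves differ. If a row satisfies either of your original WHERE clauses, it is included. This is possible because I have replaced the (INNER) JOIN you used to get closestDay with a LEFT JOIN.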

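The LEFT JOIN also means we can now tell which set of logic should apply to a row. If the join succeeds (a.teacher_id IS NOT NULL), we apply the logic from the first half of your UNION. If the join fails (a.teacher_id IS NULL), we apply the logic from the second half.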

So this returns all the rows your query returned (in the fiddle), and it also picks up those additional rows.

  SELECT
    *

  FROM 
    teacher_slots ts

    LEFT JOIN 
    (
      SELECT 
        teacher_id,
        DATE(MIN(date_from)) as closestDay

      FROM 
        teacher_slots

      WHERE   
        date_from >= '2014-04-10 08:00:00' 
        AND order_of_arrival = 0
        AND status = 0 
        AND city_id = 6015 
        AND subject_id = 1

      GROUP BY 
        teacher_id

    ) a
    ON a.teacher_id = ts.teacher_id
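    AND a.closestDay = DATE(ts.date_from)
    /* only hourly rows may match the closest-day join; otherwise a range
       row on the same day as its teacher's closest hourly slot is lost */
    AND ts.order_of_arrival = 0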

  WHERE 
    /* conditions that were common to both halves of the union */
    ts.status = 0
    AND ts.city_id = 6015
    AND ts.subject_id = 1

    AND
    (
      (
        /* conditions that were from above the union 
           (ie when we joined to get closest future date) */
        a.teacher_id IS NOT NULL
        AND ts.date_from >= '2014-04-10 08:00:00'
        AND ts.order_of_arrival = 0
      ) 
      OR
      (
        /* conditions that were below the union 
          (ie when we didn't join) */
        a.teacher_id IS NULL       
        AND ts.order_of_arrival = 1 
        AND 
        (
          (
            date_from <= '2014-04-10 08:00:00' 
            AND  
            date_to >= '2014-04-10 08:00:00'
          )

          /* rows that met this condition were being discarded 
             as a result of 'difficult to define' GROUP BY behaviour. */
          OR date_from >= '2014-04-10 08:00:00' 
        )
      )
    )

  ORDER BY 
   ts.date_from ASC;
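Here is a sketch that checks the single LEFT JOIN query against the UNION ALL with the GROUP BY removed from its second half. It uses sqlite3 as a stand-in for MySQL and hypothetical rows. One assumption is made explicit: the sketch adds ts.order_of_arrival = 0 to the LEFT JOIN's ON clause. Without it, a range row that falls on its teacher's closest hourly day matches the join. That row then fails the first branch (which needs order_of_arrival = 0) and the second branch (which needs a.teacher_id IS NULL), so it silently disappears. Row 6 in the sample data shows this.

```python
import sqlite3

NOW = "2014-04-10 08:00:00"

def make_db():
    """In-memory stand-in for teacher_slots with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE teacher_slots (
        id INTEGER PRIMARY KEY, teacher_id INTEGER, city_id INTEGER,
        subject_id INTEGER, date_from TEXT, date_to TEXT,
        status INTEGER, order_of_arrival INTEGER)""")
    # (id, teacher_id, date_from, date_to, status, order_of_arrival)
    rows = [
        (1, 7, "2014-04-15 10:00:00", "2014-04-15 11:00:00", 0, 0),
        (2, 7, "2014-04-15 11:00:00", "2014-04-15 12:00:00", 0, 0),
        (3, 7, "2014-04-16 10:00:00", "2014-04-16 11:00:00", 0, 0),  # not the closest day
        (4, 8, "2014-04-10 07:00:00", "2014-04-10 12:00:00", 0, 1),  # range in progress
        (5, 8, "2014-04-11 10:00:00", "2014-04-11 12:00:00", 0, 1),  # future range
        (6, 7, "2014-04-15 14:00:00", "2014-04-15 18:00:00", 0, 1),  # range on teacher 7's closest hourly day
        (7, 9, "2014-04-12 16:00:00", "2014-04-12 17:00:00", 1, 0),  # already booked
    ]
    conn.executemany(
        "INSERT INTO teacher_slots VALUES (?, ?, 6015, 1, ?, ?, ?, ?)", rows)
    return conn

# The question's UNION ALL, minus the GROUP BY in the second half.
UNION_QUERY = """
SELECT ts.id AS id, ts.date_from AS date_from
FROM teacher_slots ts
JOIN (SELECT teacher_id, DATE(MIN(date_from)) AS closestDay
      FROM teacher_slots
      WHERE date_from >= :now AND order_of_arrival = 0
        AND status = 0 AND city_id = 6015 AND subject_id = 1
      GROUP BY teacher_id) a
  ON a.teacher_id = ts.teacher_id AND DATE(ts.date_from) = a.closestDay
WHERE ts.date_from >= :now AND ts.order_of_arrival = 0
  AND ts.status = 0 AND ts.city_id = 6015 AND ts.subject_id = 1
UNION ALL
SELECT id, date_from FROM teacher_slots
WHERE order_of_arrival = 1 AND status = 0 AND city_id = 6015 AND subject_id = 1
  AND ((date_from <= :now AND date_to >= :now) OR date_from >= :now)
ORDER BY date_from
"""

def merged_query(guard):
    """Single LEFT JOIN query; guard adds ts.order_of_arrival = 0 to the ON clause."""
    guard_sql = "AND ts.order_of_arrival = 0" if guard else ""
    return f"""
SELECT ts.id
FROM teacher_slots ts
LEFT JOIN (SELECT teacher_id, DATE(MIN(date_from)) AS closestDay
           FROM teacher_slots
           WHERE date_from >= :now AND order_of_arrival = 0
             AND status = 0 AND city_id = 6015 AND subject_id = 1
           GROUP BY teacher_id) a
  ON a.teacher_id = ts.teacher_id AND a.closestDay = DATE(ts.date_from) {guard_sql}
WHERE ts.status = 0 AND ts.city_id = 6015 AND ts.subject_id = 1
  AND ((a.teacher_id IS NOT NULL AND ts.date_from >= :now AND ts.order_of_arrival = 0)
    OR (a.teacher_id IS NULL AND ts.order_of_arrival = 1
        AND ((ts.date_from <= :now AND ts.date_to >= :now) OR ts.date_from >= :now)))
ORDER BY ts.date_from
"""

conn = make_db()
params = {"now": NOW}
union_ids = [row[0] for row in conn.execute(UNION_QUERY, params)]
merged_ids = [row[0] for row in conn.execute(merged_query(guard=True), params)]
unguarded_ids = [row[0] for row in conn.execute(merged_query(guard=False), params)]
print(union_ids)      # [4, 5, 1, 2, 6]
print(merged_ids)     # [4, 5, 1, 2, 6]
print(unguarded_ids)  # [4, 5, 1, 2]  (row 6 is lost)
```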

In addition, you can "tidy up" your query further so that you don't have to "plug in" your status, city_id and subject_id parameters more than once.

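To do this, change subquery a so that it also selects these columns and groups by them. The JOIN's ON clause then needs to map each of these columns to its ts.xxx equivalent.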

I don't think this will hurt performance, but I can't be sure without testing on a large dataset.

So your join would look more like:

LEFT JOIN 
(
  SELECT 
    teacher_id,
    status,
    city_id,
    subject_id,
    DATE(MIN(date_from)) as closestDay

  FROM 
    teacher_slots

  WHERE   
    date_from >= '2014-04-10 08:00:00' 
    AND order_of_arrival = 0
  /* These no longer required here...
    AND status = 0 
    AND city_id = 6015 
    AND subject_id = 1
  */

  GROUP BY 
    teacher_id,
    status,
    city_id,
    subject_id

) a
ON a.teacher_id = ts.teacher_id
AND a.status = ts.status 
AND a.city_id = ts.city_id 
AND a.subject_id = ts.subject_id
AND a.closestDay = DATE(ts.date_from)
/* only hourly rows may match the closest-day join */
AND ts.order_of_arrival = 0
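Here is a sketch, again with sqlite3 and hypothetical rows, that checks whether grouping the derived table by status, city_id and subject_id returns the same rows as repeating the parameters inside it. Row 3 belongs to another city and has an earlier day, so the grouped version really does compute a second closestDay that the join must ignore. The hypothetical subject_col parameter shows why each column must map to its own ts counterpart. If a.subject_id is pointed at ts.city_id, the join never matches, so every hourly slot drops out. The sketch only checks results on SQLite. Whether moving the filters out of the derived table costs time on MySQL depends on whether the server can push the outer conditions down into it, so it is worth measuring on the real table.

```python
import sqlite3

NOW = "2014-04-10 08:00:00"

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE teacher_slots (
    id INTEGER PRIMARY KEY, teacher_id INTEGER, city_id INTEGER,
    subject_id INTEGER, date_from TEXT, date_to TEXT,
    status INTEGER, order_of_arrival INTEGER)""")
# Hypothetical rows: (id, teacher_id, city_id, date_from, date_to, status, order_of_arrival)
conn.executemany("INSERT INTO teacher_slots VALUES (?, ?, ?, 1, ?, ?, ?, ?)", [
    (1, 7, 6015, "2014-04-15 10:00:00", "2014-04-15 11:00:00", 0, 0),
    (2, 7, 6015, "2014-04-15 11:00:00", "2014-04-15 12:00:00", 0, 0),
    (3, 7, 999, "2014-04-11 10:00:00", "2014-04-11 11:00:00", 0, 0),  # other city, earlier day
    (4, 8, 6015, "2014-04-11 10:00:00", "2014-04-11 12:00:00", 0, 1),  # range slot
])

# Parameters repeated inside the derived table.
FILTERED = """SELECT teacher_id, status, city_id, subject_id,
                     DATE(MIN(date_from)) AS closestDay
              FROM teacher_slots
              WHERE date_from >= :now AND order_of_arrival = 0
                AND status = 0 AND city_id = 6015 AND subject_id = 1
              GROUP BY teacher_id, status, city_id, subject_id"""

# Parameters only in the outer WHERE; the derived table groups by them instead.
GROUPED = """SELECT teacher_id, status, city_id, subject_id,
                    DATE(MIN(date_from)) AS closestDay
             FROM teacher_slots
             WHERE date_from >= :now AND order_of_arrival = 0
             GROUP BY teacher_id, status, city_id, subject_id"""

def search(subquery, subject_col="subject_id"):
    """Run the merged LEFT JOIN query; subject_col is the ts column a.subject_id maps to."""
    sql = f"""
SELECT ts.id FROM teacher_slots ts
LEFT JOIN ({subquery}) a
  ON a.teacher_id = ts.teacher_id
 AND a.status = ts.status
 AND a.city_id = ts.city_id
 AND a.subject_id = ts.{subject_col}
 AND a.closestDay = DATE(ts.date_from)
 AND ts.order_of_arrival = 0
WHERE ts.status = 0 AND ts.city_id = 6015 AND ts.subject_id = 1
  AND ((a.teacher_id IS NOT NULL AND ts.date_from >= :now AND ts.order_of_arrival = 0)
    OR (a.teacher_id IS NULL AND ts.order_of_arrival = 1
        AND ((ts.date_from <= :now AND ts.date_to >= :now) OR ts.date_from >= :now)))
ORDER BY ts.date_from"""
    return [row[0] for row in conn.execute(sql, {"now": NOW})]

print(search(FILTERED))                        # [4, 1, 2]
print(search(GROUPED))                         # [4, 1, 2]
print(search(GROUPED, subject_col="city_id"))  # [4]
```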