Simplifying "repetitive" code in a query involving UNION ALL, WHERE and LIMIT

Tags: mysql, sql-server, union

How can I simplify the code of the query shown below? The requirements will always stay the same as I have described them here:

(select * from questions where points = 3 and type = 1 order by rand() asc limit 6)
union all
(select * from questions where points = 2 and type = 1 order by rand() asc limit 4)
union all
(select * from questions where points = 1 and type = 1 order by rand() asc limit 2)
union all
(select * from questions where points = 3 and type = 0 order by rand() asc limit 10)
union all
(select * from questions where points = 2 and type = 0 order by rand() asc limit 6)
union all
(select * from questions where points = 1 and type = 0 order by rand() asc limit 4)

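Is there a way to do this without UNION?

I'm just curious whether there is a faster or more efficient way. It will be either SQL Server or MySQL; I haven't decided yet because I'm still at the design stage.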

And*_*y M 5

You can use the ROW_NUMBER() analytic function to generate row numbers within each (points, type) partition:

SELECT
  *,
  ROW_NUMBER() OVER (PARTITION BY points, type ORDER BY rand ()) AS rownum
FROM
  questions

Then you can compare rownum against a value that depends on points and type:

WHERE
  rownum <= some_expression

Of course, you need to use nesting to be able to reference rownum like that:

SELECT
  ...
FROM
  (
    SELECT
      ...
      ROW_NUMBER() ... AS rownum
    FROM
      ...
  ) AS derived
WHERE
  rownum <= ...

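You also need to keep in mind that, because of the nesting, the rownum column becomes part of the derived table's column set, so specifying * as the main SELECT's column list will include rownum as well. If you want to exclude it from the results, you have to list every questions column explicitly: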

SELECT
  question_id,  /* or whatever the PK column is going to be called */
  points,
  type,
  ...  /* other "questions" columns */
FROM
  (
    ...

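With all of the above points in mind (especially the last one), I'm not sure the resulting query can be called a simplification of the original, but here is my attempt anyway: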

SELECT
  question_id,
  points,
  type,
  ...
FROM
  (
    SELECT
      *,
      ROW_NUMBER() OVER (PARTITION BY points, type ORDER BY rand ()) AS rownum
    FROM
      questions
  ) AS derived
WHERE
  rownum <= CASE
              WHEN points = 3 AND type = 1 THEN 6
              WHEN points = 2 AND type = 1 THEN 4
              WHEN points = 1 AND type = 1 THEN 2
              WHEN points = 3 AND type = 0 THEN 10
              WHEN points = 2 AND type = 0 THEN 6
              WHEN points = 1 AND type = 0 THEN 4
            END
;

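The CASE expression returns a row limit for each required partition and NULL for any other partition. Because a comparison with NULL is never true, this effectively acts as a filter that eliminates all irrelevant rows (those where points and type have values other than the combinations mentioned in the conditions). However, this filter is not pushed down to the underlying questions table, so it does not prevent the query from generating row numbers for all (points, type) combinations before the unwanted ones are filtered out. You can avoid the redundant work by adding an explicit filter on the questions table, like this: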

  ...
FROM
  (
    SELECT
      *,
      ROW_NUMBER() OVER (PARTITION BY points, type ORDER BY rand ()) AS rownum
    FROM
      questions
    WHERE
      points IN (1, 2, 3)
      AND type IN (0, 1)
  ) AS derived
WHERE
  ...

Alternative

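Alternatively, you can rewrite the whole query differently, so that rows are filtered and row limits are assigned at the same time. If you use a virtual table like this: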

(
  SELECT 3, 1,  6 UNION ALL
  SELECT 2, 1,  4 UNION ALL
  SELECT 1, 1,  2 UNION ALL
  SELECT 3, 0, 10 UNION ALL
  SELECT 2, 0,  6 UNION ALL
  SELECT 1, 0,  4
) AS limits (points, type, rowlimit)

you can join questions to it and thereby get both a filter and a way to assign a row limit to each partition. The complete query would look like this:

SELECT
  question_id,
  points,
  type,
  ...
FROM
  (
    SELECT
      *,
      ROW_NUMBER() OVER (PARTITION BY points, type ORDER BY rand ()) AS rownum
    FROM
      questions AS q
      INNER JOIN
      (
        SELECT 3, 1,  6 UNION ALL
        SELECT 2, 1,  4 UNION ALL
        SELECT 1, 1,  2 UNION ALL
        SELECT 3, 0, 10 UNION ALL
        SELECT 2, 0,  6 UNION ALL
        SELECT 1, 0,  4
      ) AS limits (points, type, rowlimit)
        USING (points, type)
  ) AS derived
WHERE
  rownum <= rowlimit
;

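You could then go one step further and make the limits dataset an actual table in your database. You have mentioned that the requirements are not going to change, but knowing that they can be configured in a table might change that view.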
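As a rough sketch of that idea, the limits could be stored and used as shown below. The table name question_limits and its column types are assumptions made for illustration, and question_id again stands in for whatever the primary key of questions ends up being called. The syntax is aimed at MySQL 8.0 or later.

```sql
-- Hypothetical table holding one row limit per (points, type) combination.
CREATE TABLE question_limits (
  points   INT NOT NULL,
  type     INT NOT NULL,
  rowlimit INT NOT NULL,
  PRIMARY KEY (points, type)
);

INSERT INTO question_limits (points, type, rowlimit)
VALUES
  (3, 1,  6),
  (2, 1,  4),
  (1, 1,  2),
  (3, 0, 10),
  (2, 0,  6),
  (1, 0,  4);

-- Same approach as the query above, with the inline derived table
-- replaced by the stored one.
SELECT
  question_id,  -- assumed PK name; list the other "questions" columns as needed
  points,
  type
FROM
  (
    SELECT
      q.*,
      l.rowlimit,
      ROW_NUMBER() OVER (PARTITION BY q.points, q.type ORDER BY RAND()) AS rownum
    FROM
      questions AS q
      INNER JOIN question_limits AS l
        ON q.points = l.points AND q.type = l.type
  ) AS derived
WHERE
  rownum <= rowlimit
;
```

With this in place, changing a limit or adding a new (points, type) combination becomes an UPDATE or INSERT on question_limits rather than an edit to the query text.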

Product-specific notes

You mentioned that you have not yet decided which SQL product you are going to use. Here are some product-specific points to consider.

  1. Both variations of the solution will work in MySQL 8.0 or later. Version 8.0 is the minimum because that is when support for ROW_NUMBER() was first introduced in MySQL.

  2. The other product you mentioned, SQL Server, has supported ROW_NUMBER() since version 2005, but other aspects of the solution need to change to make it work in SQL Server.

    2.1. You are using ORDER BY RAND() to select rows at random. In SQL Server, the RAND() function is a runtime constant, so you will need a different method; ORDER BY NEWID() could be one:

      ROW_NUMBER() OVER (PARTITION BY points, type ORDER BY NEWID()) AS rownum

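    2.2. The JOIN ... USING syntax is not supported in SQL Server. You will need to replace it with a JOIN ... ON one. Because of this switch, you will also need to rewrite the nested SELECT's column list. The reason is that when you use * in the SELECT clause, USING automatically suppresses the duplication of the same-named columns used for the join, points and type. The ON operator has no such effect, so you need to rewrite the derived query like this: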

      (
        SELECT
          q.*,
          limits.rowlimit,
          ROW_NUMBER() OVER (PARTITION BY q.points, q.type ORDER BY NEWID()) AS rownum
        FROM
          questions AS q
          INNER JOIN ( ... ) AS limits (points, type, rowlimit)
            ON q.points = limits.points AND q.type = limits.type
      ) AS derived
    

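    2.3. Although it is not a required change in itself, you can take advantage of SQL Server's support for the VALUES constructor as a way of defining a derived table, which makes the limits definition more compact: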

      (
        VALUES
          (3, 1,  6),
          (2, 1,  4),
          (1, 1,  2),
          (3, 0, 10),
          (2, 0,  6),
          (1, 0,  4)
      ) AS limits (points, type, rowlimit)
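
Putting points 2.1 to 2.3 together, the SQL Server form of the join-based query might look roughly like the sketch below. It has not been run against a real schema. question_id is an assumed name for the primary key, and only three columns are listed because the rest of the questions table is not known here.

```sql
-- SQL Server variant: NEWID() instead of RAND(), ON instead of USING,
-- and a VALUES constructor for the limits.
SELECT
  question_id,  -- assumed PK name; add the other "questions" columns as needed
  points,
  type
FROM
  (
    SELECT
      q.*,
      limits.rowlimit,
      -- points and type exist in both q and limits, so they are qualified here
      ROW_NUMBER() OVER (PARTITION BY q.points, q.type ORDER BY NEWID()) AS rownum
    FROM
      questions AS q
      INNER JOIN
      (
        VALUES
          (3, 1,  6),
          (2, 1,  4),
          (1, 1,  2),
          (3, 0, 10),
          (2, 0,  6),
          (1, 0,  4)
      ) AS limits (points, type, rowlimit)
        ON q.points = limits.points AND q.type = limits.type
  ) AS derived
WHERE
  rownum <= rowlimit
;
```

Selecting q.* plus limits.rowlimit, rather than a bare *, keeps the derived table from exposing points and type twice, which SQL Server would reject.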
    

Note: "runtime constant" means that the function's result does not change during the execution of a SELECT statement.