Why do recursive CTEs run analytic functions (ROW_NUMBER) procedurally?

Jus*_*ony 12 sql sql-server row-number common-table-expression recursive-cte

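Yesterday I answered a question about a recursive CTE that exposed an issue with the way these are implemented in SQL Server (and possibly in other RDBMSs too?). Basically, when I try to use ROW_NUMBER against the current recursion level, it runs against each row subset of the current recursion level. I would expect this to work in true SET logic and run against the entire current recursion level.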

It appears, from this MSDN article, that the issue I found is intended functionality:

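Analytic and aggregate functions in the recursive part of the CTE are applied to the set for the current recursion level and not to the set for the CTE. Functions like ROW_NUMBER operate only on the subset of data passed to them by the current recursion level and not the entire set of data passed to the recursive part of the CTE. For more information, see J. Using analytical functions in a recursive CTE.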

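In my digging, I couldn't find anything that explains why it was designed to work this way. This is more of a procedural approach in a set-based language, so it works against my SQL thought process and is, in my opinion, quite confusing. Does anybody know, and/or can anybody explain, why the recursive CTE handles analytic functions at the recursion level in a procedural fashion?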


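Here is some code to help visualize this:

Note the RowNumber column in the output of each code sample.

Here is the SQLFiddle for the CTE (showing only the second level of the recursion)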

WITH myCTE
AS
(
  SELECT *, ROW_NUMBER() OVER (ORDER BY Score desc) AS RowNumber, 1 AS RecurseLevel
  FROM tblGroups
  WHERE ParentId IS NULL

  UNION ALL

  SELECT tblGroups.*, 
      ROW_NUMBER() OVER (ORDER BY myCTE.RowNumber , tblGroups.Score desc) AS RowNumber, 
      RecurseLevel + 1 AS RecurseLevel
  FROM tblGroups
      JOIN myCTE
          ON myCTE.GroupID = tblGroups.ParentID
 )
SELECT *
FROM myCTE
WHERE RecurseLevel = 2;

Here is a second SQLFiddle for what I would expect the CTE to do (again, only the second level is needed to show the issue)

WITH myCTE
AS
(
  SELECT *, ROW_NUMBER() OVER (ORDER BY Score desc) AS RowNumber, 1 AS RecurseLevel
  FROM tblGroups
  WHERE ParentId IS NULL
 )
  SELECT tblGroups.*, 
      ROW_NUMBER() OVER (ORDER BY myCTE.RowNumber , tblGroups.Score desc) AS RowNumber, 
      RecurseLevel + 1 AS RecurseLevel
  FROM tblGroups
      JOIN myCTE
          ON myCTE.GroupID = tblGroups.ParentID;

I had always envisioned a SQL recursive CTE running more like this WHILE loop

DECLARE @RecursionLevel INT
SET @RecursionLevel = 0
SELECT *, ROW_NUMBER() OVER (ORDER BY Score desc) AS RowNumber, @RecursionLevel AS recurseLevel
INTO #RecursiveTable
FROM tblGroups
WHERE ParentId IS NULL

WHILE EXISTS( SELECT tblGroups.* FROM tblGroups JOIN #RecursiveTable ON #RecursiveTable.GroupID = tblGroups.ParentID WHERE recurseLevel = @RecursionLevel)
BEGIN

    INSERT INTO #RecursiveTable
    SELECT tblGroups.*, 
        ROW_NUMBER() OVER (ORDER BY #RecursiveTable.RowNumber , tblGroups.Score desc) AS RowNumber, 
        recurseLevel + 1 AS recurseLevel
    FROM tblGroups
        JOIN #RecursiveTable
            ON #RecursiveTable.GroupID = tblGroups.ParentID
    WHERE recurseLevel = @RecursionLevel
    SET @RecursionLevel = @RecursionLevel + 1
END

SELECT * FROM #RecursiveTable ORDER BY RecurseLevel;

You*_*nes 1

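Analytic functions are special in that they need a known result set to resolve. They depend on the following, preceding, or full result set to compute the current value. That said, view merging is never allowed on a view that contains an analytic function. Why? Because it would change the result.

Example: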

    Select * from (
      select row_number() over (partition by c1 order by c2) rw, c3 from t) z
    where c3=123

is not the same as

    select row_number() over (partition by c1 order by c2) rw, c3 from t 
    where c3=123

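These two will return different values for rw. That is why a subquery containing an analytic function is always fully resolved first and never merged with the rest of the query.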
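To make the difference concrete, here is a minimal sketch that runs both forms of the query above. It uses Python's built-in sqlite3 module rather than SQL Server (window functions need SQLite 3.25 or later), and the rows in table t are made up for illustration. The helper names are hypothetical.

```python
import sqlite3


def make_table():
    """Build an in-memory table t(c1, c2, c3) with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (c1 INTEGER, c2 INTEGER, c3 INTEGER)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?)",
        [(1, 10, 111), (1, 20, 123), (1, 30, 123), (2, 10, 123)],
    )
    return conn


def rw_filtered_outside(conn):
    """Number every row first, then filter on c3 in the outer query."""
    rows = conn.execute(
        """
        SELECT rw FROM (
            SELECT row_number() OVER (PARTITION BY c1 ORDER BY c2) AS rw,
                   c1, c2, c3
            FROM t) z
        WHERE c3 = 123
        ORDER BY c1, c2
        """
    ).fetchall()
    return [r[0] for r in rows]


def rw_filtered_inside(conn):
    """Filter on c3 first, then number only the surviving rows."""
    rows = conn.execute(
        """
        SELECT row_number() OVER (PARTITION BY c1 ORDER BY c2) AS rw
        FROM t
        WHERE c3 = 123
        ORDER BY c1, c2
        """
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = make_table()
    print(rw_filtered_outside(conn))
    print(rw_filtered_inside(conn))
```

With this sample data the first query yields rw values 2, 3, 1 and the second yields 1, 2, 1, because in the second query the row with c3 = 111 is removed before the numbering happens.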

Update

Looking at the second query:

WITH myCTE
AS
(
  SELECT *, ROW_NUMBER() OVER (ORDER BY Score desc) AS RowNumber, 1 AS RecurseLevel
  FROM tblGroups
  WHERE ParentId IS NULL
 )
  SELECT tblGroups.*, 
      ROW_NUMBER() OVER (ORDER BY myCTE.RowNumber , tblGroups.Score desc) AS RowNumber, 
      RecurseLevel + 1 AS RecurseLevel
  FROM tblGroups
      JOIN myCTE
          ON myCTE.GroupID = tblGroups.ParentID;

It works exactly as if it had been written like this (same execution plan and result):

SELECT tblGroups.*, 
      ROW_NUMBER() OVER (ORDER BY myCTE.RowNumber , tblGroups.Score desc) AS RowNumber, 
      RecurseLevel + 1 AS RecurseLevel
FROM tblGroups
JOIN (
    SELECT *, ROW_NUMBER() OVER (ORDER BY Score desc) AS RowNumber, 1 AS RecurseLevel
    FROM tblGroups
    WHERE ParentId IS NULL
    )myCTE ON myCTE.GroupID = tblGroups.ParentID;

This one would need a PARTITION BY to reset the row number for each parent.
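As a rough illustration of that point, the sketch below runs the non-recursive query with and without a PARTITION BY on the parent's row number. It is run on SQLite through Python's sqlite3 module (SQLite 3.25 or later), not on SQL Server. The tblGroups rows, the ChildRowNumber alias, and the choice of myCTE.RowNumber as the partition key are all assumptions made for the example.

```python
import sqlite3

# Hypothetical sample data: (GroupID, ParentID, Score)
ROWS = [
    (1, None, 100), (2, None, 90),
    (3, 1, 50), (4, 1, 40),
    (5, 2, 60), (6, 2, 30),
]

# The {window} placeholder is filled with one of the two window definitions below.
QUERY = """
WITH myCTE AS (
  SELECT *, ROW_NUMBER() OVER (ORDER BY Score DESC) AS RowNumber, 1 AS RecurseLevel
  FROM tblGroups
  WHERE ParentID IS NULL
)
SELECT tblGroups.GroupID,
       ROW_NUMBER() OVER ({window}) AS ChildRowNumber
FROM tblGroups
JOIN myCTE ON myCTE.GroupID = tblGroups.ParentID
ORDER BY myCTE.RowNumber, tblGroups.Score DESC
"""

# Numbering runs across the whole second level, as in the question's second query.
WHOLE_LEVEL = "ORDER BY myCTE.RowNumber, tblGroups.Score DESC"
# Numbering restarts for each parent row.
PER_PARENT = "PARTITION BY myCTE.RowNumber ORDER BY tblGroups.Score DESC"


def level2(window):
    """Return (GroupID, ChildRowNumber) pairs for the given window definition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblGroups (GroupID INTEGER, ParentID INTEGER, Score INTEGER)")
    conn.executemany("INSERT INTO tblGroups VALUES (?, ?, ?)", ROWS)
    return conn.execute(QUERY.format(window=window)).fetchall()


if __name__ == "__main__":
    print(level2(WHOLE_LEVEL))
    print(level2(PER_PARENT))
```

With this data the unpartitioned version numbers the children 1 through 4, while the partitioned version restarts at 1 for each parent, which is the reset the question observed in the recursive CTE.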

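Recursive queries do not work like a WHILE loop; they are not procedural. At their core they work like a recursive function, but depending on the tables, the query, and the indexes, they can be optimized to run one way or another.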

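If we follow the principle that a view cannot be merged when it uses an analytic function, and look at query 1, it can only run one way, and that is in a nested loop.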

WITH myCTE
AS
( /*Cannot be merged*/
  SELECT *, ROW_NUMBER() OVER (ORDER BY Score desc) AS RowNumber, 1 AS RecurseLevel,
  cast(0 as bigint) n
  FROM tblGroups
  WHERE ParentId IS NULL

  UNION ALL

/*Cannot be merged*/
  SELECT tblGroups.*, 
      ROW_NUMBER() OVER (ORDER BY myCTE.RowNumber, tblGroups.Score desc) AS RowNumber,
      RecurseLevel + 1 AS RecurseLevel,
      myCTE.RowNumber
  FROM tblGroups
      JOIN myCTE
          ON myCTE.GroupID = tblGroups.ParentID
 )
SELECT *
FROM myCTE;

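So the first SELECT cannot be merged with the second, nor the second with the first. The only way to run this query is in a nested loop, once for each item returned at each level, hence the reset. Again, it is not a question of procedural or not, just a question of which execution plans are possible.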
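The nested-loop behavior described above can be modeled in plain Python. This is only a toy model of the idea, not SQL Server's actual executor: the sample tblGroups rows and all function names are hypothetical. One function evaluates the recursive member once per parent row (numbering restarts), the other evaluates it once over the whole previous level (what the question expected).

```python
from collections import namedtuple

Group = namedtuple("Group", "group_id parent_id score")

# Hypothetical sample data standing in for tblGroups.
TBL_GROUPS = [
    Group(1, None, 100), Group(2, None, 90),
    Group(3, 1, 50), Group(4, 1, 40),
    Group(5, 2, 60), Group(6, 2, 30),
]


def anchor():
    """Anchor member: root groups numbered by Score descending."""
    roots = sorted((g for g in TBL_GROUPS if g.parent_id is None),
                   key=lambda g: -g.score)
    return [(g.group_id, n) for n, g in enumerate(roots, start=1)]


def children_of(parent_id):
    """Join step: children of one parent, ordered by Score descending."""
    return sorted((g for g in TBL_GROUPS if g.parent_id == parent_id),
                  key=lambda g: -g.score)


def level2_per_parent_row():
    """Recursive member run once per parent row, so numbering restarts each time."""
    out = []
    for parent_id, _parent_rn in anchor():
        for n, child in enumerate(children_of(parent_id), start=1):
            out.append((child.group_id, n))
    return out


def level2_whole_level():
    """Recursive member run once over the entire previous level."""
    joined = [(parent_rn, child)
              for parent_id, parent_rn in anchor()
              for child in children_of(parent_id)]
    joined.sort(key=lambda pair: (pair[0], -pair[1].score))
    return [(child.group_id, n) for n, (_rn, child) in enumerate(joined, start=1)]


if __name__ == "__main__":
    print(level2_per_parent_row())
    print(level2_whole_level())
```

In this model the per-parent version gives groups 5 and 6 the numbers 1 and 2, while the whole-level version gives them 3 and 4.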

Hope this answers your question; let me know if it doesn't :)
