How does a CTE really work?

Ele*_*yed 10 sql-server recursive-query common-table-expression

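I came across this CTE solution for concatenating row elements and thought it was brilliant; it made me realize how powerful CTEs can be.

However, to use such a tool effectively I need to know how it works internally so that I can build a mental picture of it, which is essential for a beginner like me to apply it in different scenarios.

So I tried to run the snippet above in slow motion. Here is the code: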

USE [NORTHWIND]
GO
/****** Object:  Table [dbo].[Products2]  Script Date: 10/18/2011 08:55:07 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF OBJECT_ID('Products2','U') IS NOT NULL  DROP TABLE [Products2]
CREATE TABLE [dbo].[Products2](
  [ProductID] [int] IDENTITY(1,1) NOT NULL,
  [ProductName] [nvarchar](40) NOT NULL,
  [SupplierID] [int] NULL,
  [CategoryID] [int] NULL,
  [QuantityPerUnit] [nvarchar](20) NULL,
  [UnitPrice] [money] NULL,
  [UnitsInStock] [smallint] NULL,
  [UnitsOnOrder] [smallint] NULL,
  [ReorderLevel] [smallint] NULL,
  [Discontinued] [bit] NOT NULL
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[Products2] ON
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (1, N'vcbcbvcbvc', 1, 4, N'10 boxes x 20 bags', 18.0000, 39, 0, 10, 0)
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (2, N'Changassad', 1, 1, N'24 - 12 oz bottles', 19.0000, 17, 40, 25, 0)
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (3, N'Aniseed Syrup', 1, 2, N'12 - 550 ml bottles', 10.0000, 13, 70, 25, 0)
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (4, N'Chef Anton''s Cajun Seasoning', 2, 2, N'48 - 6 oz jars', 22.0000, 53, 0, 0, 0)
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (5, N'Chef Anton''s Gumbo Mix', 10, 2, N'36 boxes', 21.3500, 0, 0, 0, 1)
SET IDENTITY_INSERT [dbo].[Products2] OFF
GO
IF OBJECT_ID('DELAY_EXEC','FN') IS NOT NULL  DROP FUNCTION DELAY_EXEC
GO
CREATE FUNCTION DELAY_EXEC() RETURNS DATETIME
AS
BEGIN
  DECLARE @I INT=0
  WHILE @I<99999
  BEGIN
  SELECT @I+=1
  END
  RETURN GETDATE()
END
GO

WITH CTE (EXEC_TIME, CategoryID, product_list, product_name, length)
     AS (SELECT dbo.DELAY_EXEC(),
                CategoryID,
                CAST('' AS VARCHAR(8000)),
                CAST('' AS VARCHAR(8000)),
                0
         FROM   Northwind..Products2
         GROUP  BY CategoryID
         UNION ALL
         SELECT dbo.DELAY_EXEC(),
                p.CategoryID,
                CAST(product_list + CASE
                                      WHEN length = 0 THEN ''
                                      ELSE ', '
                                    END + ProductName AS VARCHAR(8000)),
                CAST(ProductName AS VARCHAR(8000)),
                length + 1
         FROM   CTE c
                INNER JOIN Northwind..Products2 p
                  ON c.CategoryID = p.CategoryID
         WHERE  p.ProductName > c.product_name)
SELECT *
FROM   CTE
ORDER  BY EXEC_TIME  

--SELECT CategoryId, product_list
--  FROM ( SELECT CategoryId, product_list,
--  RANK() OVER ( PARTITION BY CategoryId ORDER BY length DESC )
--   FROM CTE ) D ( CategoryId, product_list, rank )
--   WHERE rank = 1 ;

The commented-out block produces the desired output for the concatenation problem, but that is not the question here.

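I added an EXEC_TIME column to see which row is added first. The output does not look right to me, for two reasons:

  1. I expected redundant data because of the condition p.ProductName > c.product_name. In other words, the empty rows from the first part of the CTE are always less than the values in the Products2 table, so every time the recursive part runs it should bring back the set of rows that was already added. Does that make sense?

  2. The hierarchy of the data is really strange. The last item should be the longest, yet look at what the last item is: an item with length=1.

Any experts to the rescue? Thanks in advance.

Sample results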

EXEC_TIME               CategoryID  product_list                                                        product_name                      length
----------------------- ----------- ------------------------------------------------------------------- --------------------------------- -----------
2011-10-18 12:46:14.930 1                                                                                                                 0
2011-10-18 12:46:14.990 2                                                                                                                 0
2011-10-18 12:46:15.050 4                                                                                                                 0
2011-10-18 12:46:15.107 4           vcbcbvcbvc                                                          vcbcbvcbvc                        1
2011-10-18 12:46:15.167 2           Aniseed Syrup                                                       Aniseed Syrup                     1
2011-10-18 12:46:15.223 2           Chef Anton's Cajun Seasoning                                        Chef Anton's Cajun Seasoning      1
2011-10-18 12:46:15.280 2           Chef Anton's Gumbo Mix                                              Chef Anton's Gumbo Mix            1
2011-10-18 12:46:15.340 2           Chef Anton's Cajun Seasoning, Chef Anton's Gumbo Mix                Chef Anton's Gumbo Mix            2
2011-10-18 12:46:15.400 2           Aniseed Syrup, Chef Anton's Cajun Seasoning                         Chef Anton's Cajun Seasoning      2
2011-10-18 12:46:15.463 2           Aniseed Syrup, Chef Anton's Gumbo Mix                               Chef Anton's Gumbo Mix            2
2011-10-18 12:46:15.520 2           Aniseed Syrup, Chef Anton's Cajun Seasoning, Chef Anton's Gumbo Mi  Chef Anton's Gumbo Mix            3
2011-10-18 12:46:15.580 1           Changassad                                                          Changassad                        1

Mar*_*ith 5

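This was an interesting question that helped me understand recursive CTEs better.

If you look at the execution plan you will see that a spool is used and that it has the WITH STACK property set. This means that rows are read in a stack-like manner (last in, first out).

So first the anchor part runs: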

EXEC_TIME               CategoryID  product_list  
----------------------- ----------- --------------
2011-10-18 12:46:14.930 1                         
2011-10-18 12:46:14.990 2                         
2011-10-18 12:46:15.050 4                

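Then the CategoryID = 4 row is processed, as that was the last row added. The JOIN returns 1 row, which is added to the spool, and then this newly added row is processed. In this case the JOIN returns nothing, so nothing further is added to the spool and processing moves on to the CategoryID = 2 row.

This returns 3 rows, which are added to the spool: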

Aniseed Syrup
Chef Anton's Cajun Seasoning
Chef Anton's Gumbo Mix   

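Each of these rows is then processed in turn in the same LIFO fashion, with any child rows a row adds being processed before processing moves on to its sibling rows. Hopefully you can see how this recursive logic explains the results you observed, but in case you can't, here is a C# simulation: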

using System;
using System.Collections.Generic;
using System.Linq;

namespace Foo
{
    internal class Bar
    {
        private static void Main(string[] args)
        {
            var spool = new Stack<Tuple<int, string, string>>();

            //Add anchor elements
            AddRowToSpool(spool, new Tuple<int, string, string>(1, "", ""));
            AddRowToSpool(spool, new Tuple<int, string, string>(2, "", ""));
            AddRowToSpool(spool, new Tuple<int, string, string>(4, "", ""));

            while (spool.Count > 0)
            {
                Tuple<int, string, string> lastRowAdded = spool.Pop();
                AddChildRows(lastRowAdded, spool);
            }

            Console.ReadLine();
        }

        private static void AddRowToSpool(Stack<Tuple<int, string, string>> spool,
                                          Tuple<int, string, string> row)
        {
            Console.WriteLine("CategoryId={0}, product_list = {1}",
                              row.Item1,
                              row.Item3);
            spool.Push(row);
        }

        private static void AddChildRows(Tuple<int, string, string> lastRowAdded,
                                         Stack<Tuple<int, string, string>> spool)
        {
            int categoryId = lastRowAdded.Item1;
            string productName = lastRowAdded.Item2;
            string productList = lastRowAdded.Item3;

            string[] products;

            switch (categoryId)
            {
                case 1:
                    products = new[] {"Changassad"};
                    break;
                case 2:
                    products = new[]
                                   {
                                       "Aniseed Syrup",
                                       "Chef Anton's Cajun Seasoning",
                                       "Chef Anton's Gumbo Mix"
                                   };
                    break;
                case 4:
                    products = new[] {"vcbcbvcbvc"};
                    break;
                default:
                    products = new string[] {};
                    break;
            }


            foreach (string product in products.Where(
                product => string.Compare(productName, product) < 0))
            {
                string product_list = string.Format("{0}{1}{2}",
                                                 productList,
                                                 productList == "" ? "" : ",",
                                                 product);

                AddRowToSpool(spool,
                              new Tuple<int, string, string>
                                  (categoryId, product, product_list));
            }
        }
    }
}

This prints:

CategoryId=1, product_list =
CategoryId=2, product_list =
CategoryId=4, product_list =
CategoryId=4, product_list = vcbcbvcbvc
CategoryId=2, product_list = Aniseed Syrup
CategoryId=2, product_list = Chef Anton's Cajun Seasoning
CategoryId=2, product_list = Chef Anton's Gumbo Mix
CategoryId=2, product_list = Chef Anton's Cajun Seasoning,Chef Anton's Gumbo Mix
CategoryId=2, product_list = Aniseed Syrup,Chef Anton's Cajun Seasoning
CategoryId=2, product_list = Aniseed Syrup,Chef Anton's Gumbo Mix
CategoryId=2, product_list = Aniseed Syrup,Chef Anton's Cajun Seasoning,Chef Anton's Gumbo Mix
CategoryId=1, product_list = Changassad


Dam*_*ver 4

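The Recursive Queries Using Common Table Expressions page describes the logic of a CTE:

The semantics of the recursive execution is as follows:

  1. Split the CTE expression into anchor and recursive members.

  2. Run the anchor member(s), creating the first invocation or base result set (T0).

  3. Run the recursive member(s) with Ti as an input and Ti+1 as an output.

  4. Repeat step 3 until an empty set is returned.

  5. Return the result set. This is a UNION ALL of T0 to Tn.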
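
The five steps above can be sketched in a few lines of Python. This is only a model of the documented logical semantics, not of how SQL Server actually executes the query. The `PRODUCTS` list is a hypothetical stand-in for the five rows of `Products2`, and plain Python string comparison is assumed to order these particular names the same way the database collation does.

```python
# Hypothetical stand-in for the five Products2 rows: (CategoryID, ProductName).
PRODUCTS = [
    (4, "vcbcbvcbvc"),
    (1, "Changassad"),
    (2, "Aniseed Syrup"),
    (2, "Chef Anton's Cajun Seasoning"),
    (2, "Chef Anton's Gumbo Mix"),
]


def anchor_member():
    """Step 2: build T0, one empty row per category (the GROUP BY part)."""
    categories = sorted({cat for cat, _ in PRODUCTS})
    return [(cat, "", "", 0) for cat in categories]


def recursive_member(previous_level):
    """Step 3: build T(i+1) by joining only the rows of Ti to the products."""
    out = []
    for cat, product_list, product_name, length in previous_level:
        for p_cat, p_name in PRODUCTS:
            if p_cat == cat and p_name > product_name:
                sep = "" if length == 0 else ", "
                out.append((cat, product_list + sep + p_name, p_name, length + 1))
    return out


def evaluate_cte():
    """Steps 2 to 5: return the levels T0..Tn; the result is their UNION ALL."""
    levels = [anchor_member()]
    while True:
        next_level = recursive_member(levels[-1])
        if not next_level:  # step 4: stop once an empty set is returned
            break
        levels.append(next_level)
    return levels


levels = evaluate_cte()
for i, level in enumerate(levels):
    for row in level:
        print(f"T{i}", row)
```

In this model the empty anchor rows are joined to the products exactly once, because each run of the recursive member receives only the previous level as input. That is why the result contains no repeated rows.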

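However, that is only the logical flow. As always with SQL, the server is free to re-order operations as it sees fit, provided the result is "the same" and the re-ordering is judged to deliver the result more efficiently.

The presence of a function with side effects (one that causes a delay and then returns GETDATE()) would not normally be taken into account when deciding whether to re-order operations.

One obvious way the query could be re-ordered is that the server may decide to start working on result set Ti+1 before it has completely built result set Ti. That may be more efficient than fully constructing Ti first, because the new rows are certainly already in memory and were accessed recently.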
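
To illustrate why such a re-ordering is harmless, here is a small Python sketch that processes the same rows from one shared worklist, either first in, first out (level by level) or last in, first out (newest row first). It is a toy model under stated assumptions, not SQL Server's implementation: `PRODUCTS`, `children` and `run` are hypothetical names, and the anchor rows are assumed to arrive in the order 1, 2, 4.

```python
from collections import Counter, deque

# Hypothetical stand-in for the five Products2 rows: (CategoryID, ProductName).
PRODUCTS = [
    (4, "vcbcbvcbvc"),
    (1, "Changassad"),
    (2, "Aniseed Syrup"),
    (2, "Chef Anton's Cajun Seasoning"),
    (2, "Chef Anton's Gumbo Mix"),
]


def children(row):
    """Rows the recursive member produces from a single input row."""
    cat, product_list, product_name, length = row
    sep = "" if length == 0 else ", "
    return [
        (cat, product_list + sep + name, name, length + 1)
        for p_cat, name in PRODUCTS
        if p_cat == cat and name > product_name
    ]


def run(lifo):
    """Return all rows in the order they are produced by one worklist."""
    pending = deque((cat, "", "", 0) for cat in (1, 2, 4))  # anchor rows
    emitted = list(pending)
    while pending:
        # LIFO takes the newest row first; FIFO takes the oldest row first.
        row = pending.pop() if lifo else pending.popleft()
        new_rows = children(row)
        emitted.extend(new_rows)
        pending.extend(new_rows)
    return emitted


stack_order = run(lifo=True)
queue_order = run(lifo=False)

# Same rows either way; only the order in which they appear differs.
assert Counter(stack_order) == Counter(queue_order)

for row in stack_order:
    print(row[0], row[3], row[1])
```

With the assumed inputs, the last in, first out run yields the length sequence 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 1, the same pattern as the sample results in the question, while the first in, first out run yields the lengths in non-decreasing order.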