Multiple ROW_NUMBER() calls in a single SQL query

Jay*_*yRu 6 sql t-sql sql-server

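I am trying to set up some data to calculate multiple medians in SQL Server 2008, but I am running into a performance problem. Right now I am using this pattern (another example at the bottom). Yes, I am not using a CTE, but using one would not fix the problem I am having, and performance is poor because the ROW_NUMBER subqueries run serially rather than in parallel.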

Here is a complete example. Below the SQL I explain the problem in more detail.

-- build the example table    

CREATE TABLE #TestMedian (
    StateID INT,
    TimeDimID INT,
    ConstructionStatusID INT,

    PopulationSize BIGINT,
    SquareMiles BIGINT
);

INSERT INTO #TestMedian (StateID, TimeDimID, ConstructionStatusID, PopulationSize, SquareMiles)
VALUES (1, 1, 1, 100000, 200000);

INSERT INTO #TestMedian (StateID, TimeDimID, ConstructionStatusID, PopulationSize, SquareMiles)
VALUES (1, 1, 1, 200000, 300000);

INSERT INTO #TestMedian (StateID, TimeDimID, ConstructionStatusID, PopulationSize, SquareMiles)
VALUES (1, 1, 1, 300000, 400000);

INSERT INTO #TestMedian (StateID, TimeDimID, ConstructionStatusID, PopulationSize, SquareMiles)
VALUES (1, 1, 1, 100000, 200000);

INSERT INTO #TestMedian (StateID, TimeDimID, ConstructionStatusID, PopulationSize, SquareMiles)
VALUES (1, 1, 1, 250000, 300000);

INSERT INTO #TestMedian (StateID, TimeDimID, ConstructionStatusID, PopulationSize, SquareMiles)
VALUES (1, 1, 1, 350000, 400000);

-- TRUNCATE TABLE #TestMedian

    SELECT
        StateID
        ,TimeDimID
        ,ConstructionStatusID
        ,NumberOfRows = COUNT(*) OVER (PARTITION BY StateID, TimeDimID, ConstructionStatusID)
        ,PopulationSizeRowNum = ROW_NUMBER() OVER (PARTITION BY StateID, TimeDimID, ConstructionStatusID ORDER BY PopulationSize)
        ,SquareMilesRowNum = ROW_NUMBER() OVER (PARTITION BY StateID, TimeDimID, ConstructionStatusID ORDER BY SquareMiles)
        ,PopulationSize
        ,SquareMiles
    INTO #MedianData
    FROM #TestMedian

    SELECT MinRowNum = MIN(PopulationSizeRowNum), MaxRowNum = MAX(PopulationSizeRowNum), StateID, TimeDimID, ConstructionStatusID, MedianPopulationSize= AVG(PopulationSize) 
    FROM #MedianData T
    WHERE PopulationSizeRowNum IN((NumberOfRows + 1) / 2, (NumberOfRows + 2) / 2)
    GROUP BY StateID, TimeDimID, ConstructionStatusID

    SELECT MinRowNum = MIN(SquareMilesRowNum), MaxRowNum = MAX(SquareMilesRowNum), StateID, TimeDimID, ConstructionStatusID, MedianSquareMiles= AVG(SquareMiles) 
    FROM #MedianData T
    WHERE SquareMilesRowNum IN((NumberOfRows + 1) / 2, (NumberOfRows + 2) / 2)
    GROUP BY StateID, TimeDimID, ConstructionStatusID


    DROP TABLE #MedianData
    DROP TABLE #TestMedian
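
The WHERE clauses above rely on integer division to pick the middle row or rows of each partition. As a rough illustration only (Python rather than T-SQL, with hypothetical helper names `median_row_numbers` and `median` that are not part of the original query), the selection logic can be sketched like this, using the six sample rows inserted above:

```python
def median_row_numbers(n):
    """Return the 1-based row numbers the WHERE clause keeps for n rows."""
    # Floor division mirrors T-SQL integer arithmetic:
    # (NumberOfRows + 1) / 2 and (NumberOfRows + 2) / 2.
    return sorted({(n + 1) // 2, (n + 2) // 2})


def median(values):
    """Average the middle row(s) of the sorted values."""
    ordered = sorted(values)                   # stands in for ROW_NUMBER() ... ORDER BY
    picks = median_row_numbers(len(ordered))   # one row if n is odd, two if n is even
    chosen = [ordered[r - 1] for r in picks]   # row numbers are 1-based
    return sum(chosen) / len(chosen)           # true division, unlike AVG over BIGINT


# Values copied from the INSERT statements in the example.
population = [100000, 200000, 300000, 100000, 250000, 350000]
square_miles = [200000, 300000, 400000, 200000, 300000, 400000]

print(median_row_numbers(6))   # [3, 4]
print(median_row_numbers(5))   # [3]
print(median(population))      # 225000.0
print(median(square_miles))    # 300000.0
```

One difference worth keeping in mind: AVG over a BIGINT column in T-SQL returns an integer, so an even-sized partition whose two middle values sum to an odd number would be truncated there, whereas this sketch returns a float.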

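The problem with this query is that SQL Server executes the "ROW_NUMBER() OVER ..." subqueries serially rather than in parallel. So if I have 10 of these ROW_NUMBER calculations, it computes them one after another and I get linear growth, which stinks. I am running this query on an 8-way, 32 GB system, and I would expect some parallelism. I am trying to run this kind of query against a 5,000,000-row table.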

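I can tell it is doing this by looking at the query plan and seeing the Sorts in the same execution path (showing the query plan XML does not work well on SO).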

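So my question is: how can I change this query so that the ROW_NUMBER queries are executed in parallel? Is there a completely different technique I can use to prepare the data for multiple median calculations?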

Rem*_*anu 3

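Each ROW_NUMBER first requires the rows to be sorted. Since your two RNs have different ORDER BY conditions, the query must produce the result, sort it for the first RN (it may already be sorted), generate that RN, then sort again for the second RN and generate the second RN result. There simply is no magic pixie dust that can materialize a row number value without counting where the row sits in the required order.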
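
To make the point concrete, here is a minimal sketch in Python (not T-SQL, and not a model of SQL Server's actual execution engine). The helper `row_numbers` and the three-row data set are hypothetical, chosen so that the two orderings disagree. Each call has to sort the rows on its own key before it can hand out numbers:

```python
# Hypothetical rows, chosen so the two columns sort in different orders.
rows = [
    {"PopulationSize": 300, "SquareMiles": 10},
    {"PopulationSize": 100, "SquareMiles": 30},
    {"PopulationSize": 200, "SquareMiles": 20},
]


def row_numbers(rows, key):
    """Assign 1-based row numbers by sorting the rows on `key`."""
    # One sort per ordering: the positions cannot be known without it.
    order = sorted(range(len(rows)), key=lambda i: rows[i][key])
    numbers = [0] * len(rows)
    for position, index in enumerate(order, start=1):
        numbers[index] = position
    return numbers


print(row_numbers(rows, "PopulationSize"))  # [3, 1, 2]
print(row_numbers(rows, "SquareMiles"))     # [1, 3, 2]
```

Because the two results differ, a single ordering of the rows cannot supply both sets of numbers here; every additional ORDER BY column adds another sort of the same kind.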