Wel*_*alf 7 sql-server sql-server-2012 greatest-n-per-group
我有一个表,对于一组给定的字段 a、b 和 c,我需要获取按 d 和 e 排序的第一行和最后一行,并且正在使用 ROW_NUMBER 来获取这些行。声明的相关部分是...
ROW_NUMBER() OVER (PARTITION BY a,b,c ORDER BY d ASC, e ASC) AS row_number_start,
ROW_NUMBER() OVER (PARTITION BY a,b,c ORDER BY d DESC, e DESC) AS row_number_end
Run Code Online (Sandbox Code Playgroud)
执行计划显示了两个排序操作,每个操作一个。这些排序操作占语句总成本的 60% 以上(我们在这里谈论的是数千万行,分区通常每个分区有 1-100 条记录,大部分在 10 条以下)
所以如果我能摆脱其中的一个就好了。我试图创建一个索引来复制排序;这消除了排序操作之一,但没有消除后者。(请注意,创建的任何索引仅用于此过程,并且会作为 ETL 过程的一部分每天重新创建。)
从检查执行计划来看,我认为问题是在执行partition by语句时,SQL Server坚持按分区列升序排序。从逻辑上讲,是升序还是降序都没有关系,如果优化器理解这一点,那么它可以向后读取相同的索引来计算 row_number_end。
有什么方法可以让优化器在这里看到意义,或者有人可以建议另一种方法来实现相同的最终目标吗?
CREATE TABLE dbo.Test
(
a integer NOT NULL,
b integer NOT NULL,
c integer NOT NULL,
d integer NOT NULL,
e integer NOT NULL
);
CREATE INDEX i1 ON dbo.Test (a, b, c, d, e);
Run Code Online (Sandbox Code Playgroud)
SELECT
DT.a,
DT.b,
DT.c,
FL.d,
FL.e
FROM
(
-- Could be an indexed view
SELECT
T.a,
T.b,
T.c
FROM dbo.Test AS T
GROUP BY
T.a,
T.b,
T.c
) AS DT
CROSS APPLY
(
(
-- First
SELECT TOP (1)
T2.d,
T2.e
FROM dbo.Test AS T2
WHERE
T2.a = DT.a
AND T2.b = DT.b
AND T2.c = DT.c
ORDER BY
T2.d ASC,
T2.e ASC
)
UNION ALL
(
-- Last
SELECT TOP (1)
T3.d,
T3.e
FROM dbo.Test AS T3
WHERE
T3.a = DT.a
AND T3.b = DT.b
AND T3.c = DT.c
ORDER BY
T3.d DESC,
T3.e DESC
)
) AS FL;
Run Code Online (Sandbox Code Playgroud)
如果平均每组有少量行,则可能会更好:(根据 Martin Smith 的建议改进)
SELECT
TF.a,
TF.b,
TF.c,
TF.d,
TF.e
FROM
(
SELECT
T.*,
rn = ROW_NUMBER() OVER (
PARTITION BY a,b,c
ORDER BY d ASC, e ASC)
FROM dbo.Test AS T
) AS TF
WHERE
TF.rn = 1
UNION ALL
SELECT
TL2.a,
TL2.b,
TL2.c,
TL2.d,
TL2.e
FROM
(
-- TOP (max bigint) to allow an ORDER BY in this scope
SELECT TOP (9223372036854775807)
TL.a,
TL.b,
TL.c,
TL.d,
TL.e
FROM
(
SELECT
T.*,
rn = ROW_NUMBER() OVER (
PARTITION BY a,b,c
ORDER BY d DESC, e DESC)
FROM dbo.Test AS T
) AS TL
WHERE
TL.rn = 1
ORDER BY
-- To allow the optimizer to scan the index backward
-- (Workaround for PARTITION BY being assumed ASC)
TL.a DESC,
TL.b DESC,
TL.c DESC,
TL.d DESC,
TL.e DESC
) AS TL2;
Run Code Online (Sandbox Code Playgroud)
这是基于第一行是第 1 行,最后一行是第 1 行之前的行的想法:
SELECT
L.a,
L.b,
L.c,
L.d,
L.e
FROM
(
-- Add LEAD(1) on numbering
SELECT
N.*,
next_rn = LEAD(N.rn, 1, 1) OVER (
PARTITION BY N.a, N.b, N.c
ORDER BY N.d, N.e)
FROM
(
-- Numbered
SELECT
T.*,
rn = ROW_NUMBER() OVER (
PARTITION BY T.a, T.b, T.c
ORDER BY T.d, T.e)
FROM dbo.Test AS T
) AS N
) AS L
WHERE
-- This row is first, or the next one is
L.rn = 1
OR L.next_rn = 1;
Run Code Online (Sandbox Code Playgroud)
总结评论讨论:
归档时间: |
|
查看次数: |
7405 次 |
最近记录: |