Ali*_*n I 4 sql sql-server window-functions
How can I get a RANK that restarts when the partition changes? I have this table:
ID Date Value
1 2015-01-01 1
2 2015-01-02 1 <redundant
3 2015-01-03 2
4 2015-01-05 2 <redundant
5 2015-01-06 1
6 2015-01-08 1 <redundant
7 2015-01-09 1 <redundant
8 2015-01-10 2
9 2015-01-11 3
10 2015-01-12 3 <redundant
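I am trying to delete all rows where Value has not changed from the previous entry (marked <redundant). I have tried using a cursor, but it takes too long, as the table has about 50 million rows.
I have also tried using RANK: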
SELECT ID, Date, Value,
RANK() over(partition by Value order by Date ASC) Rank
FROM DataLogging
ORDER BY Date ASC
But I get:
ID Date Value Rank (Rank)
1 2015-01-01 1 1 (1)
2 2015-01-02 1 2 (2)
3 2015-01-03 2 1 (1)
4 2015-01-05 2 2 (2)
5 2015-01-06 1 3 (1)
6 2015-01-08 1 4 (2)
7 2015-01-09 1 5 (3)
8 2015-01-10 2 3 (1)
9 2015-01-11 3 1 (1)
10 2015-01-12 3 2 (2)
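The rank I want is shown in parentheses, so that I can keep the rows with rank = 1 and delete the rest.
EDIT: I have accepted the answer that seemed easiest to write, but unfortunately none of the answers runs fast enough to delete the rows. In the end I decided to use the CURSOR after all. I split the data into chunks of about 250k rows; the cursor walks through and deletes the rows in about 11 minutes per 250k-row chunk, whereas the answers below (with DELETE) take about 35 minutes per 250k rows.
Here is a somewhat convoluted way to do it: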
WITH CTE AS
(
SELECT *,
ROW_NUMBER() OVER(ORDER BY [Date]) RN1,
ROW_NUMBER() OVER(PARTITION BY Value ORDER BY [Date]) RN2
FROM dbo.YourTable
), CTE2 AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY Value, RN1 - RN2 ORDER BY [Date]) N
FROM CTE
)
SELECT *
FROM CTE2
ORDER BY ID;
The results are:
ID Date Value RN1 RN2 N
1 2015-01-01 1 1 1 1
2 2015-01-02 1 2 2 2
3 2015-01-03 2 3 1 1
4 2015-01-05 2 4 2 2
5 2015-01-06 1 5 3 1
6 2015-01-08 1 6 4 2
7 2015-01-09 1 7 5 3
8 2015-01-10 2 8 3 1
9 2015-01-11 3 9 1 1
10 2015-01-12 3 10 2 2
To delete the unwanted rows, keep the two CTE definitions and replace the final SELECT with:
DELETE FROM CTE2
WHERE N > 1;
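The reason partitioning by `RN1 - RN2` isolates each run: inside one unbroken run of equal values, both row numbers grow by one per row, so their difference stays constant. Once another value interrupts the run, RN1 keeps counting while RN2 for that value pauses, so the next run of the same value gets a different difference. The following is a minimal Python sketch of that arithmetic on the sample data from the question. It is an illustration only, not part of the answer's SQL, and the variable names are hypothetical.

```python
from collections import defaultdict

# Sample data from the question, already ordered by Date: (ID, Value)
rows = [(1, 1), (2, 1), (3, 2), (4, 2), (5, 1),
        (6, 1), (7, 1), (8, 2), (9, 3), (10, 3)]

rn2_counter = defaultdict(int)  # running count per Value, mirrors RN2
n_counter = defaultdict(int)    # running count per (Value, RN1 - RN2), mirrors N
result = []

for rn1, (row_id, value) in enumerate(rows, start=1):  # rn1 mirrors RN1
    rn2_counter[value] += 1
    rn2 = rn2_counter[value]
    island = (value, rn1 - rn2)  # constant within one run of equal values
    n_counter[island] += 1
    result.append((row_id, value, rn1, rn2, n_counter[island]))

# Rows with N = 1 start a run; everything else is redundant.
keep = [row_id for row_id, _, _, _, n in result if n == 1]
print(keep)  # [1, 3, 5, 8, 9]
```

The N values computed here match the N column of the result table above, and the kept IDs are exactly the rows not marked redundant in the question.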
select *
from ( select ID, Date, Value, lag(Value) over (order by ID) as ValueLag
from DataLogging ) tt
-- ValueLag is NULL only for the first row, which is always kept
where ValueLag is null or ValueLag <> Value
If the order is by Date, then use over (order by Date) instead.
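One way to turn the LAG filter into a delete is to remove every row whose Value equals the previous row's Value. The sketch below tries that idea from Python against an in-memory SQLite database, purely so that it is self-contained. SQLite (3.25 or later, which added window functions) stands in for SQL Server here, which is an assumption of this sketch and not something the answer uses. The table and column names are taken from the question. On SQL Server the statement may need adapting, and nothing here speaks to its speed on 50 million rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DataLogging (ID INTEGER PRIMARY KEY, Date TEXT, Value INTEGER)"
)
conn.executemany(
    "INSERT INTO DataLogging VALUES (?, ?, ?)",
    [(1, "2015-01-01", 1), (2, "2015-01-02", 1), (3, "2015-01-03", 2),
     (4, "2015-01-05", 2), (5, "2015-01-06", 1), (6, "2015-01-08", 1),
     (7, "2015-01-09", 1), (8, "2015-01-10", 2), (9, "2015-01-11", 3),
     (10, "2015-01-12", 3)],
)

# Delete each row whose Value equals the previous row's Value (ordered by Date).
# The first row has ValueLag = NULL, so the comparison is not true and it is kept.
conn.execute("""
    DELETE FROM DataLogging
    WHERE ID IN (
        SELECT ID
        FROM (SELECT ID, Value, LAG(Value) OVER (ORDER BY Date) AS ValueLag
              FROM DataLogging) AS t
        WHERE ValueLag = Value
    )
""")

remaining = [row[0] for row in conn.execute("SELECT ID FROM DataLogging ORDER BY ID")]
print(remaining)  # [1, 3, 5, 8, 9]
```

Selecting the redundant IDs directly with `ValueLag = Value` avoids the anti-join against the set of kept rows, at the cost of treating NULL values as never equal.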
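This should show you the good and the bad rows. It is based on ID; if you need Date, revise it accordingly.
It may look like a long way around, but it should be pretty efficient.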
declare @tt table (id tinyint, val tinyint);
insert into @tt values
( 1, 1),
( 2, 1),
( 3, 2),
( 4, 2),
( 5, 1),
( 6, 1),
( 7, 1),
( 8, 2),
( 9, 3),
(10, 3);
select id, val, LAG(val) over (order by id) as lagVal
from @tt;
-- find the good
select id, val
from ( select id, val, LAG(val) over (order by id) as lagVal
from @tt
) tt
where lagVal is null or lagVal <> val
-- select the bad
select tt.id, tt.val
from @tt tt
left join ( select id, val
from ( select id, val, LAG(val) over (order by id) as lagVal
from @tt
) ttt
where ttt.lagVal is null or ttt.lagVal <> ttt.val
) tttt
on tttt.id = tt.id
where tttt.id is null