Remove duplicate row combinations within a given date range

Bad*_*dgy 5 sql t-sql sql-server

So, I have a dataset of about 160,000 entries. They are computer-generated, and errors have crept in over the years.

Let's say the table has the following columns:

- EntryID (auto int)
- FruitNumber
- JuiceNumber
- CandyNumber
- Date

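The important rule is that each combination of FruitNumber, JuiceNumber, and CandyNumber must be unique whenever the entries are less than 12 months apart.

In other words, an exact combination may exist only once within any 12-month period. I now need to migrate this dataset into a new data model, and to do that I need to delete the duplicate records (while keeping one of them). I have tried many queries but could not find a solution.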

Ste*_*pUp 1

Try using a CTE:

;WITH cte AS 
(
SELECT 
  ft.EntryID
, ft.FruitNumber
, ft.JuiceNumber
, ft.CandyNumber
, ft.Date
, ROW_NUMBER() OVER (PARTITION BY ft.FruitNumber, ft.JuiceNumber, ft.CandyNumber 
     ORDER BY ft.FruitNumber) RN
, DENSE_RANK() OVER (ORDER BY ft.FruitNumber, ft.JuiceNumber, ft.CandyNumber) 
     AS Partitionid
, COUNT(1) OVER (PARTITION BY ft.FruitNumber, ft.JuiceNumber, ft.CandyNumber 
     ORDER BY ft.FruitNumber) as PartitionCNT
FROM FooTable ft
)

SELECT 
t1.* 
, DATEDIFF(DAY, t.Date, t1.Date) DATEDiff
FROM 
cte t 
INNER JOIN cte t1 
    ON t1.FruitNumber = t.FruitNumber
        AND  t1.JuiceNumber = t.JuiceNumber
        AND  t1.CandyNumber = t.CandyNumber
        AND DATEDIFF(DAY, t.Date, t1.Date)>= 365
WHERE t.PartitionCNT > 1

And the sample data:

CREATE TABLE FooTable
(
    EntryID INT IDENTITY(1, 1) PRIMARY KEY,
    FruitNumber INT,
    JuiceNumber INT,
    CandyNumber INT,
    [Date] DATETIME
);


INSERT INTO FooTable
VALUES
(1, 2, 3 , '2019-03-01 00:00:00.000'),
(1, 2, 3 , '2020-03-01 00:00:00.000'),
(4, 5, 6 , '2019-03-01 00:00:00.000'),
(7, 8, 9 , '2019-03-01 00:00:00.000'),
(10, 11, 12 , '2018-03-20 00:00:00.000'),
(13, 14, 15 , '2018-03-20 00:00:00.000'),
(16, 17, 18 , '2017-03-09 00:00:00.000'),
(16, 17, 18 , '2017-02-09 00:00:00.000'),
(22, 23, 34 , '2017-02-12 00:00:00.000'),
(22, 23, 34 , '2017-02-12 00:00:00.000');

And the output:

EntryID FruitNumber JuiceNumber CandyNumber
   2           1           2          3 
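The query above only selects rows; it does not delete anything. As a minimal sketch of the deletion step, one possible approach is to compare each row with the previous row of the same FruitNumber, JuiceNumber, CandyNumber combination and delete it when that previous row is less than 12 months older. This rests on assumptions that are not confirmed in the question: that the earliest record of each cluster is the one to keep, that "12 months" means calendar months via DATEADD, and that ties on the same date can be broken by EntryID. It also requires SQL Server 2012 or later for LAG.

```sql
-- Sketch only: run inside a transaction and check the row count before committing.
-- Assumption: a row is a duplicate if the previous row of the same combination
-- (ordered by date, then EntryID) is less than 12 calendar months older.
;WITH ordered AS
(
    SELECT
        ft.EntryID,
        ft.[Date],
        LAG(ft.[Date]) OVER (
            PARTITION BY ft.FruitNumber, ft.JuiceNumber, ft.CandyNumber
            ORDER BY ft.[Date], ft.EntryID
        ) AS PrevDate          -- NULL for the first row of each combination
    FROM FooTable ft
)
DELETE FROM ordered
WHERE PrevDate IS NOT NULL
  AND [Date] < DATEADD(MONTH, 12, PrevDate);
```

Note that the comparison is made against the previous original row, so a chain of entries each spaced less than 12 months apart collapses to its first entry, even if the last entry is more than 12 months after the first. If the intended rule is instead "keep one record per 12-month window starting from the last kept record", a different (iterative or recursive) approach would be needed.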