Bad*_*dgy 5 sql t-sql sql-server
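I have a dataset of roughly 160,000 entries. They were computer-generated, and errors have crept in over the years.
Say the table has the following columns: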
- EntryID (auto int)
- FruitNumber
- JuiceNumber
- CandyNumber
- Date
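The important rule is that each combination of FruitNumber, JuiceNumber, and CandyNumber must be unique whenever the entries are less than 12 months apart.
In other words, an exact combination may exist only once within any 12-month period. I now need to migrate this dataset into a new data model, and for that I have to delete the duplicate records (while keeping one of them). I have tried many queries but could not find a solution.
My attempt with a CTE: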
;WITH cte AS
(
SELECT
ft.EntryID
, ft.FruitNumber
, ft.JuiceNumber
, ft.CandyNumber
, ft.Date
, ROW_NUMBER() OVER (PARTITION BY ft.FruitNumber, ft.JuiceNumber, ft.CandyNumber
ORDER BY ft.FruitNumber) RN
, DENSE_RANK() OVER (ORDER BY ft.FruitNumber, ft.JuiceNumber, ft.CandyNumber)
AS Partitionid
, COUNT(1) OVER (PARTITION BY ft.FruitNumber, ft.JuiceNumber, ft.CandyNumber
ORDER BY ft.FruitNumber) as PartitionCNT
FROM FooTable ft
)
SELECT
t1.*
, DATEDIFF(DAY, t.Date, t1.Date) DATEDiff
FROM
cte t
INNER JOIN cte t1
ON t1.FruitNumber = t.FruitNumber
AND t1.JuiceNumber = t.JuiceNumber
AND t1.CandyNumber = t.CandyNumber
AND DATEDIFF(DAY, t.Date, t1.Date)>= 365
WHERE t.PartitionCNT > 1
And the sample data:
CREATE TABLE FooTable
(
EntryID INT IDENTITY(1, 1) PRIMARY KEY,
FruitNumber INT,
JuiceNumber INT,
CandyNumber INT,
[Date] DATETIME
);
INSERT INTO FooTable
VALUES
(1, 2, 3 , '2019-03-01 00:00:00.000'),
(1, 2, 3 , '2020-03-01 00:00:00.000'),
(4, 5, 6 , '2019-03-01 00:00:00.000'),
(7, 8, 9 , '2019-03-01 00:00:00.000'),
(10, 11, 12 , '2018-03-20 00:00:00.000'),
(13, 14, 15 , '2018-03-20 00:00:00.000'),
(16, 17, 18 , '2017-03-09 00:00:00.000'),
(16, 17, 18 , '2017-02-09 00:00:00.000'),
(22, 23, 34 , '2017-02-12 00:00:00.000'),
(22, 23, 34 , '2017-02-12 00:00:00.000');
And the output:
EntryID FruitNumber JuiceNumber CandyNumber
2 1 2 3
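For reference, the 12-month rule described above could be checked with something like the following. This is only a rough sketch under assumptions that are not stated in the question: it assumes SQL Server 2012 or later (for `LAG`), it treats a row as a duplicate when the previous row with the same combination is less than 12 calendar months older, and it breaks date ties by `EntryID`. The names `ordered` and `PrevDate` are made up for illustration.

```sql
-- Sketch only: list rows that look like duplicates under the 12-month rule.
-- Assumption: a row is a duplicate if the previous row (by Date, then EntryID)
-- with the same FruitNumber/JuiceNumber/CandyNumber is less than 12 months older.
WITH ordered AS
(
    SELECT
          ft.EntryID
        , ft.[Date]
        , LAG(ft.[Date]) OVER (PARTITION BY ft.FruitNumber, ft.JuiceNumber, ft.CandyNumber
                               ORDER BY ft.[Date], ft.EntryID) AS PrevDate  -- hypothetical alias
    FROM FooTable ft
)
SELECT o.EntryID
FROM ordered o
WHERE o.PrevDate IS NOT NULL                       -- the oldest row of each combination is kept
  AND o.[Date] < DATEADD(MONTH, 12, o.PrevDate);   -- closer than 12 months to its predecessor
```

Running it as a `SELECT` first makes it possible to inspect the flagged rows before deleting anything. Note that comparing each row only with its immediate predecessor flags every row in a long chain of closely spaced entries except the first. If the intent is instead to keep a new row once it is 12 months past the last kept row, a different (iterative) approach would be needed.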