mk8*_*efz 3 sql sql-server missing-data gaps-and-islands sql-server-2016
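I receive daily txt data files from multiple antennas. The file naming convention is:
unique antenna ID + year + month + day + random 3-digit number
I parsed the file names and created a table like this: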
AntennaID fileyear filemonth fileday filenumber filename
0000 2016 09 22 459 000020160922459.txt
0000 2016 09 21 981 000020160921981.txt
0000 2016 09 20 762 000020160920762.txt
0001 2016 09 22 635 000120160922635.txt
.
.
.
etc. (200k rows)
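Sometimes an antenna sends more than one file, or no file at all. When more than one file is sent, the unique 3-digit file number distinguishes the files, but I am trying to find the dates on which no file was sent.
I have tried several GROUP BY statements that compare the number of data files in a given month with the number of days in that month. The problem is that an antenna sometimes sends more than one file per day, which can artificially make up for "missing" files if we only compare counts.
I am looking for a more robust way to find the dates, or date ranges, of missing files. I have looked at the PARTITION BY and OVER clauses and feel there may be potential there, but I am not sure how to use them, as I am quite new to SQL.
I am using Microsoft SQL Server 2016.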
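You can use a common table expression (CTE for short) to create a table of dates. You can then left join from this table to the antenna data and look for the dates that return null values: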
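declare @MinDate date = getdate()-50
declare @MaxDate date = getdate()
-- Recursive CTE: one row per day from @MinDate to @MaxDate
;with Dates as
(
    select @MinDate as DateValue
    union all
    select dateadd(d,1,DateValue)
    from Dates
    where DateValue < @MaxDate
)
select d.DateValue
from Dates d
    left join AntennaData a
        -- Rebuild the file date as a 'yyyymmdd' string (needs zero-padded month and day)
        on(d.DateValue = cast(cast(a.fileyear as nvarchar(4)) + cast(a.filemonth as nvarchar(4)) + cast(a.fileday as nvarchar(4)) as date))
-- Keep only the dates for which no file row was matched
where a.filename is null
-- Lift the default limit of 100 recursion levels
option (maxrecursion 0)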
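The query above treats all antennas as one group, so a date counts as covered if any antenna sent a file that day. A minimal sketch of a per-antenna version follows. It assumes the table is named AntennaData with the columns shown in the question, and that fileyear, filemonth and fileday hold values that convert implicitly to integers; the output column name MissingDate is only illustrative.

```sql
declare @MinDate date = dateadd(day, -50, cast(getdate() as date));
declare @MaxDate date = cast(getdate() as date);

-- Recursive CTE: one row per day in the window
with Dates as
(
    select @MinDate as DateValue
    union all
    select dateadd(day, 1, DateValue)
    from Dates
    where DateValue < @MaxDate
)
select ant.AntennaID,
       d.DateValue as MissingDate   -- illustrative column name
from (select distinct AntennaID from AntennaData) ant
    cross join Dates d              -- every antenna paired with every day
where not exists
(
    -- Assumption: year/month/day convert implicitly to int for datefromparts
    select 1
    from AntennaData a
    where a.AntennaID = ant.AntennaID
      and datefromparts(a.fileyear, a.filemonth, a.fileday) = d.DateValue
)
order by ant.AntennaID, d.DateValue
option (maxrecursion 0);
```

Using not exists rather than a left join means a day with several files still counts once, which avoids the double-counting problem described in the question. An antenna that never sent any file does not appear in AntennaData at all, so it would not be reported by this sketch.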
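While a recursive CTE will generate the list of dates, it is not the most efficient way to do so. If speed matters to you, use a set-based tally table: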
declare @MinDate date = getdate()-50;
declare @MaxDate date = getdate();
-- Generate a table with 10 rows
with t(t) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1
              union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
-- Number the rows and add (row number - 1) days to @MinDate, so the first row is @MinDate itself.
-- top(... + 1) keeps one row per day in the range, so both @MinDate and @MaxDate are included.
,d(d) as (select top(datediff(day, @MinDate, @MaxDate)+1) dateadd(day,row_number() over (order by (select null))-1,@MinDate)
          from t t1,t t2,t t3,t t4,t t5,t t6 -- Cross join gives up to 10^6 = 1,000,000 rows
         )
select *
from d;
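The tally table only produces the dates, so it still has to be compared against the antenna data. As a rough sketch of how that might look, including the date ranges the question asks for, the query below finds the missing days per antenna and then collapses consecutive days into ranges with row_number(), a common gaps-and-islands pattern. It assumes the AntennaData table and columns from the question, values in fileyear, filemonth and fileday that convert implicitly to integers, and a window of at most 1,000 days; the names Nums, Dates, Missing, Grouped, GapStart, GapEnd and DaysMissing are illustrative only.

```sql
declare @MinDate date = dateadd(day, -50, cast(getdate() as date));
declare @MaxDate date = cast(getdate() as date);

-- 10-row seed table
with t(n) as (select n from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) v(n))
-- Numbers 0..999 (assumption: the window is no longer than 1,000 days)
,Nums(n) as
(
    select row_number() over (order by (select null)) - 1
    from t t1 cross join t t2 cross join t t3
)
-- One row per day from @MinDate to @MaxDate inclusive
,Dates(DateValue) as
(
    select dateadd(day, cast(n as int), @MinDate)
    from Nums
    where n <= datediff(day, @MinDate, @MaxDate)
)
-- Every (antenna, day) pair with no file
,Missing as
(
    select ant.AntennaID, d.DateValue as MissingDate
    from (select distinct AntennaID from AntennaData) ant
        cross join Dates d
    where not exists
    (
        select 1
        from AntennaData a
        where a.AntennaID = ant.AntennaID
          and datefromparts(a.fileyear, a.filemonth, a.fileday) = d.DateValue
    )
)
-- Consecutive missing days share the same (date - row number) value
,Grouped as
(
    select AntennaID,
           MissingDate,
           dateadd(day,
                   -cast(row_number() over (partition by AntennaID order by MissingDate) as int),
                   MissingDate) as grp
    from Missing
)
select AntennaID,
       min(MissingDate) as GapStart,
       max(MissingDate) as GapEnd,
       count(*)         as DaysMissing
from Grouped
group by AntennaID, grp
order by AntennaID, GapStart;
```

Within one antenna, subtracting the row number from each missing date yields the same value for every day in an unbroken run, so grouping on that value gives one output row per gap. A single missing day appears as a range whose GapStart and GapEnd are equal.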