And*_*ker 5 t-sql sql-server sql-server-2005
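I have a table with a lot of data in it, and we particularly care about the date field. The reason is that the amount of data has just gone up about 30x, and the old approach will fall apart soon. I am hoping you can help me optimize the query we need:
For example, the table currently contains data at 5-second (+/- a little) intervals. I need to sample that table and get the record closest to each 30-second interval.
What I have now works fine. I am just curious whether there is a way to optimize it further. If I could do this in Linq To SQL, that would be nice too. Given the number of date values (~2 million rows minimum), I am even interested in suggestions about indexes.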
declare @st datetime ; set @st = '2012-01-31 05:05:00';
declare @end datetime ; set @end = '2012-01-31 05:10:00';
select distinct
    log.* -- id,
from
    dbo.fn_GenerateDateSteps(@st, @end, 30) as d
    inner join lotsOfLogData log on log.Id = (
        select top 1 e.[Id]
        from
            lotsOfLogData as e -- contains data in 5 second intervals
        where
            e.stationId = 1000
            -- search for dates in a certain range
            AND e.utcTime between DateAdd(s, -10, d.dt) AND DateAdd(s, 5, d.dt)
        order by
            -- get the 'closest'. this can change a little, but will always
            -- be based on a difference between the date
            abs(datediff(s, d.dt, e.UtcTime))
    )
-- updated the query to be correct. stationId should be inside the subquery
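The table structure of lotsOfLogData is as follows. There are relatively few station IDs (perhaps 50), but each station has a lot of records. We know the station ID when we query.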
create table ##lotsOfLogData (
Id bigint identity(1,1) not null
, StationId int not null
, UtcTime datetime not null
-- 20 other fields, used for other calculations
)
For the given parameters, fn_GenerateDateSteps returns a data set like this:
[DT]
2012-01-31 05:05:00.000
2012-01-31 05:05:30.000
2012-01-31 05:06:00.000
2012-01-31 05:06:30.000 (and so on, every 30 seconds)
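The body of fn_GenerateDateSteps is not shown in the question. For readers who want to reproduce the queries, here is a minimal sketch of what such a function might look like, assuming it takes a start time, an end time, and a step in seconds, and returns one row per step with the end time included. The parameter names, the loop-based implementation, and the inclusive end are all assumptions, not the author's actual code.

```sql
-- Hypothetical reconstruction; the original function body is not shown.
-- Returns one row per step from @st up to and including @end (assumed).
create function dbo.fn_GenerateDateSteps
(
    @st          datetime,
    @end         datetime,
    @stepSeconds int
)
returns @steps table (dt datetime not null primary key)
as
begin
    -- guard against a non-positive step, which would loop forever
    if @stepSeconds <= 0
        return;

    declare @cur datetime;
    set @cur = @st;

    while @cur <= @end
    begin
        insert into @steps (dt) values (@cur);
        set @cur = dateadd(s, @stepSeconds, @cur);
    end;

    return;
end
```

With the parameters used above, `select dt from dbo.fn_GenerateDateSteps(@st, @end, 30)` would return the 30-second steps between 05:05:00 and 05:10:00.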
I have also done this with a temp table, in the following manner, but it came out just a little more expensive.
declare @dates table ( dt datetime, ClosestId bigint);
insert into @dates (dt) select dt from dbo.fn_GenerateDateSteps(@st, @end, 30)
update @dates set closestId = (
    -- same subquery as above
)
select * from lotsOfLogData inner join @dates on Id = ClosestId
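Edit: Resolved
I now have 200K+ rows to work with. I tried both ways, and cross apply with an appropriate index (id/time + includes(.. all columns ...)) worked just fine. However, I ended up with the query I started with, using a simpler (and already existing) index on [id + time]. The query being easier to understand is why I chose that one. Maybe there is a still better way to do it, but I can't see it :D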
-- subtree cost (crossapply) : .0808
-- subtree cost (id based) : .0797
-- see above query for what i ended up with
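The index itself is not shown. As a sketch only, and assuming that "id" in "[id + time]" refers to StationId, an index supporting the `stationId = ... AND utcTime between ...` predicate might be declared as follows. The index name is hypothetical.

```sql
-- Hypothetical index; the name and the column choice are assumptions.
-- The equality column (StationId) comes first, the range column (UtcTime) second,
-- so each 15-second window becomes a small range seek.
create nonclustered index IX_lotsOfLogData_StationId_UtcTime
    on dbo.lotsOfLogData (StationId, UtcTime);
```

If Id happens to be the clustered key of the table (the table definition above does not say), the nonclustered index already carries it, so the `select top 1 e.[Id]` subquery could be answered from the index alone. The covering variant mentioned above would add an `include (...)` list of the remaining columns, which are not enumerated in the question.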
You could try changing the inner join to a cross apply and moving the where log.stationid filter into the sub-select.
SQL statement:
SELECT DISTINCT log.* -- id,
FROM dbo.fn_GenerateDateSteps(@st, @end, 30) AS d
CROSS APPLY (
SELECT TOP 1 log.*
FROM lotsOfLogData AS log -- contains data in 5 second intervals
WHERE -- search for dates in a certain range
utcTime between DATEADD(s, -10, d.dt) AND DATEADD(s, 5, d.dt)
AND log.stationid = 1000
ORDER BY
-- get the 'closest'. this can change a little, but will always
-- be based on a difference between the date
ABS(DATEDIFF(s, d.dt, UtcTime))
) log
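Another option that may be worth comparing in the execution plan, although it is not something either post above proposes, is to skip the date-step function and compute the step number arithmetically, then pick the closest row per step with ROW_NUMBER (available from SQL Server 2005). This is a sketch under assumptions: it reuses the -10s/+5s window from the question, it assumes the steps start at @st and run through @end inclusive, and because it compares whole-second DATEDIFF values it can differ from the BETWEEN version by sub-second rounding at the window edges. The names candidates, ranked, stepNo and rn are introduced here for illustration.

```sql
declare @st datetime; set @st = '2012-01-31 05:05:00';
declare @end datetime; set @end = '2012-01-31 05:10:00';

with candidates as (
    select
        log.*,
        -- seconds from @st, shifted by 10 so that everything in
        -- [step - 10s, step + 20s) maps to the same step number
        (datediff(s, @st, log.UtcTime) + 10) / 30 as stepNo
    from lotsOfLogData as log
    where log.StationId = 1000
      and log.UtcTime between dateadd(s, -10, @st) and dateadd(s, 5, @end)
),
ranked as (
    select
        c.*,
        -- rank rows within each step by distance to the step time; Id breaks ties
        row_number() over (
            partition by c.stepNo
            order by abs(datediff(s, dateadd(s, c.stepNo * 30, @st), c.UtcTime)), c.Id
        ) as rn
    from candidates as c
    -- keep only rows within -10s .. +5s of their step, as in the question
    where datediff(s, dateadd(s, c.stepNo * 30, @st), c.UtcTime) between -10 and 5
)
select *
from ranked
where rn = 1
order by stepNo;
```

The result includes the extra stepNo and rn columns. Whether this beats the cross apply or the original join depends on the data and the available indexes, so the subtree costs would need to be measured rather than assumed.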