web*_*nia 2 sql sql-server logic
I have a table like this:
ColumnId Intime Outtime
1 01/02/2009 10.00.000 01/02/2009 20.00.0000
2 01/02/2009 2.00.000 01/02/2009 2.00.0000
3 01/02/2009 2.00.000 01/02/2009 5.00.0000
4 01/02/2009 3.3.0.000 01/02/2009 5.00.0000
5 01/02/2009 10.00.000 01/02/2009 22.00.0000
6 01/02/2009 3.00.000 01/02/2009 4.00.0000
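I have columns and values like these. I want to find the records that overlap, and the number of overlapping records for a particular date. Overlaps should be checked across hours 1-24 of the day.
Note: my table has millions of records.
For example, the first row logs in at 10 and logs out at 20. Row 5 logs in at 10 and logs out at 22, so the 5th row overlaps with the first. There are no indexes on the table.
Please give me a query for this.
I need the query to run in SQL Server 2005.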
Off the top of my head, and assuming there are indexes on both columns, you could use something like this:
SELECT a.ColumnId
,a.InTime
,a.OutTime
,b.ColumnId AS OverlappingId
,b.InTime AS OverlappingInTime
,b.OutTime AS OverlappingOutTime
FROM TimeTable AS a
JOIN TimeTable AS b ON ((a.InTime BETWEEN b.InTime AND b.OutTime)
OR (a.OutTime BETWEEN b.InTime AND b.OutTime)
OR (a.InTime < b.InTime AND a.OutTime > b.OutTime))
AND (a.ColumnId != b.ColumnId)
But I am really not sure how this query would perform on a table with millions of records, as you mentioned.
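Since the question says the table has no indexes, while the query assumes them, one possible starting point is sketched here. The column types, the `dbo` schema, and the index names are assumptions and do not come from the question. Whether the optimizer actually uses these indexes for an OR-based join condition should be checked against the real execution plan.

```sql
-- Assumed schema: the question does not give the column types.
CREATE TABLE dbo.TimeTable (
    ColumnId int      NOT NULL PRIMARY KEY,
    InTime   datetime NOT NULL,
    OutTime  datetime NOT NULL
);

-- Hypothetical index names. INCLUDE is available from SQL Server 2005 on.
CREATE NONCLUSTERED INDEX IX_TimeTable_InTime
    ON dbo.TimeTable (InTime) INCLUDE (OutTime);

CREATE NONCLUSTERED INDEX IX_TimeTable_OutTime
    ON dbo.TimeTable (OutTime) INCLUDE (InTime);
```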
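Edited to add, and edited again:
After Vadim K.'s comments, I noticed that the query I had previously written missed the case where the overlap is total, i.e. one range completely covers the other. Above is my revised query, and below is the original: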
SELECT a.ColumnId
,a.InTime
,a.OutTime
,b.ColumnId AS OverlappingId
,b.InTime AS OverlappingInTime
,b.OutTime AS OverlappingOutTime
FROM TimeTable AS a
JOIN TimeTable AS b ON ((a.InTime BETWEEN b.InTime AND b.OutTime)
OR (a.OutTime BETWEEN b.InTime AND b.OutTime))
AND (a.ColumnId != b.ColumnId)
Test run with the question's initial data:
+--------+------------------+------------------+
|ColumnId| InTime | OutTime |
+--------+------------------+------------------+
| 1 | 01/02/2009 10:00 | 01/02/2009 20:00 |
| 2 | 01/02/2009 2:00 | 01/02/2009 2:00 |
| 3 | 01/02/2009 2:00 | 01/02/2009 5:00 |
| 4 | 01/02/2009 3:03 | 01/02/2009 5:00 |
| 5 | 01/02/2009 10:00 | 01/02/2009 22:00 |
| 6 | 01/02/2009 3:00 | 01/02/2009 4:00 |
+--------+------------------+------------------+
Running the original query, we get the following result:
+--------+------------------+------------------+-------------+
|ColumnId| InTime | OutTime |OverlappingId|
+--------+------------------+------------------+-------------+
| 1 | 01/02/2009 10:00 | 01/02/2009 20:00 | 5 |
| 2 | 01/02/2009 2:00 | 01/02/2009 2:00 | 3 |
| 3 | 01/02/2009 2:00 | 01/02/2009 5:00 | 2 |
| 3 | 01/02/2009 2:00 | 01/02/2009 5:00 | 4 |
| 4 | 01/02/2009 3:03 | 01/02/2009 5:00 | 3 |
| 4 | 01/02/2009 3:03 | 01/02/2009 5:00 | 6 |
| 5 | 01/02/2009 10:00 | 01/02/2009 22:00 | 1 |
| 6 | 01/02/2009 3:00 | 01/02/2009 4:00 | 3 |
| 6 | 01/02/2009 3:00 | 01/02/2009 4:00 | 4 |
+--------+------------------+------------------+-------------+
Running the updated query, we get the following result:
+--------+------------------+------------------+-------------+
|ColumnId| InTime | OutTime |OverlappingId|
+--------+------------------+------------------+-------------+
| 1 | 01/02/2009 10:00 | 01/02/2009 20:00 | 5 |
| 2 | 01/02/2009 2:00 | 01/02/2009 2:00 | 3 |
| 3 | 01/02/2009 2:00 | 01/02/2009 5:00 | 2 |
| 3 | 01/02/2009 2:00 | 01/02/2009 5:00 | 4 |
| 3 | 01/02/2009 2:00 | 01/02/2009 5:00 | 6 | << missing row
| 4 | 01/02/2009 3:03 | 01/02/2009 5:00 | 3 |
| 4 | 01/02/2009 3:03 | 01/02/2009 5:00 | 6 |
| 5 | 01/02/2009 10:00 | 01/02/2009 22:00 | 1 |
| 6 | 01/02/2009 3:00 | 01/02/2009 4:00 | 3 |
| 6 | 01/02/2009 3:00 | 01/02/2009 4:00 | 4 |
+--------+------------------+------------------+-------------+
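Yes, some IDs are repeated, but that is because they overlap with different records.
The question also asks for the number of overlapping rows. The question is not clear enough for me to be sure whether it wants the number of rows of the original table that overlap.
Some people have suggested using a.ColumnId < b.ColumnId or a.ColumnId > b.ColumnId to avoid the repetition. However, that still does not work, because with the first comparison we get the following result: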
+--------+------------------+------------------+-------------+
|ColumnId| InTime | OutTime |OverlappingId|
+--------+------------------+------------------+-------------+
| 1 | 01/02/2009 10:00 | 01/02/2009 20:00 | 5 |
| 2 | 01/02/2009 2:00 | 01/02/2009 2:00 | 3 |
| 3 | 01/02/2009 2:00 | 01/02/2009 5:00 | 4 |
| 3 | 01/02/2009 2:00 | 01/02/2009 5:00 | 6 |
| 4 | 01/02/2009 3:03 | 01/02/2009 5:00 | 6 |
+--------+------------------+------------------+-------------+
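Notice that all 6 rows of the sample data are referenced in this result, even though the result itself has only 5 rows. I believe that, for this data, every row overlaps with some other row at one point or another, so the number of overlapping rows is 6.
To get that number, the following query can be used: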
SELECT COUNT (DISTINCT a.ColumnId)
FROM TimeTable AS a
JOIN TimeTable AS b ON ((a.InTime BETWEEN b.InTime AND b.OutTime)
OR (a.OutTime BETWEEN b.InTime AND b.OutTime)
OR (a.InTime < b.InTime AND a.OutTime > b.OutTime))
AND (a.ColumnId != b.ColumnId)
This returns a count of all 6 rows.
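If what is wanted is instead the number of other records each row overlaps, which is one possible reading of the question, a grouped variant could look like the sketch below. It uses the compact form of the overlap test with inclusive bounds, which should match the BETWEEN-based condition as long as InTime <= OutTime holds on every row. The table variable, the date literal format, and the reading of 01/02/2009 as 2 January are assumptions made only to keep the sketch self-contained.

```sql
-- Sample data from the test run, held in a table variable (assumed types).
DECLARE @TimeTable TABLE (
    ColumnId int      NOT NULL PRIMARY KEY,
    InTime   datetime NOT NULL,
    OutTime  datetime NOT NULL
);

-- SQL Server 2005 has no multi-row VALUES, hence SELECT ... UNION ALL.
INSERT INTO @TimeTable (ColumnId, InTime, OutTime)
SELECT 1, '20090102 10:00', '20090102 20:00' UNION ALL
SELECT 2, '20090102 02:00', '20090102 02:00' UNION ALL
SELECT 3, '20090102 02:00', '20090102 05:00' UNION ALL
SELECT 4, '20090102 03:03', '20090102 05:00' UNION ALL
SELECT 5, '20090102 10:00', '20090102 22:00' UNION ALL
SELECT 6, '20090102 03:00', '20090102 04:00';

-- One output row per record that overlaps at least one other record,
-- with the number of records it overlaps.
SELECT a.ColumnId
      ,COUNT(*) AS OverlapCount
FROM @TimeTable AS a
JOIN @TimeTable AS b
  ON  a.ColumnId <> b.ColumnId
  AND a.InTime   <= b.OutTime
  AND a.OutTime  >= b.InTime
GROUP BY a.ColumnId
ORDER BY a.ColumnId;
```

Because this is an inner join, records that overlap nothing do not appear in the output at all. A LEFT JOIN with COUNT(b.ColumnId) would list them with a count of 0.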
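Having tested the solutions carefully, I found that the answers posted so far either check for overlap incorrectly or return too many results (two rows per overlap).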
select
aa.ColumnId as ColumnIdA, aa.InTime as InTimeA, aa.OutTime as OutTimeA,
bb.ColumnId as ColumnIdB, bb.InTime as InTimeB, bb.OutTime as OutTimeB
from
MyTable aa
join
MyTable bb on aa.ColumnId < bb.ColumnId
where
aa.InTime < bb.OutTime
and
aa.OutTime > bb.InTime
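Care must be taken in defining "overlap". I assume that if the first period is 3 AM to 4 AM and the second is 4 AM to 5 AM, the two ranges do not overlap. If you really do want that case to count as an overlap, change < to <= and > to >= in the where clause.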
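The difference between the two definitions can be seen on the 3 AM to 4 AM and 4 AM to 5 AM example. The sketch below is only an illustration: the table variable, its column types, and the date are assumptions.

```sql
-- Two ranges that touch at 04:00 but share no interior time.
DECLARE @MyTable TABLE (
    ColumnId int      NOT NULL PRIMARY KEY,
    InTime   datetime NOT NULL,
    OutTime  datetime NOT NULL
);

INSERT INTO @MyTable (ColumnId, InTime, OutTime)
SELECT 1, '20090102 03:00', '20090102 04:00' UNION ALL
SELECT 2, '20090102 04:00', '20090102 05:00';

-- Strict comparison: touching ranges are not an overlap, so no rows.
SELECT aa.ColumnId AS ColumnIdA, bb.ColumnId AS ColumnIdB
FROM @MyTable AS aa
JOIN @MyTable AS bb ON aa.ColumnId < bb.ColumnId
WHERE aa.InTime  < bb.OutTime
  AND aa.OutTime > bb.InTime;

-- Inclusive comparison: touching ranges count, so the pair (1, 2) is returned.
SELECT aa.ColumnId AS ColumnIdA, bb.ColumnId AS ColumnIdB
FROM @MyTable AS aa
JOIN @MyTable AS bb ON aa.ColumnId < bb.ColumnId
WHERE aa.InTime  <= bb.OutTime
  AND aa.OutTime >= bb.InTime;
```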
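Performance is proportional to the square of the number of rows. Faster solutions exist for large datasets, but they are more complicated than this one.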
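The answer does not say which faster approach it has in mind. One possibility, assuming the question's "hours 1-24" means a count of records active in each hour of a given day, is to join against a small table of hour buckets instead of joining the table to itself. That compares each record with at most 24 buckets rather than with every other record. The `@Day` variable, the `@Hours` helper table, and the sample data are hypothetical names introduced for this sketch.

```sql
-- Sample data (assumed types and date reading), so the sketch runs on its own.
DECLARE @MyTable TABLE (
    ColumnId int      NOT NULL PRIMARY KEY,
    InTime   datetime NOT NULL,
    OutTime  datetime NOT NULL
);

INSERT INTO @MyTable (ColumnId, InTime, OutTime)
SELECT 1, '20090102 10:00', '20090102 20:00' UNION ALL
SELECT 2, '20090102 02:00', '20090102 02:00' UNION ALL
SELECT 3, '20090102 02:00', '20090102 05:00' UNION ALL
SELECT 4, '20090102 03:03', '20090102 05:00' UNION ALL
SELECT 5, '20090102 10:00', '20090102 22:00' UNION ALL
SELECT 6, '20090102 03:00', '20090102 04:00';

-- The day to analyse. SQL Server 2005 needs DECLARE and SET as two steps.
DECLARE @Day datetime;
SET @Day = '20090102';

-- Hypothetical helper: one row per hour of the day (0..23).
DECLARE @Hours TABLE (HourOfDay int NOT NULL PRIMARY KEY);
DECLARE @h int;
SET @h = 0;
WHILE @h < 24
BEGIN
    INSERT INTO @Hours (HourOfDay) VALUES (@h);
    SET @h = @h + 1;
END;

-- For each hour bucket [hour, hour + 1), count the records active in it.
-- Strict comparisons: a record that only touches the bucket edge, or has
-- zero length, is not counted.
SELECT h.HourOfDay
      ,COUNT(t.ColumnId) AS ActiveRecords
FROM @Hours AS h
LEFT JOIN @MyTable AS t
  ON  t.InTime  < DATEADD(hour, h.HourOfDay + 1, @Day)
  AND t.OutTime > DATEADD(hour, h.HourOfDay, @Day)
GROUP BY h.HourOfDay
ORDER BY h.HourOfDay;
```

This gives a concurrency count per hour, not the list of overlapping pairs, so it only helps if that count is what is actually needed.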