jon*_*jon 9 sql intervals gaps-and-islands
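Not using MSSQL, DB2, or Oracle. No CTEs. No OVERLAPS predicate. No INTERVAL data type. The situation: work on a vehicle to be repaired cannot start until all ordered parts have been received. Parts may be ordered multiple times before the repair starts. We need to extract the time the vehicle spent on "parts hold".
So, for the vehicle identified as id = 1, parts were ordered (d1) and received (d2) on 4 different occasions: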
ID d1 d2
1 8/1 8/8
1 8/2 8/6
1 8/12 8/14
1 8/3 8/10
8/1                             8/8
d1                              d2
|-------------------------------|
     8/2             8/6                    8/12       8/14
     d1              d2                     d1         d2
     |---------------|                      |----------|
               8/3                 8/10
               d1                  d2
               |---------------------|
8/1                                                    8/14
|---------------------------------------------------------| = 13 days
                                     8/10   8/12
|--------------------------------------| +  |----------| = parts hold = 11 days
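As shown above, the wait before work can start (assuming the vehicle was available for work on 8/1) is 13 days. The actual time spent waiting for parts is 11 days, which is the number we need to derive from the data. The real datetime data will be timestamps from which we will extract hours; we use dates in this sample data to keep the presentation simple. We are trying to produce a set-based solution (not PSM, not a UDF, not a cursor). TIA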
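As a reference check on the expected figure (not the set-based SQL being asked for), here is a minimal Python sketch that merges the overlapping order/receive intervals and adds up the merged lengths. The function name `parts_hold_days` is hypothetical, and the year 2011 is an assumption made only so the sample dates can be constructed.

```python
from datetime import date

def parts_hold_days(intervals):
    """Total days covered by the union of (ordered, received) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # A gap precedes this interval: close the previous merged period.
            if cur_end is not None:
                total += (cur_end - cur_start).days
            cur_start, cur_end = start, end
        else:
            # Overlaps the current merged period: extend it if needed.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += (cur_end - cur_start).days
    return total

# Sample rows for id = 1 (year 2011 is assumed for illustration).
vehicle_1 = [
    (date(2011, 8, 1), date(2011, 8, 8)),
    (date(2011, 8, 2), date(2011, 8, 6)),
    (date(2011, 8, 12), date(2011, 8, 14)),
    (date(2011, 8, 3), date(2011, 8, 10)),
]
print(parts_hold_days(vehicle_1))  # 11
```

The merged periods are 8/1 to 8/10 (9 days) and 8/12 to 8/14 (2 days), which matches the 11 days in the diagram.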
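I couldn't get @Alex W's query to work. It isn't standard SQL, so it needed a lot of rewriting to be compatible with SQL Server (which I can test on). But it did give me some inspiration, which I have expanded on.
Find all the start points of every period of uninterrupted waiting: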
SELECT DISTINCT
    t1.ID,
    t1.d1 AS date,
    -DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d1) AS n
FROM Orders t1
LEFT JOIN Orders t2                   -- Join for any events occurring while this
    ON t2.ID = t1.ID                  -- is starting. If this is a start point,
    AND t2.d1 <> t1.d1                -- it won't match anything, which is what
    AND t1.d1 BETWEEN t2.d1 AND t2.d2 -- we want.
GROUP BY t1.ID, t1.d1, t1.d2
HAVING COUNT(t2.ID) = 0
And the equivalent for end points:
SELECT DISTINCT
    t1.ID,
    t1.d2 AS date,
    DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d2) AS n
FROM Orders t1
LEFT JOIN Orders t2
    ON t2.ID = t1.ID
    AND t2.d2 <> t1.d2
    AND t1.d2 BETWEEN t2.d1 AND t2.d2
GROUP BY t1.ID, t1.d1, t1.d2
HAVING COUNT(t2.ID) = 0
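n is the number of days since some common point in time. Start points have a negative value and end points a positive value, so we can simply add them up to get the number of days in between.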
span = end - start
span = end + (-start)
span1 + span2 = end1 + (-start1) + end2 + (-start2)
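To make the signed-offset arithmetic concrete, here is a small Python sketch using the two waiting periods of vehicle 1 from the sample data (8/1 to 8/10 and 8/12 to 8/14). The names `origin`, `islands`, and `signed` are illustrative only; `origin` plays the role of `MIN(d1)` in the queries.

```python
from datetime import date

# Common reference point, standing in for (SELECT MIN(d1) FROM Orders).
origin = date(2011, 8, 1)

# The two uninterrupted waiting periods for vehicle 1: (start, end).
islands = [
    (date(2011, 8, 1), date(2011, 8, 10)),
    (date(2011, 8, 12), date(2011, 8, 14)),
]

signed = []
for start, end in islands:
    signed.append(-(start - origin).days)  # start points count negatively
    signed.append((end - origin).days)     # end points count positively

print(signed)       # [0, 9, -11, 13]
print(sum(signed))  # 11
```

Because addition is commutative, the starts and ends never have to be paired up explicitly; summing every signed value per vehicle is enough.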
Finally, we just need to add everything up:
SELECT ID, SUM(n) AS hold_days
FROM (
    -- Start points (negative offsets)
    SELECT DISTINCT
        t1.id,
        t1.d1 AS date,
        -DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d1) AS n
    FROM Orders t1
    LEFT JOIN Orders t2
        ON t2.ID = t1.ID
        AND t2.d1 <> t1.d1
        AND t1.d1 BETWEEN t2.d1 AND t2.d2
    GROUP BY t1.ID, t1.d1, t1.d2
    HAVING COUNT(t2.ID) = 0
    UNION ALL
    -- End points (positive offsets)
    SELECT DISTINCT
        t1.id,
        t1.d2 AS date,
        DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d2) AS n
    FROM Orders t1
    LEFT JOIN Orders t2
        ON t2.ID = t1.ID
        AND t2.d2 <> t1.d2
        AND t1.d2 BETWEEN t2.d1 AND t2.d2
    GROUP BY t1.ID, t1.d1, t1.d2
    HAVING COUNT(t2.ID) = 0
    -- No ORDER BY here: SQL Server rejects ORDER BY in a derived table
    -- without TOP, and the outer SUM does not depend on row order.
) s
GROUP BY ID;
Input table (Orders):
ID d1 d2
1 2011-08-01 2011-08-08
1 2011-08-02 2011-08-06
1 2011-08-03 2011-08-10
1 2011-08-12 2011-08-14
2 2011-08-01 2011-08-03
2 2011-08-02 2011-08-06
2 2011-08-05 2011-08-09
Output:
ID hold_days
1 11
2 8
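For readers without SQL Server at hand, the same start-point/end-point idea can be tried with Python's built-in `sqlite3` module. This is a sketch under assumptions, not the query as tested above: SQLite has no `DATEDIFF`, so differences of `julianday()` stand in for it, the dates are stored as ISO text, and the column alias `dt` replaces `date`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (ID INTEGER, d1 TEXT, d2 TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [
        (1, "2011-08-01", "2011-08-08"),
        (1, "2011-08-02", "2011-08-06"),
        (1, "2011-08-03", "2011-08-10"),
        (1, "2011-08-12", "2011-08-14"),
        (2, "2011-08-01", "2011-08-03"),
        (2, "2011-08-02", "2011-08-06"),
        (2, "2011-08-05", "2011-08-09"),
    ],
)

# SQLite adaptation: julianday() differences replace DATEDIFF(DAY, ...).
QUERY = """
SELECT ID, CAST(ROUND(SUM(n)) AS INTEGER) AS hold_days
FROM (
    SELECT DISTINCT t1.ID, t1.d1 AS dt,
           -(julianday(t1.d1) - (SELECT MIN(julianday(d1)) FROM Orders)) AS n
    FROM Orders t1
    LEFT JOIN Orders t2
           ON t2.ID = t1.ID
          AND t2.d1 <> t1.d1
          AND t1.d1 BETWEEN t2.d1 AND t2.d2
    GROUP BY t1.ID, t1.d1, t1.d2
    HAVING COUNT(t2.ID) = 0
    UNION ALL
    SELECT DISTINCT t1.ID, t1.d2 AS dt,
           julianday(t1.d2) - (SELECT MIN(julianday(d1)) FROM Orders) AS n
    FROM Orders t1
    LEFT JOIN Orders t2
           ON t2.ID = t1.ID
          AND t2.d2 <> t1.d2
          AND t1.d2 BETWEEN t2.d1 AND t2.d2
    GROUP BY t1.ID, t1.d1, t1.d2
    HAVING COUNT(t2.ID) = 0
) s
GROUP BY ID
ORDER BY ID
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 11), (2, 8)]
```

ISO-formatted date strings compare correctly as text, which is why `BETWEEN` still works on the `TEXT` columns in this sketch.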
Alternatively, you can do this with a stored procedure.
CREATE PROCEDURE CalculateHoldTimes
    @ID int = 0
AS
BEGIN
    -- Every order date is a +1 event, every receipt date a -1 event.
    DECLARE Events CURSOR FOR
        SELECT *
        FROM (
            SELECT d1 AS date, 1 AS diff
            FROM Orders
            WHERE ID = @ID
            UNION ALL
            SELECT d2 AS date, -1 AS diff
            FROM Orders
            WHERE ID = @ID
        ) s
        ORDER BY date;
    DECLARE @Events_date date,
            @Events_diff int,
            @Period_start date,
            @Period_accum int,
            @Total_start date,
            @Total_count int;
    OPEN Events;
    FETCH NEXT FROM Events
        INTO @Events_date, @Events_diff;
    SET @Period_start = @Events_date;
    SET @Period_accum = 0;
    SET @Total_start = @Events_date;
    SET @Total_count = 0;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- @Period_accum is the number of orders currently outstanding.
        SET @Period_accum = @Period_accum + @Events_diff;
        IF @Period_accum = 1 AND @Events_diff = 1
            -- Start of period
            SET @Period_start = @Events_date;
        ELSE IF @Period_accum = 0 AND @Events_diff = -1
            -- End of period
            SET @Total_count = @Total_count +
                DATEDIFF(day, @Period_start, @Events_date);
        FETCH NEXT FROM Events
            INTO @Events_date, @Events_diff;
    END;
    -- Release the cursor so the procedure can be called again.
    CLOSE Events;
    DEALLOCATE Events;
    SELECT
        @Total_start AS d1,
        @Events_date AS d2,
        @Total_count AS hold_time;
END;
Call it like this:
EXEC CalculateHoldTimes 1;
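The cursor loop in the procedure is an event sweep: each order date adds one outstanding order, each receipt date removes one, and a waiting period ends whenever the count drops back to zero. A minimal Python sketch of the same logic follows; `hold_days_by_sweep` is a hypothetical name and the data is the sample Orders table.

```python
from datetime import date

def hold_days_by_sweep(orders):
    """Sum the lengths of periods with at least one outstanding order."""
    events = []
    for d1, d2 in orders:
        events.append((d1, 1))    # part ordered
        events.append((d2, -1))   # part received
    events.sort()

    outstanding = 0
    period_start = None
    total = 0
    for when, diff in events:
        outstanding += diff
        if outstanding == 1 and diff == 1:
            period_start = when                    # start of a waiting period
        elif outstanding == 0 and diff == -1:
            total += (when - period_start).days    # end of a waiting period
    return total

vehicle_1 = [
    (date(2011, 8, 1), date(2011, 8, 8)),
    (date(2011, 8, 2), date(2011, 8, 6)),
    (date(2011, 8, 3), date(2011, 8, 10)),
    (date(2011, 8, 12), date(2011, 8, 14)),
]
vehicle_2 = [
    (date(2011, 8, 1), date(2011, 8, 3)),
    (date(2011, 8, 2), date(2011, 8, 6)),
    (date(2011, 8, 5), date(2011, 8, 9)),
]
print(hold_days_by_sweep(vehicle_1))  # 11
print(hold_days_by_sweep(vehicle_2))  # 8
```

When a receipt and a new order fall on the same date, either ordering of the two events gives the same total, since the zero-length gap between them contributes no days.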
This SQL statement seems to get what you want (t is the name of the sample table):
SELECT
d.id,
d.duration,
d.duration -
IFNULL(
( SELECT Sum( timestampdiff( SQL_TSI_DAY,
no_hold.d2,
( SELECT min(d1) FROM t t4
WHERE t4.id = no_hold.id and t4.d1 > no_hold.d2 )))
FROM ( SELECT DISTINCT id, d2 FROM t t1
WHERE ( SELECT sum( IIF( t1.d2 between t2.d1 and t2.d2, 1, 0 ) )
FROM t t2 WHERE t2.id = t1.id and t2.d2 <> t1.d2 ) = 0
And d2 <> ( select max( d2 ) from t t3 where t3.id = t1.id )) no_hold
WHERE no_hold.id = d.id ),
0 ) "parts hold"
FROM
( SELECT id, timestampdiff( SQL_TSI_DAY, min( d1 ), max( d2 ) ) duration
FROM t GROUP BY id ) d
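The outer query gets the total duration of the repair job. The complex subquery calculates the total number of days spent not waiting for parts. It does this by locating the dates on which the vehicle stops waiting for parts, then counting the days until it starts waiting for parts again: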
// 1) The query for finding the starting dates of periods when the vehicle is not waiting for parts,
// i.e. finding every d2 that is not within any other date range in which the vehicle is waiting for parts.
// The DISTINCT is needed to remove duplicate "no hold" starting dates.
SELECT DISTINCT id, d2
FROM t t1
WHERE ( SELECT sum( IIF( t1.d2 between t2.d1 and t2.d2, 1, 0 ) ) from t t2
WHERE t2.id = t1.id and t2.d2 <> t1.d2 ) = 0 AND
d2 <> ( SELECT max( d2 ) FROM t t3 WHERE t3.id = t1.id )
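// 2) The days the vehicle is not waiting for parts run from each date found by the query above to the date the vehicle starts waiting for parts again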
timestampdiff( SQL_TSI_DAY, no_hold.d2, ( SELECT min(d1) FROM t t4 WHERE t4.id = no_hold.id and t4.d1 > no_hold.d2 ) )
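Combining the two and summing all such periods gives the number of days the vehicle was not waiting for parts. The final query adds an extra condition to compute the result for each id from the outer query.
This may not be very efficient on a very large table with many ids. It should be fine if the ids are limited to one or a few.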
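The "total duration minus gaps" idea can be sketched in Python to check the arithmetic on the sample data. This mirrors the logic described above rather than the SQL dialect itself; the function name `hold_days_minus_gaps` is hypothetical and the rows are taken from the sample Orders table in the other answer.

```python
from datetime import date

def hold_days_minus_gaps(orders):
    """Overall span (min d1 to max d2) minus the days with nothing on order."""
    first = min(d1 for d1, _ in orders)
    last = max(d2 for _, d2 in orders)
    duration = (last - first).days

    # A "no hold" period starts at a d2 that lies inside no other row's
    # range (rows sharing the same d2 are skipped) and is not the final d2.
    no_hold_starts = {
        d2
        for _, d2 in orders
        if d2 != last
        and not any(o1 <= d2 <= o2 for o1, o2 in orders if o2 != d2)
    }

    # Each gap runs until the next order date after that d2.
    gap_days = sum(
        (min(d1 for d1, _ in orders if d1 > start) - start).days
        for start in no_hold_starts
    )
    return duration - gap_days

vehicle_1 = [
    (date(2011, 8, 1), date(2011, 8, 8)),
    (date(2011, 8, 2), date(2011, 8, 6)),
    (date(2011, 8, 3), date(2011, 8, 10)),
    (date(2011, 8, 12), date(2011, 8, 14)),
]
vehicle_2 = [
    (date(2011, 8, 1), date(2011, 8, 3)),
    (date(2011, 8, 2), date(2011, 8, 6)),
    (date(2011, 8, 5), date(2011, 8, 9)),
]
print(hold_days_minus_gaps(vehicle_1))  # 11
print(hold_days_minus_gaps(vehicle_2))  # 8
```

For vehicle 1 the span is 13 days and the only gap is 8/10 to 8/12 (2 days), giving 11. Vehicle 2 has no gap, so its 8-day span is all parts hold.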