ERJ*_*JAN 8 time range-types postgresql-9.6
I'm using PostgreSQL 9.6.
The table holds user sessions, and I need to print the distinct, non-overlapping sessions.
CREATE TABLE SESSIONS(
id serial NOT NULL PRIMARY KEY,
ctn INT NOT NULL,
day DATE NOT NULL,
f_time TIME(0) NOT NULL,
l_time TIME(0) NOT NULL
);
INSERT INTO SESSIONS(id, ctn, day, f_time, l_time)
VALUES
(1, 707, '2019-06-18', '10:48:25', '10:56:17'),
(2, 707, '2019-06-18', '10:48:33', '10:56:17'),
(3, 707, '2019-06-18', '10:53:17', '11:00:49'),
(4, 707, '2019-06-18', '10:54:31', '10:57:37'),
(5, 707, '2019-06-18', '11:03:59', '11:10:39'),
(6, 707, '2019-06-18', '11:04:41', '11:08:02'),
(7, 707, '2019-06-18', '11:11:04', '11:19:39');
id ctn day f_time l_time
1 707 2019-06-18 10:48:25 10:56:17
2 707 2019-06-18 10:48:33 10:56:17
3 707 2019-06-18 10:53:17 11:00:49
4 707 2019-06-18 10:54:31 10:57:37
5 707 2019-06-18 11:03:59 11:10:39
6 707 2019-06-18 11:04:41 11:08:02
7 707 2019-06-18 11:11:04 11:19:39
Now I need the distinct, non-overlapping user sessions, so it should give me:
1. start_time: 10:48:25 end_time: 11:00:49 duration: 12min,24 sec
2. start_time: 11:03:59 end_time: 11:10:39 duration: 6min,40 sec
3. start_time: 11:11:04 end_time: 11:19:39 duration: 8min,35 sec
To solve this, I did the following:
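For this part, I've added a little to the table definition supplied by the OP. I firmly believe that DDL should be used as far as possible to "guide" the whole database programming process, and that it could be made more powerful still - one example would be SQL inside CHECK constraints, so far offered only by Firebird (example here) and H2 (see the reference here).
That's all well and good, but we have to work with what PostgreSQL 9.6 - the OP's version - can do. I adjusted the DDL for the "simple" explanation (see the whole fiddle here):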
CREATE TABLE sessions
(
id serial NOT NULL PRIMARY KEY,
ctn INT NOT NULL,
f_day DATE NOT NULL,
f_time TIME(0) NOT NULL,
l_time TIME(0) NOT NULL,
CONSTRAINT ft_less_than_lt_ck CHECK (f_time < l_time),
CONSTRAINT ctn_f_day_f_time_uq UNIQUE (ctn, f_day, f_time),
CONSTRAINT ctn_f_day_l_time_uq UNIQUE (ctn, f_day, l_time)
-- could put in a DISTINCT somewhere if you don't have these constraints
-- maybe has TIME(2) - but see complex solution
);
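To see what these constraints buy us, here is a minimal Python sketch (not part of the original answer) that applies the same three rules to the OP's seven rows. The function name `constraint_violations` and the row layout are assumptions made for illustration only.

```python
from collections import Counter
from datetime import date, time

DAY = date(2019, 6, 18)

# The OP's seven sample rows: (id, ctn, day, f_time, l_time)
OP_ROWS = [
    (1, 707, DAY, time(10, 48, 25), time(10, 56, 17)),
    (2, 707, DAY, time(10, 48, 33), time(10, 56, 17)),
    (3, 707, DAY, time(10, 53, 17), time(11, 0, 49)),
    (4, 707, DAY, time(10, 54, 31), time(10, 57, 37)),
    (5, 707, DAY, time(11, 3, 59), time(11, 10, 39)),
    (6, 707, DAY, time(11, 4, 41), time(11, 8, 2)),
    (7, 707, DAY, time(11, 11, 4), time(11, 19, 39)),
]


def constraint_violations(rows):
    """Hypothetical helper: list rows that would break the three constraints."""
    problems = []
    # CHECK (f_time < l_time)
    for row_id, _ctn, _day, f_time, l_time in rows:
        if not f_time < l_time:
            problems.append(f"id {row_id}: f_time must be before l_time")
    # UNIQUE (ctn, f_day, f_time) and UNIQUE (ctn, f_day, l_time)
    starts = Counter((ctn, day, f_time) for _id, ctn, day, f_time, _l in rows)
    ends = Counter((ctn, day, l_time) for _id, ctn, day, _f, l_time in rows)
    for (ctn, day, f_time), count in starts.items():
        if count > 1:
            problems.append(f"duplicate start {f_time} for ctn {ctn} on {day}")
    for (ctn, day, l_time), count in ends.items():
        if count > 1:
            problems.append(f"duplicate end {l_time} for ctn {ctn} on {day}")
    return problems


print(constraint_violations(OP_ROWS))
```

On the OP's data this reports one violation: ids 1 and 2 share the end time 10:56:17, so those rows could not both be loaded under this DDL.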
Indexes:
CREATE INDEX ctn_ix ON sessions USING BTREE (ctn ASC);
CREATE INDEX f_day_ix ON sessions USING BTREE (f_day ASC);
CREATE INDEX f_time_ix ON sessions USING BTREE (f_time ASC);
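One point to note: do not use SQL keywords as table or column names - day is such a keyword! It can be confusing when debugging &c. - it's simply not good practice. I've changed your original field name day to f_day - note all lower case and snake case! Whatever you do, have a standard way of naming variables and stick to it - there are many coding standards documents out there.
The change to f_day has no effect on the rest of the SQL, because we aren't considering sessions that span midnight. Those can be taken into account relatively easily by doing the following (see the fiddle).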
SELECT (f_day + f_time)::TIMESTAMP FROM sessions;
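The same date-plus-time combination can be sketched in Python. This is an illustration only: the helper `session_bounds` is hypothetical, and the rule that an end time at or before the start time means "the next day" is my assumption, not something in the fiddle (the CHECK constraint above would in fact reject such a row).

```python
from datetime import date, datetime, time, timedelta


def session_bounds(day, f_time, l_time):
    """Hypothetical helper: turn (day, f_time, l_time) into two timestamps."""
    start = datetime.combine(day, f_time)  # same idea as f_day + f_time in SQL
    end = datetime.combine(day, l_time)
    # ASSUMPTION: an end at or before the start means the session crossed midnight.
    if end <= start:
        end += timedelta(days=1)
    return start, end


# Hypothetical session that runs from 23:50 until 00:10 the next day.
start, end = session_bounds(date(2019, 6, 18), time(23, 50), time(0, 10))
print(start, end, end - start)
```

Once both ends are full timestamps, the interval comparisons in the rest of the answer work unchanged across midnight.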
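Now, with the advent of GENERATED columns, you needn't even worry about this - just have a GENERATED field as above! (Note that generated columns only arrived in PostgreSQL 12, so this is not an option on the OP's 9.6.)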
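If constraints to the second aren't feasible (simultaneous logins), you may wish to use TIME(2) (or 3..6) to ensure uniqueness. If [you don't want | can't have] UNIQUE constraints, you could put a DISTINCT in your SQL to deal with identical login and logout times - although those are pretty unlikely.
The fact remains that some simple DDL like this greatly simplifies your subsequent SQL (see the discussion at the end of the "complex" explanation below).
You might also want to put ctn and/or day into your DDL UNIQUE constraints, as shown? I've also added what I think might be good indexes! You might also want to investigate the OVERLAPS operator?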
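For readers who have not met OVERLAPS, the test it performs can be sketched in Python. This is a sketch under the assumption that each period is treated as half-open (start <= t < end), which is how I understand the PostgreSQL documentation for non-empty periods; the function name `overlaps` is mine.

```python
from datetime import time


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open overlap test: periods that merely touch do not overlap."""
    return a_start < b_end and b_start < a_end


# Records 1 and 2 of the sample data below overlap ...
print(overlaps(time(10, 48, 25), time(10, 56, 17),
               time(10, 53, 17), time(11, 0, 49)))
# ... while records 5 and 6 do not.
print(overlaps(time(11, 4, 41), time(11, 8, 2),
               time(11, 11, 4), time(11, 19, 39)))
```

The queries that follow do not use OVERLAPS; they compare each row's start time with the previous row's end time instead.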
As for the sample data, I've also added a few records to test my solution, as follows:
INSERT INTO sessions (id, ctn, f_day, f_time, l_time)
VALUES
( 1, 707, '2019-06-18', '10:48:25', '10:56:17'),
( 2, 707, '2019-06-18', '10:53:17', '11:00:49'),
( 3, 707, '2019-06-18', '10:54:31', '10:59:43'), -- record 3 is completely covered
-- by record 2
( 4, 707, '2019-06-18', '11:03:59', '11:10:39'),
( 5, 707, '2019-06-18', '11:04:41', '11:08:02'), -- GROUP 2 record 4 completely
-- covers record 5
( 6, 707, '2019-06-18', '11:11:04', '11:19:39'), -- GROUP 3
( 7, 707, '2019-06-18', '12:15:15', '13:13:13'),
( 8, 707, '2019-06-18', '13:04:41', '13:20:02'),
( 9, 707, '2019-06-18', '13:17:17', '13:22:22'), -- GROUP 4
(10, 707, '2019-06-18', '14:05:17', '14:14:14'); -- GROUP 5
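I'll go through my logic step by step - perhaps good for you, but good for me too, since it helps clarify my thinking and makes sure the lessons I've learnt from this exercise stay with me - "I hear and I forget. I see and I remember. I do and I understand." - Confucius.
Everything below is included in the fiddle.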
/**
So, the desired result is:
Interval 1 - start: 10:48:25 - end 11:00:49
Interval 2 - start: 11:03:59 - end 11:10:39
Interval 3 - start: 11:11:04 - end 11:19:39
Interval 4 - start: 12:15:15 - end 13:22:22
Interval 5 - start: 14:05:17 - end 14:14:14
**/
SELECT
s.id AS id, s.ctn AS ctn, s.f_time AS ft, s.l_time AS lt,
CASE
-- ORDER BY makes "the previous row" well defined; OVER () leaves it to chance
WHEN LAG(s.l_time) OVER (ORDER BY s.f_time) > f_time THEN 0
ELSE 1
END AS ovl
FROM sessions s
Result:
id ctn ft lt ovl
1 707 10:48:25 10:56:17 1
2 707 10:53:17 11:00:49 0
3 707 10:54:31 10:59:43 0
4 707 11:03:59 11:10:39 1
5 707 11:04:41 11:08:02 0
6 707 11:11:04 11:19:39 1
7 707 12:15:15 13:13:13 1
8 707 13:04:41 13:20:02 0
9 707 13:17:17 13:22:22 0
10 707 14:05:17 14:14:14 1
So, whenever a new interval starts, there is a 1 in the ovl (overlap) column.
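The flag logic can be mirrored in a few lines of Python, which also makes one limitation visible. This is a sketch rather than the fiddle's code: `flags_lag` compares each row only with the previous row's end time, as the LAG() expression does, while `flags_running_max` (my variation, not the answer's method) compares with the latest end time seen so far. The three-row `TRICKY` list is hypothetical data.

```python
from datetime import time


def t(text):
    return time.fromisoformat(text)


# (f_time, l_time) for the ten sample rows above
ROWS = [
    (t("10:48:25"), t("10:56:17")), (t("10:53:17"), t("11:00:49")),
    (t("10:54:31"), t("10:59:43")), (t("11:03:59"), t("11:10:39")),
    (t("11:04:41"), t("11:08:02")), (t("11:11:04"), t("11:19:39")),
    (t("12:15:15"), t("13:13:13")), (t("13:04:41"), t("13:20:02")),
    (t("13:17:17"), t("13:22:22")), (t("14:05:17"), t("14:14:14")),
]


def flags_lag(rows):
    """1 where a row starts a new interval, judged by the previous row only."""
    flags, prev_end = [], None
    for f_time, l_time in sorted(rows):
        flags.append(0 if prev_end is not None and prev_end > f_time else 1)
        prev_end = l_time
    return flags


def flags_running_max(rows):
    """Same idea, but judged by the latest end time seen so far."""
    flags, latest_end = [], None
    for f_time, l_time in sorted(rows):
        flags.append(0 if latest_end is not None and latest_end > f_time else 1)
        latest_end = l_time if latest_end is None else max(latest_end, l_time)
    return flags


# Hypothetical data: a long session containing two short, separate ones.
TRICKY = [(t("10:00:00"), t("11:00:00")), (t("10:10:00"), t("10:20:00")),
          (t("10:30:00"), t("10:40:00"))]

print(flags_lag(ROWS))
print(flags_lag(TRICKY), flags_running_max(TRICKY))
```

On the sample data both functions agree with the ovl column above. On the hypothetical `TRICKY` rows the previous-row comparison starts a new interval at 10:30:00 even though the first session is still running, so data of that shape would need the running-maximum form.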
Next, we take the cumulative SUM of these 1s, as follows:
SELECT
t1.id, t1.ctn, t1.ft, t1.lt, t1.ovl,
SUM(ovl) OVER (ORDER BY t1.ft ASC ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS s
FROM
(
SELECT
s.id AS id, s.ctn AS ctn, s.f_time AS ft, s.l_time AS lt,
CASE
WHEN LAG(s.l_time) OVER (ORDER BY s.f_time) > f_time THEN 0
ELSE 1
END AS ovl
FROM sessions s
) AS t1
ORDER BY lt, id
Result:
id ctn ft lt ovl s
1 707 10:48:25 10:56:17 1 1
3 707 10:54:31 10:59:43 0 1
2 707 10:53:17 11:00:49 0 1
5 707 11:04:41 11:08:02 0 2
4 707 11:03:59 11:10:39 1 2
6 707 11:11:04 11:19:39 1 3
7 707 12:15:15 13:13:13 1 4
8 707 13:04:41 13:20:02 0 4
9 707 13:17:17 13:22:22 0 4
10 707 14:05:17 14:14:14 1 5
So we have now "split" the rows and have a way of telling our intervals apart - each interval has a different value of s, 1..5.
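The running SUM() is just a cumulative sum over the flags taken in f_time order. A minimal Python sketch, with `group_ids` as a hypothetical name:

```python
from itertools import accumulate


def group_ids(flags):
    """Cumulative sum of the 'new interval' flags, like SUM(ovl) OVER (ORDER BY ft ...)."""
    return list(accumulate(flags))


# ovl column from the first query, in f_time order
ovl = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]
print(group_ids(ovl))
```

Every 1 bumps the counter, so all rows up to the next 1 share the same value of s.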
Now we want the lowest f_time and the highest l_time for each of these intervals. My first attempt, using MAX() and MIN(), went as follows:
SELECT
ROW_NUMBER() OVER (PARTITION BY s) AS rn,
MIN(ft) OVER (PARTITION BY s ORDER BY ft, lt) AS min_f,
MAX(lt) OVER (PARTITION BY s ORDER BY ft, lt) AS max_l,
s
FROM
(
SELECT
t1.id, t1.ctn, t1.ft, t1.lt, t1.ovl,
SUM(ovl) OVER (ORDER BY t1.ft ASC ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS s
FROM
(
SELECT
s.id AS id, s.ctn AS ctn, s.f_time AS ft, s.l_time AS lt,
CASE
WHEN LAG(s.l_time) OVER (ORDER BY s.f_time) > f_time THEN 0
ELSE 1
END AS ovl
FROM sessions s
) AS t1
ORDER BY id, lt
)AS t2
ORDER BY s, rn ASC, min_f;
Result:
rn min_f max_l s
1 10:48:25 10:56:17 1
2 10:48:25 11:00:49 1
3 10:48:25 11:00:49 1
1 11:03:59 11:10:39 2
2 11:03:59 11:10:39 2
1 11:11:04 11:19:39 3
1 12:15:15 13:13:13 4
2 12:15:15 13:20:02 4
3 12:15:15 13:22:22 4
1 14:05:17 14:14:14 5
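Note how we would have to pick rn = 3 for the first interval, rn = 3 for the fourth, and different values of rn for other intervals - if an interval were made up of 7 sub-intervals, we would have to retrieve rn = 7. This had me stumped for a while!
Then the power of window functions comes to the rescue - if you order MAX() and MIN() differently, the correct result falls right into our laps: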
SELECT
ROW_NUMBER() OVER (PARTITION BY s) AS rn,
MIN(ft) OVER (PARTITION BY s ORDER BY ft, lt DESC) AS min_f,
MAX(lt) OVER (PARTITION BY s ORDER BY ft DESC, lt) AS max_l,
s
FROM
(
SELECT
t1.id, t1.ctn, t1.ft, t1.lt, t1.ovl,
SUM(ovl) OVER (ORDER BY t1.ft ASC ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS s
FROM
(
SELECT
s.id AS id, s.ctn AS ctn, s.f_time AS ft, s.l_time AS lt,
CASE
WHEN LAG(s.l_time) OVER (ORDER BY s.f_time) > f_time THEN 0
ELSE 1
END AS ovl
FROM sessions s
) AS t1
ORDER BY id, lt
)AS t2
ORDER BY s, rn ASC, min_f;
Result:
rn min_f max_l s
1 10:48:25 11:00:49 1
2 10:48:25 11:00:49 1
3 10:48:25 10:59:43 1
1 11:03:59 11:10:39 2
2 11:03:59 11:08:02 2
1 11:11:04 11:19:39 3
1 12:15:15 13:22:22 4
2 12:15:15 13:22:22 4
3 12:15:15 13:22:22 4
1 14:05:17 14:14:14 5
Note that now rn = 1 is always the record we want - this is the result of the following:
MIN(ft) OVER (PARTITION BY s ORDER BY ft, lt DESC) AS min_f,
MAX(lt) OVER (PARTITION BY s ORDER BY ft DESC, lt) AS max_l,
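Note that the ordering for MIN() is by lt DESC and for MAX() (partitioned by interval, i.e. s) it is by ft DESC. This matches the smallest ft with the greatest lt, which is exactly what we want. One caveat: ROW_NUMBER() here has no ORDER BY of its own, so rn = 1 landing on the row with the earliest ft relies on the order in which the rows happen to arrive; adding ORDER BY ft to that window makes it explicit.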
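Why the sort direction matters: an aggregate with ORDER BY in its window is a running aggregate, so each row only "sees" the rows sorted before it. A small Python sketch of the running MAX(lt) for group s = 1 under both orderings (the name `running_max_lt` is mine, and times are kept as zero-padded strings, which compare correctly as text):

```python
from itertools import accumulate

# Group s = 1 from the simple data, as (ft, lt)
GROUP_1 = [("10:48:25", "10:56:17"), ("10:53:17", "11:00:49"),
           ("10:54:31", "10:59:43")]


def running_max_lt(rows, descending):
    """Running MAX(lt) per row, with rows visited in ft order (ASC or DESC)."""
    ordered = sorted(rows, key=lambda row: row[0], reverse=descending)
    maxima = accumulate((lt for _ft, lt in ordered), max)
    return {ft: running for (ft, _lt), running in zip(ordered, maxima)}


print(running_max_lt(GROUP_1, descending=False))  # first attempt
print(running_max_lt(GROUP_1, descending=True))   # second attempt
```

With ascending order the earliest row only sees itself and gets 10:56:17; with descending order it is visited last, sees the whole group and gets 11:00:49, matching the two result tables above.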
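This is basically the result we want - just add a little tidying and formatting as per the OP's request and we're good to go. This part also demonstrates another very useful window function - ROW_NUMBER().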
SELECT
ROW_NUMBER() OVER () AS "Interval No.",
' Start time: ' AS " ",
t3.min_f AS "Interval start" ,
' End time: ' AS " ",
t3.max_l AS "Interval stop",
' Duration: ' AS " ",
(t3.max_l - t3.min_f) AS "Duration"
FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY s) AS rn,
MIN(ft) OVER (PARTITION BY s ORDER BY ft, lt DESC) AS min_f,
MAX(lt) OVER (PARTITION BY s ORDER BY ft DESC, lt) AS max_l,
s
FROM
(
SELECT
t1.id, t1.ctn, t1.ft, t1.lt, t1.ovl,
SUM(ovl) OVER (ORDER BY t1.ft ASC ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS s
FROM
(
SELECT
s.id AS id, s.ctn AS ctn, s.f_time AS ft, s.l_time AS lt,
CASE
WHEN LAG(s.l_time) OVER (ORDER BY s.f_time) > f_time THEN 0
ELSE 1
END AS ovl
FROM sessions s
) AS t1
ORDER BY id, lt
)AS t2
ORDER BY s, rn ASC, min_f
) AS t3
WHERE t3.rn = 1;
Final result:
Interval No. Interval start Interval stop Duration
1 Start time: 10:48:25 End time: 11:00:49 Duration: 00:12:24
2 Start time: 11:03:59 End time: 11:10:39 Duration: 00:06:40
3 Start time: 11:11:04 End time: 11:19:39 Duration: 00:08:35
4 Start time: 12:15:15 End time: 13:22:22 Duration: 01:07:07
5 Start time: 14:05:17 End time: 14:14:14 Duration: 00:08:57
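The final figures can be cross-checked outside the database. The sketch below takes the (s, ft, lt) rows from the cumulative-sum step and reduces each group to its earliest start, latest end and duration. Note that it uses a plain per-group minimum and maximum (the equivalent of GROUP BY s) rather than the ROW_NUMBER() technique of the query above; `summarise` is a hypothetical name.

```python
from datetime import datetime, timedelta

# (s, ft, lt) as produced by the cumulative-sum step
ROWS = [
    (1, "10:48:25", "10:56:17"), (1, "10:53:17", "11:00:49"),
    (1, "10:54:31", "10:59:43"), (2, "11:03:59", "11:10:39"),
    (2, "11:04:41", "11:08:02"), (3, "11:11:04", "11:19:39"),
    (4, "12:15:15", "13:13:13"), (4, "13:04:41", "13:20:02"),
    (4, "13:17:17", "13:22:22"), (5, "14:05:17", "14:14:14"),
]


def summarise(rows):
    """Per group: (s, earliest ft, latest lt, duration as a timedelta)."""
    bounds = {}
    for s, ft, lt in rows:
        lo, hi = bounds.get(s, (ft, lt))
        bounds[s] = (min(lo, ft), max(hi, lt))
    fmt = "%H:%M:%S"
    return [
        (s, lo, hi, datetime.strptime(hi, fmt) - datetime.strptime(lo, fmt))
        for s, (lo, hi) in sorted(bounds.items())
    ]


for s, start, stop, duration in summarise(ROWS):
    print(s, start, stop, duration)
```

The five intervals and durations it prints agree with the table above, for example 12 minutes 24 seconds for interval 1 and 1 hour 7 minutes 7 seconds for interval 4.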
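If there are a large number of records, I can't vouch for the performance of this query - see the results of EXPLAIN (ANALYZE, BUFFERS) at the end of the fiddle. However, I'm assuming that, since this is a report-style format, it will probably be run for given values of ctn and/or day - i.e. not too many records?
I won't show every step - once the duplicate f_times and l_times have been eliminated, the steps are identical.
Here, the table definition and data are slightly different (fiddle available here):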
CREATE TABLE sessions
(
id serial NOT NULL PRIMARY KEY,
ctn INT NOT NULL,
f_day DATE NOT NULL,
f_time TIME(0) NOT NULL,
l_time TIME(0) NOT NULL,
CONSTRAINT ft_lt_lt CHECK (f_time < l_time),
-- CONSTRAINT ft_uq UNIQUE (f_time),
-- CONSTRAINT lt_uq UNIQUE (l_time)
CONSTRAINT ft_lt_uq UNIQUE(f_time, l_time)
-- could put in a DISTINCT somewhere to counter this possibility or
-- maybe have TIME(2) to ensure no duplicates?
);
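The only constraints I've kept are CHECK (f_time < l_time) (it can't be any other way) and UNIQUE (f_time, l_time) (possibly with day and/or ctn added) - the earlier remarks about TIME(2) or (3..6) also apply.
I leave it to the reader to extend the UNIQUE constraint to the combination with ctn and f_day as appropriate!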
INSERT INTO sessions (id, ctn, f_day, f_time, l_time)
VALUES
( 1, 707, '2019-06-18', '10:48:25', '10:56:17'), -- note - same l_times
( 2, 707, '2019-06-18', '10:48:33', '10:56:17'), -- need one with lowest f_time
( 3, 707, '2019-06-18', '10:53:17', '11:00:49'),
( 4, 707, '2019-06-18', '10:54:31', '10:59:43'), -- note - same f_times
-- need one with greatest l_time
( 5, 707, '2019-06-18', '10:54:31', '10:57:37'), -- GROUP 1
( 6, 707, '2019-06-18', '11:03:59', '11:10:39'),
( 7, 707, '2019-06-18', '11:04:41', '11:08:02'), -- GROUP 2, record 6 completely
-- covers record 7
( 8, 707, '2019-06-18', '11:11:04', '11:19:39'), -- GROUP 3
( 9, 707, '2019-06-18', '12:15:15', '13:13:13'),
(10, 707, '2019-06-18', '13:04:41', '13:20:02'),
(11, 707, '2019-06-18', '13:17:17', '13:22:22'), -- GROUP 4
(12, 707, '2019-06-18', '14:05:17', '14:14:14'); -- GROUP 5
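I've added a couple of potentially "troublesome" records (2 and 5) that share an f_time or an l_time with another record in the same interval. So, where f_times are identical, we want the sub-interval with the greatest l_time and, vice versa, where l_times are identical, the one with the smallest f_time.
So, in this case, what I did was eliminate the duplicates by chaining CTEs (also known as WITH clauses), as follows: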
WITH cte1 AS
(
SELECT s.*, t.mt, t.lt
FROM sessions s
JOIN
(
SELECT
DISTINCT
ctn,
MIN(f_time) AS mt,
l_time AS lt
FROM sessions
GROUP BY ctn, l_time
ORDER BY l_time
) AS t
ON (s.ctn, s.f_time, s.l_time) = (t.ctn, t.mt, t.lt)
ORDER BY s.l_time
),
cte2 AS
(
SELECT
DISTINCT
ctn,
f_time AS ft,
MAX(lt) AS lt
FROM cte1
GROUP BY ctn, f_time
ORDER BY f_time
)
SELECT * FROM cte2
ORDER BY ft;
Result:
ctn ft lt
707 10:48:25 10:56:17
707 10:53:17 11:00:49
707 10:54:31 10:59:43
707 11:03:59 11:10:39
707 11:04:41 11:08:02
707 11:11:04 11:19:39
707 12:15:15 13:13:13
707 13:04:41 13:20:02
707 13:17:17 13:22:22
707 14:05:17 14:14:14
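The two de-duplication passes can be written out in Python to confirm which rows survive. This is a sketch only: `dedupe` is a hypothetical name, ctn is ignored because the sample has a single ctn, and times are zero-padded strings so that text comparison orders them correctly.

```python
# (f_time, l_time) for the twelve rows of the "complex" sample data
ROWS = [
    ("10:48:25", "10:56:17"), ("10:48:33", "10:56:17"),
    ("10:53:17", "11:00:49"), ("10:54:31", "10:59:43"),
    ("10:54:31", "10:57:37"), ("11:03:59", "11:10:39"),
    ("11:04:41", "11:08:02"), ("11:11:04", "11:19:39"),
    ("12:15:15", "13:13:13"), ("13:04:41", "13:20:02"),
    ("13:17:17", "13:22:22"), ("14:05:17", "14:14:14"),
]


def dedupe(rows):
    """Mimic cte1 then cte2 for a single ctn."""
    # cte1: for each l_time keep the earliest f_time
    earliest_start = {}
    for f_time, l_time in rows:
        if l_time not in earliest_start or f_time < earliest_start[l_time]:
            earliest_start[l_time] = f_time
    # cte2: for each surviving f_time keep the latest l_time
    latest_end = {}
    for l_time, f_time in earliest_start.items():
        if f_time not in latest_end or l_time > latest_end[f_time]:
            latest_end[f_time] = l_time
    return sorted(latest_end.items())


for f_time, l_time in dedupe(ROWS):
    print(f_time, l_time)
```

Twelve rows go in and ten come out: record 2 loses to record 1 on the shared l_time, and record 5 loses to record 4 on the shared f_time, which is the list shown above.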
I then treat cte2 as the starting point of the process described in the "simple" explanation.
The final SQL is as follows:
WITH cte1 AS
(
SELECT s.*, t.mt, t.lt
FROM sessions s
JOIN
(
SELECT
DISTINCT
ctn,
MIN(f_time) AS mt,
l_time AS lt
FROM sessions
GROUP BY ctn, l_time
ORDER BY l_time
) AS t
ON (s.ctn, s.f_time, s.l_time) = (t.ctn, t.mt, t.lt)
ORDER BY s.l_time
),
cte2 AS
(
SELECT
DISTINCT
ctn,
f_time AS ft,
MAX(lt) AS lt
FROM cte1
GROUP BY ctn, f_time
ORDER BY f_time
)
SELECT
ROW_NUMBER() OVER () AS "Interval No.",
' Start time: ' AS " ",
t3.min_f AS "Interval start" ,
' End time: ' AS " ",
t3.max_l AS "Interval stop",
' Duration: ' AS " ",
(t3.max_l - t3.min_f) AS "Duration"
FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY s) AS rn,
MIN(ft) OVER (PARTITION BY s ORDER BY ft, lt DESC) AS min_f,
MAX(lt) OVER (PARTITION BY s ORDER BY ft DESC, lt) AS max_l,
s
FROM
(
SELECT
t1.ctn, t1.ft, t1.lt, t1.ovl,
SUM(ovl) OVER (ORDER BY t1.ft ASC ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS s
FROM
(
SELECT
c.ctn AS ctn, c.ft AS ft, c.lt AS lt,
CASE
WHEN LAG(c.lt) OVER (ORDER BY c.ft) > ft THEN 0
ELSE 1
END AS ovl
FROM cte2 c
) AS t1
ORDER BY t1.lt
) AS t2
ORDER BY s, rn ASC, min_f
) AS t3
WHERE t3.rn = 1
ORDER BY t3.rn;
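Result (note that interval 2 comes out ending at 11:08:02 rather than the expected 11:10:39; the likely cause is that ROW_NUMBER() has no ORDER BY, so with t2 sorted by lt, rn = 1 falls on the sub-interval starting at 11:04:41 - adding ORDER BY ft to that ROW_NUMBER() window should restore 11:10:39):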
Interval No. Interval start Interval stop Duration
1 Start time: 10:48:25 End time: 11:00:49 Duration: 00:12:24
2 Start time: 11:03:59 End time: 11:08:02 Duration: 00:04:03
3 Start time: 11:11:04 End time: 11:19:39 Duration: 00:08:35
4 Start time: 12:15:15 End time: 13:22:22 Duration: 01:07:07
5 Start time: 14:05:17 End time: 14:14:14 Duration: 00:08:57
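As you can see, this is a pretty cumbersome business - not having UNIQUE constraints in the DDL doubles the length of the SQL, as well as the time spent in the planning and execution phases, and makes it really ugly.
For the plans of both queries, see the end of the fiddle! There's a lesson to be learnt there! As a rule of thumb, the longer the plan, the slower the query!
I'm not sure indexes can play any part here, since we're selecting from the whole table and it's very small! If we were filtering a large table by ctn and/or f_day and/or f_time, I'm pretty sure we'd start to see differences in the plans (and timings!) without the indexes!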