Daniel · 8 · tags: sql, postgresql, datetime, window-functions
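My challenge is to find pairs of rows that are adjacent by timestamp and keep only the pairs with the minimal distance between their value fields (the absolute value of the difference).
The table measurement collects data from different sensors, each row with a timestamp and a value.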
id | sensor_id | timestamp | value
---+-----------+-----------+------
1 | 1 | 12:00:00 | 5
2 | 2 | 12:01:00 | 6
3 | 1 | 12:02:00 | 4
4 | 2 | 12:02:00 | 7
5 | 2 | 12:03:00 | 3
6 | 1 | 12:05:00 | 3
7 | 2 | 12:06:00 | 4
8 | 2 | 12:07:00 | 5
9 | 1 | 12:08:00 | 6
A sensor's value is valid from its timestamp until the timestamp of that sensor's next record (same sensor_id).
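To make the validity rule concrete, here is a sketch (not part of the original question) that turns each reading into an interval with PostgreSQL's lead() window function. The sample rows are inlined in a CTE that shadows the measurement table so the snippet runs on its own; remove the CTE to run it against the real table. The column names valid_from and valid_to are illustrative.

```sql
-- Sample rows from the question, inlined so the snippet runs standalone.
with measurement (id, sensor_id, timestamp, value) as (
    values
        (1, 1, timestamp '2020-08-16 12:00:00', 5),
        (2, 2, timestamp '2020-08-16 12:01:00', 6),
        (3, 1, timestamp '2020-08-16 12:02:00', 4),
        (4, 2, timestamp '2020-08-16 12:02:00', 7),
        (5, 2, timestamp '2020-08-16 12:03:00', 3),
        (6, 1, timestamp '2020-08-16 12:05:00', 3),
        (7, 2, timestamp '2020-08-16 12:06:00', 4),
        (8, 2, timestamp '2020-08-16 12:07:00', 5),
        (9, 1, timestamp '2020-08-16 12:08:00', 6)
)
-- Each reading is valid from its own timestamp up to the timestamp of the
-- same sensor's next reading; lead() looks one row ahead within the
-- sensor's partition. valid_to is NULL for each sensor's latest reading.
select
    m.sensor_id,
    m.value,
    m.timestamp as valid_from,
    lead(m.timestamp) over (partition by m.sensor_id order by m.timestamp) as valid_to
from measurement m
order by m.sensor_id, m.timestamp;
```

On the sample data, for example, sensor 1's value 5 is valid from 12:00:00 to 12:02:00, and the last reading of each sensor gets a NULL valid_to.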
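The green line in the chart below shows the distance over time between the values of sensor 1 (blue line) and sensor 2 (red line).
My goal is:
The real table lives in a PostgreSQL database and contains about 5 million records from 15 sensors.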
create table measurement (
id serial,
sensor_id integer,
timestamp timestamp,
value integer)
;
insert into measurement (sensor_id, timestamp, value)
values
(1, '2020-08-16 12:00:00', 5),
(2, '2020-08-16 12:01:00', 6),
(1, '2020-08-16 12:02:00', 4),
(2, '2020-08-16 12:02:00', 7),
(2, '2020-08-16 12:03:00', 3),
(1, '2020-08-16 12:05:00', 3),
(2, '2020-08-16 12:06:00', 4),
(2, '2020-08-16 12:07:00', 5),
(1, '2020-08-16 12:08:00', 6)
;
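My approach is to pick 2 arbitrary sensors (by their sensor ids), self-join the table, and for each sensor 1 record keep only the sensor 2 record with the preceding timestamp (the largest sensor 2 timestamp that is <= the sensor 1 timestamp), and vice versa.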
select
*
from (
select
*,
row_number() over (partition by m1.timestamp order by m2.timestamp desc) rownum
from measurement m1
join measurement m2
on m1.sensor_id <> m2.sensor_id
and m1.timestamp >= m2.timestamp
--arbitrarily sensor_ids 1 and 2
where m1.sensor_id = 1
and m2.sensor_id = 2
) foo
where rownum = 1
union --vice versa
select
*
from (
select
*,
row_number() over (partition by m2.timestamp order by m1.timestamp desc) rownum
from measurement m1
join measurement m2
on m1.sensor_id <> m2.sensor_id
and m1.timestamp <= m2.timestamp
--arbitrarily sensor_ids 1 and 2
where m1.sensor_id = 1
and m2.sensor_id = 2
) foo
where rownum = 1
;
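But this returns a pair at 12:00:00, where sensor 2 has no data yet (not a big problem),
and on the real table the statement still has not finished after several hours (big problem).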
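A likely reason for the runtime (an assumption, not measured on the real table): the join condition m1.timestamp >= m2.timestamp pairs every sensor 1 reading with every earlier-or-equal sensor 2 reading, and row_number() can only discard rows after that whole join has been produced and sorted. The sketch below, with the sample rows inlined as a CTE, counts the rows the first union branch builds before the rownum = 1 filter.

```sql
-- Sample rows from the question, inlined so the snippet runs standalone.
with measurement (id, sensor_id, timestamp, value) as (
    values
        (1, 1, timestamp '2020-08-16 12:00:00', 5),
        (2, 2, timestamp '2020-08-16 12:01:00', 6),
        (3, 1, timestamp '2020-08-16 12:02:00', 4),
        (4, 2, timestamp '2020-08-16 12:02:00', 7),
        (5, 2, timestamp '2020-08-16 12:03:00', 3),
        (6, 1, timestamp '2020-08-16 12:05:00', 3),
        (7, 2, timestamp '2020-08-16 12:06:00', 4),
        (8, 2, timestamp '2020-08-16 12:07:00', 5),
        (9, 1, timestamp '2020-08-16 12:08:00', 6)
)
-- Same join as the first union branch, without the row_number() filter:
-- every sensor 1 reading meets every earlier-or-equal sensor 2 reading.
select count(*) as joined_rows
from measurement m1
join measurement m2
  on m1.timestamp >= m2.timestamp
where m1.sensor_id = 1
  and m2.sensor_id = 2;
```

On the sample this returns 10 joined rows, of which rownum = 1 keeps only 3 (12:00:00 has no earlier sensor 2 reading). The count grows with the product of the two sensors' row counts: assuming the 5 million rows are spread evenly over the 15 sensors (about 333,000 per sensor), each branch would produce on the order of 333,000 × 333,000 / 2, roughly 55 billion rows.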
I found some similar questions, but they do not match my problem.
Thanks in advance!
You can use a couple of lateral joins. For example:
with
t as (select distinct timestamp as ts from measurement)
select
t.ts, s1.value as v1, s2.value as v2,
abs(s1.value - s2.value) as distance
from t,
lateral (
select value
from measurement m
where m.sensor_id = 1 and m.timestamp <= t.ts
order by timestamp desc
limit 1
) s1,
lateral (
select value
from measurement m
where m.sensor_id = 2 and m.timestamp <= t.ts
order by timestamp desc
limit 1
) s2
order by t.ts
Result:
ts v1 v2 distance
--------------------- -- -- --------
2020-08-16 12:01:00.0 5 6 1
2020-08-16 12:02:00.0 4 7 3
2020-08-16 12:03:00.0 4 3 1
2020-08-16 12:05:00.0 3 3 0
2020-08-16 12:06:00.0 3 4 1
2020-08-16 12:07:00.0 3 5 2
2020-08-16 12:08:00.0 6 5 1
See the running example on DB Fiddle.
Alternatively, if you want all timestamps, even unmatched ones such as 12:00:00, you can do:
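The lateral version relies on each "order by timestamp desc limit 1" subquery being cheap. The create table statement in the question declares no index at all (id is serial but not a primary key), so, as an assumption about the real table, a composite B-tree index matching the subqueries' filter and sort order is sketched here; the index name is hypothetical. With it, PostgreSQL can usually answer each lookup with a short backward index scan; running explain analyze on the query shows whether that plan is actually chosen.

```sql
-- Hypothetical index name. Column order matches the lateral subqueries:
-- equality on sensor_id first, then the timestamp range and sort. A B-tree
-- can be scanned backward, so "order by timestamp desc" needs no DESC index.
create index measurement_sensor_ts_idx on measurement (sensor_id, timestamp);
```

On a table that is being written to, create index concurrently avoids blocking writes while the index is built, at the cost of a slower build.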
with
t as (select distinct timestamp as ts from measurement)
select
t.ts, s1.value as v1, s2.value as v2,
abs(s1.value - s2.value) as distance
from t
left join lateral (
select value
from measurement m
where m.sensor_id = 1 and m.timestamp <= t.ts
order by timestamp desc
limit 1
) s1 on true
left join lateral (
select value
from measurement m
where m.sensor_id = 2 and m.timestamp <= t.ts
order by timestamp desc
limit 1
) s2 on true
order by t.ts
In those cases, however, the distance cannot be computed.
Result:
ts v1 v2 distance
--------------------- -- ------ --------
2020-08-16 12:00:00.0 5 <null> <null>
2020-08-16 12:01:00.0 5 6 1
2020-08-16 12:02:00.0 4 7 3
2020-08-16 12:03:00.0 4 3 1
2020-08-16 12:05:00.0 3 3 0
2020-08-16 12:06:00.0 3 4 1
2020-08-16 12:07:00.0 3 5 2
2020-08-16 12:08:00.0 6 5 1
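A different technique, not from the answer above and not benchmarked on the real data: PostgreSQL's lag() and last_value() do not accept IGNORE NULLS (at least through PostgreSQL 17), so the "last known value" can instead be carried forward with a running count() that starts a new group at every reading, followed by max() over each group. This reads the rows of the two chosen sensors once instead of issuing two index lookups per distinct timestamp. The sample rows are inlined as a CTE, the sensor ids 1 and 2 are hard-coded as in the question, and it is assumed that a sensor has at most one reading per timestamp.

```sql
-- Sample rows from the question, inlined so the snippet runs standalone.
with measurement (id, sensor_id, timestamp, value) as (
    values
        (1, 1, timestamp '2020-08-16 12:00:00', 5),
        (2, 2, timestamp '2020-08-16 12:01:00', 6),
        (3, 1, timestamp '2020-08-16 12:02:00', 4),
        (4, 2, timestamp '2020-08-16 12:02:00', 7),
        (5, 2, timestamp '2020-08-16 12:03:00', 3),
        (6, 1, timestamp '2020-08-16 12:05:00', 3),
        (7, 2, timestamp '2020-08-16 12:06:00', 4),
        (8, 2, timestamp '2020-08-16 12:07:00', 5),
        (9, 1, timestamp '2020-08-16 12:08:00', 6)
),
-- One row per distinct timestamp of sensors 1 and 2, holding each sensor's
-- reading at that exact instant, or NULL if it did not report then.
pivoted as (
    select
        m.timestamp as ts,
        max(m.value) filter (where m.sensor_id = 1) as v1_raw,
        max(m.value) filter (where m.sensor_id = 2) as v2_raw
    from measurement m
    where m.sensor_id in (1, 2)
    group by m.timestamp
),
-- count() ignores NULLs, so the running count increments only at a real
-- reading: all rows up to that sensor's next reading share one group id.
grouped as (
    select
        ts, v1_raw, v2_raw,
        count(v1_raw) over (order by ts) as g1,
        count(v2_raw) over (order by ts) as g2
    from pivoted
),
-- Each group holds exactly one non-NULL reading (group 0, before the first
-- reading, holds none), so max() spreads it over the whole group.
filled as (
    select
        ts,
        max(v1_raw) over (partition by g1) as v1,
        max(v2_raw) over (partition by g2) as v2
    from grouped
)
select ts, v1, v2, abs(v1 - v2) as distance
from filled
order by ts;
```

On the sample this yields the same eight rows as the left join variant, including the NULL distance at 12:00:00; adding where v1 is not null and v2 is not null to the final select matches the first variant instead. One difference worth checking: the answer's t CTE takes distinct timestamps from all sensors, so on the real 15-sensor table its queries also emit rows at timestamps where only other sensors reported, while this sketch uses only the timestamps of sensors 1 and 2.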