SQL: find row pairs with the next-best timestamp match

Dan*_*iel 8 sql postgresql datetime window-functions

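My challenge is to find pairs of rows that are adjacent by timestamp and to keep only those pairs where the distance between their value fields (the absolute difference) is minimal.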

A table measurement collects data from different sensors, each row with a timestamp and a value.

id | sensor_id | timestamp | value
---+-----------+-----------+------
 1 |         1 | 12:00:00  |     5
 2 |         2 | 12:01:00  |     6
 3 |         1 | 12:02:00  |     4
 4 |         2 | 12:02:00  |     7
 5 |         2 | 12:03:00  |     3
 6 |         1 | 12:05:00  |     3
 7 |         2 | 12:06:00  |     4
 8 |         2 | 12:07:00  |     5
 9 |         1 | 12:08:00  |     6

A sensor's value is valid from its timestamp until the timestamp of that sensor's next record (same sensor_id).
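The validity rule can be made explicit with a window function. This is only an illustrative sketch against the test table from this question; the column aliases valid_from and valid_until are made up for the example, and the last record of each sensor gets a NULL valid_until because no later record exists.

```sql
-- For every record, show the interval during which its value is valid:
-- from its own timestamp up to the next record of the same sensor.
select
    sensor_id,
    value,
    timestamp as valid_from,
    -- next timestamp of the same sensor; NULL for the sensor's last record
    lead(timestamp) over (
        partition by sensor_id
        order by timestamp
    ) as valid_until
from measurement
order by sensor_id, timestamp;
```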

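Graphical representation

[Figure: sensor values and their distance over time]

The green line at the bottom shows the distance between the values of sensor 1 (blue line) and sensor 2 (red line) over time.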

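My goal is to

  1. combine only those records of the 2 sensors that match logically by timestamp (to get the green line)
  2. find the local minima of the distance, which are at
    • 12:01:00 (at 12:00:00 there is no record for sensor 2)
    • 12:05:00
    • 12:08:00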

The real table lives in a PostgreSQL database and contains about 5 million records from 15 sensors.

Test data

create table measurement (
    id serial,
    sensor_id integer,
    timestamp timestamp,
    value integer)
;

insert into measurement (sensor_id, timestamp, value)
values
(1, '2020-08-16 12:00:00', 5),
(2, '2020-08-16 12:01:00', 6),
(1, '2020-08-16 12:02:00', 4),
(2, '2020-08-16 12:02:00', 7),
(2, '2020-08-16 12:03:00', 3),
(1, '2020-08-16 12:05:00', 3),
(2, '2020-08-16 12:06:00', 4),
(2, '2020-08-16 12:07:00', 5),
(1, '2020-08-16 12:08:00', 6)
;

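My approach

was to pick 2 arbitrary sensors (by sensor_id), do a self-join, and for each record of sensor 1 keep only the sensor 2 record with the closest preceding timestamp (the maximum sensor 2 timestamp that is <= the sensor 1 timestamp), and then the same the other way round.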

select
*
from (
    select
    *,
    row_number() over (partition by m1.timestamp order by m2.timestamp desc) rownum
    from measurement m1
    join measurement m2
        on m1.sensor_id <> m2.sensor_id
        and m1.timestamp >= m2.timestamp
    --arbitrarily sensor_ids 1 and 2
    where m1.sensor_id = 1
    and m2.sensor_id = 2
) foo
where rownum = 1

union --vice versa

select
*
from (
    select
    *,
    row_number() over (partition by m2.timestamp order by m1.timestamp desc) rownum
    from measurement m1
    join measurement m2
        on m1.sensor_id <> m2.sensor_id
        and m1.timestamp <= m2.timestamp
    --arbitrarily sensor_ids 1 and 2
    where m1.sensor_id = 1
    and m2.sensor_id = 2
) foo
where rownum = 1
;

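But this returns a pair for 12:00:00, where sensor 2 has no data (not a big problem),
and on the real table the statement does not finish even after hours (the big problem).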

I found some similar questions, but they do not match my problem.

Thanks in advance!

The*_*ler 2

You can use a couple of lateral joins. For example:

with
t as (select distinct timestamp as ts from measurement)
select
  t.ts, s1.value as v1, s2.value as v2,
  abs(s1.value - s2.value) as distance
from t,
lateral (
  select value
  from measurement m 
  where m.sensor_id = 1 and m.timestamp <= t.ts
  order by timestamp desc
  limit 1
) s1,
lateral (
  select value
  from measurement m 
  where m.sensor_id = 2 and m.timestamp <= t.ts
  order by timestamp desc
  limit 1
) s2
order by t.ts

Result:

ts                     v1  v2  distance
---------------------  --  --  --------
2020-08-16 12:01:00.0   5   6         1
2020-08-16 12:02:00.0   4   7         3
2020-08-16 12:03:00.0   4   3         1
2020-08-16 12:05:00.0   3   3         0
2020-08-16 12:06:00.0   3   4         1
2020-08-16 12:07:00.0   3   5         2
2020-08-16 12:08:00.0   6   5         1

See the running example at DB Fiddle.
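Each lateral subquery is a "latest row at or before ts for one sensor" lookup, so on a table with about 5 million rows it will likely need an index to be fast. A minimal sketch, assuming no suitable index exists yet; the index name is made up, and whether the planner actually uses the index should be checked with EXPLAIN on the real data.

```sql
-- Hypothetical index name. The column order matters: an equality filter on
-- sensor_id first, then a range scan on timestamp, which can be read backwards
-- for "order by timestamp desc limit 1".
create index measurement_sensor_ts_idx
    on measurement (sensor_id, timestamp);

-- Refresh planner statistics after creating the index.
analyze measurement;

-- Inspect the plan of a single lookup of the kind used in the lateral subqueries.
explain
select value
from measurement m
where m.sensor_id = 1
  and m.timestamp <= timestamp '2020-08-16 12:05:00'
order by m.timestamp desc
limit 1;
```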

Also, if you want all timestamps, even unmatched ones such as 12:00:00, you can do:

with
t as (select distinct timestamp as ts from measurement)
select
  t.ts, s1.value as v1, s2.value as v2,
  abs(s1.value - s2.value) as distance
from t
left join lateral (
  select value
  from measurement m 
  where m.sensor_id = 1 and m.timestamp <= t.ts
  order by timestamp desc
  limit 1
) s1 on true
left join lateral (
  select value
  from measurement m 
  where m.sensor_id = 2 and m.timestamp <= t.ts
  order by timestamp desc
  limit 1
) s2 on true
order by t.ts

For those unmatched timestamps, however, the distance cannot be computed.

Result:

ts                     v1      v2  distance
---------------------  --  ------  --------
2020-08-16 12:00:00.0   5  <null>    <null>
2020-08-16 12:01:00.0   5       6         1
2020-08-16 12:02:00.0   4       7         3
2020-08-16 12:03:00.0   4       3         1
2020-08-16 12:05:00.0   3       3         0
2020-08-16 12:06:00.0   3       4         1
2020-08-16 12:07:00.0   3       5         2
2020-08-16 12:08:00.0   6       5         1
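The queries above produce the distance series (goal 1 in the question). For goal 2, the local minima, one possible sketch is to compare each distance with its neighbours using lag() and lead(). It assumes a strict comparison, so a row counts as a local minimum only if its distance is smaller than both neighbours, and a missing neighbour at either end does not disqualify it. How plateaus of equal distances should be treated is not stated in the question, so that choice is an assumption. The CTE names d and n are made up for the example.

```sql
with
t as (select distinct timestamp as ts from measurement),
d as (
  -- distance series, same lateral lookups as in the first query
  select t.ts, abs(s1.value - s2.value) as distance
  from t,
  lateral (
    select value
    from measurement m
    where m.sensor_id = 1 and m.timestamp <= t.ts
    order by m.timestamp desc
    limit 1
  ) s1,
  lateral (
    select value
    from measurement m
    where m.sensor_id = 2 and m.timestamp <= t.ts
    order by m.timestamp desc
    limit 1
  ) s2
),
n as (
  -- attach the previous and next distance to every row
  select
    ts,
    distance,
    lag(distance)  over (order by ts) as prev_distance,
    lead(distance) over (order by ts) as next_distance
  from d
)
select ts, distance
from n
-- assumption: strict local minimum; a missing neighbour counts as "larger"
where (prev_distance is null or distance < prev_distance)
  and (next_distance is null or distance < next_distance)
order by ts;
```

On the test data this keeps the rows at 12:01:00, 12:05:00 and 12:08:00, which are the three minima listed in the question.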