First and last value of a window function in one row in PostgreSQL

Mic*_*ndr 2 postgresql window-functions

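I'd like to get the first value of one column and the last value of a second column in one row for a specified partition. For that I created this query: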

SELECT DISTINCT
b.machine_id,
batch,
timestamp_sta,
timestamp_stp,
FIRST_VALUE(timestamp_sta) OVER w AS batch_start,
LAST_VALUE(timestamp_stp) OVER w AS batch_end
FROM db_data.sta_stp AS a
JOIN db_data.ll_lu AS b
ON a.ll_lu_id=b.id
WINDOW w AS (PARTITION BY batch, machine_id ORDER BY timestamp_sta)
ORDER BY timestamp_sta, batch, machine_id;

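But as you can see in the image, the data returned in the batch_end column is not correct.

The batch_start column has the correct first value of the timestamp_sta column. However, batch_end should be "2012-09-17 10:49:45", and instead it equals the timestamp_stp of the same row.

Why is that so?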


JGH*_*JGH 6

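The explanation given by @ŁukaszKamiński addresses the core of the issue.

However, last_value should be replaced by max(). You are sorting by timestamp_sta, so the last value is the one from the row with the greatest timestamp_sta, which may or may not be related to timestamp_stp. I would also order by both fields.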

SELECT DISTINCT
  b.machine_id,
  batch,
  timestamp_sta,
  timestamp_stp,
  FIRST_VALUE(timestamp_sta) OVER w AS batch_start,
  MAX(timestamp_stp) OVER w AS batch_end
FROM db_data.sta_stp AS a
JOIN db_data.ll_lu AS b
ON a.ll_lu_id=b.id
WINDOW w AS (PARTITION BY batch, machine_id 
             ORDER BY timestamp_sta,timestamp_stp 
             RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY timestamp_sta, batch, machine_id;

http://rextester.com/UTDE60342
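
To see the frame effect in isolation, here is a minimal sketch on made-up data (the `sample` CTE and its three rows are hypothetical, not the asker's tables). When a window has ORDER BY and no explicit frame, PostgreSQL uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so last_value() only sees rows up to the current one:

```sql
-- Hypothetical sample data: one batch on one machine, three start/stop pairs.
WITH sample(batch, machine_id, timestamp_sta, timestamp_stp) AS (
   VALUES
     (1, 1, timestamp '2012-09-17 10:00:00', timestamp '2012-09-17 10:10:00')
   , (1, 1, timestamp '2012-09-17 10:15:00', timestamp '2012-09-17 10:30:00')
   , (1, 1, timestamp '2012-09-17 10:35:00', timestamp '2012-09-17 10:49:45')
)
SELECT timestamp_sta
     , timestamp_stp
       -- Default frame: ends at the current row, so this mirrors timestamp_stp.
     , last_value(timestamp_stp) OVER (PARTITION BY batch, machine_id
                                       ORDER BY timestamp_sta) AS end_default_frame
       -- Explicit frame: spans the whole partition.
     , last_value(timestamp_stp) OVER (PARTITION BY batch, machine_id
                                       ORDER BY timestamp_sta
                                       RANGE BETWEEN UNBOUNDED PRECEDING
                                                 AND UNBOUNDED FOLLOWING) AS end_full_frame
FROM   sample
ORDER  BY timestamp_sta;
```

With this data, end_default_frame repeats each row's own timestamp_stp, while end_full_frame is the same partition-wide value on every row.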


Erw*_*ter 6

The question is old, but this solution is simpler and faster than what has been posted so far:

SELECT b.machine_id
     , batch
     , timestamp_sta
     , timestamp_stp
     , min(timestamp_sta) OVER w AS batch_start
     , max(timestamp_stp) OVER w AS batch_end
FROM   db_data.sta_stp a
JOIN   db_data.ll_lu   b ON a.ll_lu_id = b.id
WINDOW w AS (PARTITION BY batch, b.machine_id) -- No ORDER BY !
ORDER  BY timestamp_sta, batch, machine_id; -- why this ORDER BY?

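If you add ORDER BY to the window definition, the default frame ends with the current row (and its peers), so each row with a greater ORDER BY expression gets a later frame end. Then neither max() nor last_value() can return the "last" timestamp for the whole partition. Without ORDER BY, all rows of the same partition are peers and you get your desired result.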
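
A small sketch of that peer behavior, again on hypothetical data (the `sample` CTE and the window names `w_all` and `w_ordered` are my own, for illustration only). The window without ORDER BY aggregates over the whole partition; the ordered one turns max() into a running maximum:

```sql
-- Hypothetical sample data: two batches on the same machine.
WITH sample(batch, machine_id, timestamp_sta, timestamp_stp) AS (
   VALUES
     (1, 1, timestamp '2012-09-17 10:00:00', timestamp '2012-09-17 10:10:00')
   , (1, 1, timestamp '2012-09-17 10:15:00', timestamp '2012-09-17 10:30:00')
   , (2, 1, timestamp '2012-09-17 11:00:00', timestamp '2012-09-17 11:20:00')
   , (2, 1, timestamp '2012-09-17 11:25:00', timestamp '2012-09-17 11:40:00')
)
SELECT batch
     , machine_id
     , timestamp_sta
     , timestamp_stp
     , min(timestamp_sta) OVER w_all     AS batch_start  -- whole partition
     , max(timestamp_stp) OVER w_all     AS batch_end    -- whole partition
     , max(timestamp_stp) OVER w_ordered AS running_end  -- only up to the current row
FROM   sample
WINDOW w_all     AS (PARTITION BY batch, machine_id)
     , w_ordered AS (PARTITION BY batch, machine_id ORDER BY timestamp_sta)
ORDER  BY batch, machine_id, timestamp_sta;
```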

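The ORDER BY you added (not the one in the window frame definition, the outer one) works, but it doesn't seem to make sense and makes the query more expensive. You should use an ORDER BY clause that agrees with your window frame definition to avoid additional sort cost: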

... 
ORDER BY batch, b.machine_id, timestamp_sta, timestamp_stp

I don't see the need for DISTINCT in this query. You could just add it if you actually need it. Or DISTINCT ON (). But then the ORDER BY clause becomes even more relevant.
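
For reference, a hedged sketch of what DISTINCT ON could look like here, assuming you want one row per (batch, machine_id), namely the one with the earliest timestamp_sta. The `sample` CTE is hypothetical. The leading ORDER BY expressions have to match the DISTINCT ON expressions, and the remaining ones decide which row of each group is kept:

```sql
-- Hypothetical sample data: two batches on the same machine.
WITH sample(batch, machine_id, timestamp_sta, timestamp_stp) AS (
   VALUES
     (1, 1, timestamp '2012-09-17 10:00:00', timestamp '2012-09-17 10:10:00')
   , (1, 1, timestamp '2012-09-17 10:15:00', timestamp '2012-09-17 10:30:00')
   , (2, 1, timestamp '2012-09-17 11:00:00', timestamp '2012-09-17 11:20:00')
   , (2, 1, timestamp '2012-09-17 11:25:00', timestamp '2012-09-17 11:40:00')
)
SELECT DISTINCT ON (batch, machine_id)
       batch
     , machine_id
     , timestamp_sta
     , timestamp_stp
FROM   sample
ORDER  BY batch, machine_id   -- must lead with the DISTINCT ON expressions
        , timestamp_sta;      -- tie-breaker: keep the earliest start per group
```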

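If you need additional column(s) from the same row (while still sorting by timestamps), your idea with FIRST_VALUE() and LAST_VALUE() may be the way to go. You would probably need to append this to the window frame definition then: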

ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
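
A minimal sketch of that variant, assuming a hypothetical extra column `status` that you want from the first and last row of each batch (the `sample` CTE and the column are made up for illustration):

```sql
-- Hypothetical sample data with an extra column "status".
WITH sample(batch, machine_id, timestamp_sta, timestamp_stp, status) AS (
   VALUES
     (1, 1, timestamp '2012-09-17 10:00:00', timestamp '2012-09-17 10:10:00', 'warmup')
   , (1, 1, timestamp '2012-09-17 10:15:00', timestamp '2012-09-17 10:30:00', 'running')
   , (1, 1, timestamp '2012-09-17 10:35:00', timestamp '2012-09-17 10:49:45', 'done')
)
SELECT batch
     , machine_id
     , timestamp_sta
     , timestamp_stp
     , first_value(status) OVER w AS first_status  -- status of the earliest row
     , last_value(status)  OVER w AS last_status   -- status of the latest row
FROM   sample
WINDOW w AS (PARTITION BY batch, machine_id
             ORDER BY timestamp_sta, timestamp_stp
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER  BY batch, machine_id, timestamp_sta, timestamp_stp;
```

The explicit frame is what lets last_value() look past the current row; the ORDER BY in the window is what defines "first" and "last".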
