PostgreSQL - How should I use first_value()?

Bre*_*ugh 18 sql postgresql window-functions postgresql-9.2

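This answer shows how to produce High/Low/Open/Close values from a ticker:
Retrieve aggregates for arbitrary time intervals

I am trying to implement a solution based on this (PG 9.2), but I am having difficulty getting the correct value for first_value().

So far, I have tried two queries: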

SELECT  
    cstamp,
    price,
    date_trunc('hour',cstamp) AS h,
    floor(EXTRACT(minute FROM cstamp) / 5) AS m5,
    min(price) OVER w,
    max(price) OVER w,
    first_value(price) OVER w,
    last_value(price) OVER w
FROM trades
Where date_trunc('hour',cstamp) = timestamp '2013-03-29 09:00:00'
WINDOW w AS (
    PARTITION BY date_trunc('hour',cstamp), floor(extract(minute FROM cstamp) / 5)
    ORDER BY date_trunc('hour',cstamp) ASC, floor(extract(minute FROM cstamp) / 5) ASC
    )
ORDER BY cstamp;

Here is part of the result:

        cstamp         price      h                 m5  min      max      first    last
"2013-03-29 09:19:14";77.00000;"2013-03-29 09:00:00";3;77.00000;77.00000;77.00000;77.00000

"2013-03-29 09:26:18";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000
"2013-03-29 09:29:41";77.80000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000
"2013-03-29 09:29:51";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000

"2013-03-29 09:30:04";77.00000;"2013-03-29 09:00:00";6;73.99004;77.80000;73.99004;73.99004

As you can see, 77.8 is not what I believe is the correct value for first_value(); it should be 77.0.

I thought this might be due to the ambiguous ORDER BY in the WINDOW, so I changed it to

ORDER BY cstamp ASC 

but this appears to upset the PARTITION as well:

        cstamp         price      h                 m5  min      max      first    last
"2013-03-29 09:19:14";77.00000;"2013-03-29 09:00:00";3;77.00000;77.00000;77.00000;77.00000

"2013-03-29 09:26:18";77.00000;"2013-03-29 09:00:00";5;77.00000;77.00000;77.00000;77.00000
"2013-03-29 09:29:41";77.80000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.00000;77.80000
"2013-03-29 09:29:51";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.00000;77.00000

"2013-03-29 09:30:04";77.00000;"2013-03-29 09:00:00";6;77.00000;77.00000;77.00000;77.00000
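
since the values for max and last now vary within the partition.

What am I doing wrong? Could someone help me better understand the relationship between PARTITION and ORDER within a WINDOW?


Even though I have an answer, here is a trimmed-down pg_dump that allows anyone to recreate the table. The only difference is the table name.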

CREATE TABLE wtest (
    cstamp timestamp without time zone,
    price numeric(10,5)
);

COPY wtest (cstamp, price) FROM stdin;
2013-03-29 09:04:54 77.80000
2013-03-29 09:04:50 76.98000
2013-03-29 09:29:51 77.00000
2013-03-29 09:29:41 77.80000
2013-03-29 09:26:18 77.00000
2013-03-29 09:19:14 77.00000
2013-03-29 09:19:10 77.00000
2013-03-29 09:33:50 76.00000
2013-03-29 09:33:46 76.10000
2013-03-29 09:33:15 77.79000
2013-03-29 09:30:08 77.80000
2013-03-29 09:30:04 77.00000
\.

Clo*_*eto 24

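SQL Fiddle

All the functions you used act on the window frame, not on the partition. If the frame clause is omitted and the window has an ORDER BY, the frame ends at the current row (more precisely, at the last peer of the current row, since the default frame mode is RANGE). To make the window frame the whole partition, declare it in the frame clause (range...):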

SELECT  
    cstamp,
    price,
    date_trunc('hour',cstamp) AS h,
    floor(EXTRACT(minute FROM cstamp) / 5) AS m5,
    min(price) OVER w,
    max(price) OVER w,
    first_value(price) OVER w,
    last_value(price) OVER w
FROM trades
Where date_trunc('hour',cstamp) = timestamp '2013-03-29 09:00:00'
WINDOW w AS (
    PARTITION BY date_trunc('hour',cstamp) , floor(extract(minute FROM cstamp) / 5)
    ORDER BY cstamp
    range between unbounded preceding and unbounded following
    )
ORDER BY cstamp;
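
If the goal is one open/high/low/close row per 5-minute bucket, as in the answer the question links to, the same full-partition window can be combined with DISTINCT. This is only a sketch under assumptions: it reads from the wtest table in the question's dump, and the column aliases (open_price and so on) are made up for illustration.

```sql
-- Sketch: collapse each 5-minute bucket to a single OHLC row.
-- Assumes the wtest table from the question; aliases are illustrative.
SELECT DISTINCT
    date_trunc('hour', cstamp)             AS h,
    floor(extract(minute FROM cstamp) / 5) AS m5,
    first_value(price) OVER w              AS open_price,
    max(price)         OVER w              AS high_price,
    min(price)         OVER w              AS low_price,
    last_value(price)  OVER w              AS close_price
FROM wtest
WHERE date_trunc('hour', cstamp) = timestamp '2013-03-29 09:00:00'
WINDOW w AS (
    PARTITION BY date_trunc('hour', cstamp), floor(extract(minute FROM cstamp) / 5)
    ORDER BY cstamp
    -- The frame covers the whole partition, so every row of a bucket
    -- computes identical values and DISTINCT can collapse them.
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY h, m5;
```

Because every row in a bucket sees the same frame, all four window values are constant within the bucket, which is what lets DISTINCT reduce the output to one row per interval.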


Luk*_*der 14

Here is a quick query to illustrate the behaviour:

select 
  v,
  first_value(v) over w1 f1,
  first_value(v) over w2 f2,
  first_value(v) over w3 f3,
  last_value (v) over w1 l1,
  last_value (v) over w2 l2,
  last_value (v) over w3 l3,
  max        (v) over w1 m1,
  max        (v) over w2 m2,
  max        (v) over w3 m3,
  max        (v) over () m4
from (values(1),(2),(3),(4)) t(v)
window
  w1 as (order by v),
  w2 as (order by v rows between unbounded preceding and current row),
  w3 as (order by v rows between unbounded preceding and unbounded following)

The output of the above query can be seen here (SQLFiddle here):

| V | F1 | F2 | F3 | L1 | L2 | L3 | M1 | M2 | M3 | M4 |
|---|----|----|----|----|----|----|----|----|----|----|
| 1 |  1 |  1 |  1 |  1 |  1 |  4 |  1 |  1 |  4 |  4 |
| 2 |  1 |  1 |  1 |  2 |  2 |  4 |  2 |  2 |  4 |  4 |
| 3 |  1 |  1 |  1 |  3 |  3 |  4 |  3 |  3 |  4 |  4 |
| 4 |  1 |  1 |  1 |  4 |  4 |  4 |  4 |  4 |  4 |  4 |
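
Few people think of the implicit frames that are applied to window functions that take an ORDER BY clause. In this case, windows default to the frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. RANGE is not exactly the same as ROWS (it also includes rows that tie with the current row in the ORDER BY), but with the distinct values used here the two behave identically. Think about it this way: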

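  • On the row with v = 1, the ordered window's frame spans v IN (1)
  • On the row with v = 2, the ordered window's frame spans v IN (1, 2)
  • On the row with v = 3, the ordered window's frame spans v IN (1, 2, 3)
  • On the row with v = 4, the ordered window's frame spans v IN (1, 2, 3, 4)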
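
One detail the list above does not show is what happens with ties. In PostgreSQL the implicit frame of an ordered window is RANGE-based, so it runs through the last peer of the current row, not just the current row itself. A minimal sketch with a duplicated value (the inline data and column aliases are made up for illustration):

```sql
-- Sketch: implicit frame vs. explicit RANGE vs. explicit ROWS when values tie.
SELECT
    v,
    count(*) OVER (ORDER BY v) AS default_cnt,
    count(*) OVER (ORDER BY v RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_cnt,
    count(*) OVER (ORDER BY v ROWS  BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_cnt
FROM (VALUES (1), (2), (2), (3)) t(v)
ORDER BY v, rows_cnt;

--  v | default_cnt | range_cnt | rows_cnt
-- ---+-------------+-----------+----------
--  1 |           1 |         1 |        1
--  2 |           3 |         3 |        2
--  2 |           3 |         3 |        3
--  3 |           4 |         4 |        4
```

This also seems to explain the first query in the question: its ORDER BY used the same expressions as its PARTITION BY, so all rows of a partition were peers. The implicit frame therefore covered the whole partition, but the order of rows inside it was unspecified, which is why first_value() could return 77.8.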

If you want to prevent that behaviour, you have two options:

  • Use an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING clause for ordered window functions
  • Use no ORDER BY clause in those window functions that allow omitting it (as in MAX(v) OVER())
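
Both options can appear in the same query. The sketch below assumes the wtest table from the question; the window names p and o and the column aliases are hypothetical. min() and max() use an unordered window, while first_value() and last_value(), which only make sense with an ordering, use a window that inherits the partitioning and adds an explicit full frame.

```sql
-- Sketch: an unordered window for min/max, an ordered full-frame window for first/last.
-- Assumes the wtest table from the question; p and o are illustrative names.
SELECT
    cstamp,
    price,
    min(price)         OVER p AS low_price,    -- no ORDER BY: frame is the whole partition
    max(price)         OVER p AS high_price,
    first_value(price) OVER o AS open_price,   -- ordered, with an explicit full frame
    last_value(price)  OVER o AS close_price
FROM wtest
WINDOW
    p AS (PARTITION BY date_trunc('hour', cstamp), floor(extract(minute FROM cstamp) / 5)),
    -- o reuses p's PARTITION BY and adds its own ordering and frame
    o AS (p ORDER BY cstamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY cstamp;
```

Defining o on top of p keeps the partitioning expression in one place, so the two windows cannot drift apart if the bucket size changes.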

More details are explained in this article about LEAD(), LAG(), FIRST_VALUE() and LAST_VALUE()

  • We wouldn't want my .. er .. anyone's OCD to show, now would we? :) (2 upvotes)