Can the window function LAG reference the column whose value is being calculated?

Rom*_*val 8 postgresql gaps-and-islands

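I need to compute the value of some column X from other columns of the current record and from the value of X in the previous record (given some partitioning and ordering). Basically, I need to implement a query of the form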

SELECT <some fields>, 
  <some expression using LAG(X) OVER(PARTITION BY ... ORDER BY ...)> AS X
FROM <table>

This is not possible, because only existing columns can be used in a window function, so I am looking for a way around it.

Here is an example. I have a table of events. Each event has a type and a time_stamp.

create table event (id serial, type integer, time_stamp integer);

I want to find "duplicate" events. By duplicate I mean the following. Order all events of a given type by time_stamp ascending. Then:

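  1. The first event is not a duplicate.
  2. All events that follow a non-duplicate and fall within some time frame after it (that is, their time_stamp is not greater than the time_stamp of the preceding non-duplicate plus some constant TIMEFRAME) are duplicates.
  3. The next event whose time_stamp exceeds that of the previous non-duplicate by more than TIMEFRAME is not a duplicate.
  4. And so on.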

For this data

insert into event (type, time_stamp) 
 values 
  (1, 1), (1, 2), (2, 2), (1,3), (1, 10), (2,10), 
  (1,15), (1, 21), (2,13), 
  (1, 40);

and TIMEFRAME=10, the result should be

time_stamp | type | duplicate
-----------------------------
        1  |    1 | false
        2  |    1 | true     
        3  |    1 | true 
       10  |    1 | true 
       15  |    1 | false 
       21  |    1 | true
       40  |    1 | false
        2  |    2 | false
       10  |    2 | true
       13  |    2 | false

I could compute the value of the duplicate field from the current time_stamp and the time_stamp of the previous non-duplicate event, like this:

WITH evt AS (
  SELECT 
    time_stamp, 
    CASE WHEN 
      time_stamp - LAG(current_non_dupl_time_stamp) OVER w >= TIMEFRAME
    THEN 
      time_stamp
    ELSE
      LAG(current_non_dupl_time_stamp) OVER w
    END AS current_non_dupl_time_stamp
  FROM event
  WINDOW w AS (PARTITION BY type ORDER BY time_stamp ASC)
)
SELECT time_stamp, time_stamp != current_non_dupl_time_stamp AS duplicate
FROM evt;

But this does not work, because the field being computed cannot be referenced inside LAG:

ERROR:  column "current_non_dupl_time_stamp" does not exist.

So the question is: can I rewrite this query to get the effect I need?

kli*_*lin 1

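An alternative to the recursive approach is a custom aggregate. Once you have mastered the technique of writing your own aggregates, creating the transition and final functions becomes easy and logical.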

State transition function:

create or replace function is_duplicate(st int[], time_stamp int, timeframe int)
returns int[] language plpgsql as $$
begin
    if st is null or st[1] + timeframe <= time_stamp
    then 
        st[1] := time_stamp;
    end if;
    st[2] := time_stamp;
    return st;
end $$;

Final function:

create or replace function is_duplicate_final(st int[])
returns boolean language sql as $$
    select st[1] <> st[2];
$$;

Aggregate:

create aggregate is_duplicate_agg(time_stamp int, timeframe int)
(
    sfunc = is_duplicate,
    stype = int[],
    finalfunc = is_duplicate_final
);

Query:

select *, is_duplicate_agg(time_stamp, 10) over w
from event
window w as (partition by type order by time_stamp asc)
order by type, time_stamp;

 id | type | time_stamp | is_duplicate_agg 
----+------+------------+------------------
  1 |    1 |          1 | f
  2 |    1 |          2 | t
  4 |    1 |          3 | t
  5 |    1 |         10 | t
  7 |    1 |         15 | f
  8 |    1 |         21 | t
 10 |    1 |         40 | f
  3 |    2 |          2 | f
  6 |    2 |         10 | t
  9 |    2 |         13 | f
(10 rows)   
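
To make the state handling easier to follow, here is a rough Python sketch that mirrors the transition and final functions above and replays them over the sample data. It is only an illustration, not part of the PostgreSQL solution: the names transition, final and flag_duplicates are made up for this sketch, and it assumes time_stamp values are unique within a type (with ties, the default window frame in PostgreSQL would include peer rows, which this row-by-row loop does not model).

```python
from itertools import groupby
from operator import itemgetter


def transition(state, time_stamp, timeframe):
    """Mirror of is_duplicate: state is (last_non_duplicate_ts, current_ts) or None."""
    if state is None or state[0] + timeframe <= time_stamp:
        # Outside the time frame: this row becomes the new non-duplicate.
        return (time_stamp, time_stamp)
    # Inside the time frame: keep the non-duplicate, remember the current row.
    return (state[0], time_stamp)


def final(state):
    """Mirror of is_duplicate_final: a row is a duplicate if it is not the anchor."""
    return state[0] != state[1]


def flag_duplicates(events, timeframe):
    """events: iterable of (type, time_stamp). Returns (type, time_stamp, duplicate) rows."""
    result = []
    ordered = sorted(events)  # partition by type, order by time_stamp
    for type_, group in groupby(ordered, key=itemgetter(0)):
        state = None  # a fresh aggregate state for each partition
        for _, time_stamp in group:
            state = transition(state, time_stamp, timeframe)
            result.append((type_, time_stamp, final(state)))
    return result


events = [(1, 1), (1, 2), (2, 2), (1, 3), (1, 10), (2, 10),
          (1, 15), (1, 21), (2, 13), (1, 40)]
for row in flag_duplicates(events, 10):
    print(row)
```

The key point is the same as in the aggregate: the value that LAG could not see (the time_stamp of the last non-duplicate) is carried forward in the state instead of being read from a column.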

Read more in the documentation: 37.10. User-defined Aggregates and CREATE AGGREGATE.