How to use a ring data structure in window functions

Mik*_*e T 7 sql postgresql circular-buffer circular-list window-functions

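I have data arranged in a ring structure (or circular buffer), i.e. it can be represented as a sequence that cycles: ...-1-2-3-4-5-1-2-3-... See this picture for an idea of a 5-part ring:

(image: a 5-part ring)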

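I would like to create a window query that combines the lag and lead items into a three-point array, but I can't figure it out. For example, at part 1 of a 5-part ring the lag/lead sequence is 5-1-2, and at part 4 it is 3-4-5.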
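As a reference for what the query should return, the wrap-around rule can be written as modular arithmetic. This is only a sketch in Python, not part of the original question; the function name `ring_neighbours` is hypothetical, and it assumes parts are numbered 1..size without gaps.

```python
def ring_neighbours(part, size):
    """Return [previous, part, next] for a 1-based part in a ring of `size` parts.

    Assumes parts are numbered 1..size with no gaps (hypothetical helper).
    """
    prev_part = (part - 2) % size + 1   # part 1 wraps around to `size`
    next_part = part % size + 1         # part `size` wraps around to 1
    return [prev_part, part, next_part]


# Expected neighbours for every part of a 5-part ring.
for p in range(1, 6):
    print(p, ring_neighbours(p, 5))
```

The SQL solution has to reproduce exactly this wrap-around at both ends of each ring.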

Here is an example table with two rings that have different numbers of parts (each ring always has more than three):

create table rp (ring int, part int);
insert into rp(ring, part) values(1, generate_series(1, 5));
insert into rp(ring, part) values(2, generate_series(1, 7));

Here is a query that almost works:

SELECT ring, part, array[
    lag(part, 1, NULL) over (partition by ring),
    part,
    lead(part, 1, 1) over (partition by ring)
    ] AS neighbours
FROM rp;

 ring | part | neighbours
------+------+------------
    1 |    1 | {NULL,1,2}
    1 |    2 | {1,2,3}
    1 |    3 | {2,3,4}
    1 |    4 | {3,4,5}
    1 |    5 | {4,5,1}
    2 |    1 | {NULL,1,2}
    2 |    2 | {1,2,3}
    2 |    3 | {2,3,4}
    2 |    4 | {3,4,5}
    2 |    5 | {4,5,6}
    2 |    6 | {5,6,7}
    2 |    7 | {6,7,1}
(12 rows)

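The only thing left to do is replace the NULL with the end point of each ring, which is the last value. Alongside the lag and lead window functions there is a last_value function, which would be ideal. However, these cannot be nested: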

SELECT ring, part, array[
    lag(part, 1, last_value(part) over (partition by ring)) over (partition by ring),
    part,
    lead(part, 1, 1) over (partition by ring)
    ] AS neighbours
FROM rp;
ERROR:  window function calls cannot be nested
LINE 2:     lag(part, 1, last_value(part) over (partition by ring)) ...

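Update. Thanks to @Justin for the suggestion to use coalesce to avoid nesting window functions. In addition, several people have pointed out that first/last values need an explicit order by on the ring sequence, which happens to be part in this example. So here is the input data, slightly randomised: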

create table rp (ring int, part int);
insert into rp(ring, part) select 1, generate_series(1, 5) order by random();
insert into rp(ring, part) select 2, generate_series(1, 7) order by random();

Erw*_*ter 4

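  • Use COALESCE like @Justin provided.
  • With first_value() / last_value() you need to add an ORDER BY clause to the window definition, or the order is undefined. You got lucky in the example because the rows happened to be in order right after creating the dummy table.
    Once you add ORDER BY, the default window frame ends at the current row, so you need to special-case the last_value() call, or reverse the sort order in the window definition, as demonstrated in my first example.

  • When reusing a window definition multiple times, an explicit WINDOW clause simplifies the syntax a lot: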

SELECT ring, part, ARRAY[
          coalesce(
             lag(part) OVER w
            ,first_value(part) OVER (PARTITION BY ring ORDER BY part DESC))
         ,part
         ,coalesce(
             lead(part) OVER w
            ,first_value(part) OVER w)
         ] AS neighbours
FROM   rp
WINDOW w AS (PARTITION BY ring ORDER BY part);
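To see why last_value() needs special treatment once ORDER BY is added, here is a minimal sketch. It is not from the original answer: it uses SQLite through Python's `sqlite3` module as a stand-in for Postgres (an assumption; it needs SQLite 3.25 or later for window functions), and the column aliases `default_frame` and `full_frame` are made up for illustration. As far as I know both systems use the same default frame in this case.

```python
import sqlite3

# Assumption: SQLite (>= 3.25) stands in for Postgres here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rp (ring int, part int)")
# Insert one 5-part ring in shuffled order, so nothing relies on insertion order.
conn.executemany("INSERT INTO rp VALUES (?, ?)", [(1, p) for p in (3, 1, 5, 2, 4)])

rows = conn.execute("""
    SELECT part,
           -- default frame: from partition start up to the current row
           last_value(part) OVER (PARTITION BY ring ORDER BY part) AS default_frame,
           -- explicit frame spanning the whole partition
           last_value(part) OVER (PARTITION BY ring ORDER BY part
                                  RANGE BETWEEN UNBOUNDED PRECEDING
                                            AND UNBOUNDED FOLLOWING) AS full_frame
    FROM   rp
    ORDER  BY part
""").fetchall()

for row in rows:
    print(row)
```

With the default frame, last_value() simply returns the current row's part; only the explicit frame yields the end of the ring (5) on every row.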

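Better yet, reuse the same window definition, so that Postgres can compute all values in a single scan. For this to work we need to define a custom window frame: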

SELECT ring, part, ARRAY[
          coalesce(
             lag(part) OVER w
            ,last_value(part) OVER w)
         ,part
         ,coalesce(
             lead(part) OVER w
            ,first_value(part) OVER w)
         ] AS neighbours
FROM   rp
WINDOW w AS (PARTITION BY ring
             ORDER BY part
             RANGE BETWEEN UNBOUNDED PRECEDING
                       AND UNBOUNDED FOLLOWING)
ORDER  BY 1,2;
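The single-window approach can be tried outside Postgres with a small sketch. It is not part of the original answer and rests on some assumptions: SQLite (3.25 or later) via Python's `sqlite3` stands in for Postgres, and because SQLite has no array type, the neighbours are returned as three columns with the hypothetical aliases `prev_part` and `next_part`.

```python
import sqlite3

# Assumption: SQLite (>= 3.25) stands in for Postgres; no ARRAY type, so three columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rp (ring int, part int)")
conn.executemany(
    "INSERT INTO rp VALUES (?, ?)",
    # Two rings (5 and 7 parts), inserted in shuffled order.
    [(1, p) for p in (4, 2, 5, 1, 3)] + [(2, p) for p in (6, 1, 7, 3, 5, 2, 4)],
)

rows = conn.execute("""
    SELECT ring,
           coalesce(lag(part)  OVER w, last_value(part)  OVER w) AS prev_part,
           part,
           coalesce(lead(part) OVER w, first_value(part) OVER w) AS next_part
    FROM   rp
    WINDOW w AS (PARTITION BY ring
                 ORDER BY part
                 RANGE BETWEEN UNBOUNDED PRECEDING
                           AND UNBOUNDED FOLLOWING)
    ORDER  BY ring, part
""").fetchall()

for row in rows:
    print(row)
```

lag() and lead() ignore the frame and work on the whole partition, so widening the frame only affects first_value() and last_value(), which supply the wrap-around values at the two ends of each ring.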

You can even adapt the frame definition for each window function call:

SELECT ring, part, ARRAY[
          coalesce(
             lag(part) OVER w
            ,last_value(part) OVER (w RANGE BETWEEN CURRENT ROW
                                                AND UNBOUNDED FOLLOWING))
         ,part
         ,coalesce(
             lead(part) OVER w
            ,first_value(part) OVER w)
         ] AS neighbours
FROM   rp
WINDOW w AS (PARTITION BY ring ORDER BY part)
ORDER  BY 1,2;

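This might be faster for rings with many parts. You'll have to test.

SQL Fiddle demonstrating all three with an improved test case. Consider the query plans.

More about window frame definitions: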