Partition By with Order By Clause in PostgreSQL

Question

I have a table with these values:

user_id ts                  val
uid1    19.05.2019 01:49:50  0
uid1    19.05.2019 01:50:15  0
uid1    19.05.2019 01:50:20  0
uid1    19.05.2019 01:59:50  1
uid1    19.05.2019 02:20:10  1
uid1    19.05.2019 02:20:15  0
uid1    19.05.2019 02:20:19  0
uid1    19.05.2019 02:30:53  1
uid1    19.05.2019 11:10:25  1
uid1    19.05.2019 11:13:40  0
uid1    19.05.2019 11:13:50  0
uid1    19.05.2019 11:20:19  1
uid2    19.05.2019 15:01:44  0
uid2    19.05.2019 15:05:55  0
uid2    19.05.2019 17:19:35  1
uid2    19.05.2019 17:20:01  0
uid2    19.05.2019 17:20:35  0
uid2    19.05.2019 19:15:50  1

When I query this table with only a partition by clause, the result looks like this:

Query: select *, sum(val) over (partition by user_id) as res from example_table;

user_id ts                  val res
uid1    19.05.2019 01:49:50  0  5
uid1    19.05.2019 01:50:15  0  5
uid1    19.05.2019 01:50:20  0  5
uid1    19.05.2019 01:59:50  1  5
uid1    19.05.2019 02:20:10  1  5
uid1    19.05.2019 02:20:15  0  5
uid1    19.05.2019 02:20:19  0  5
uid1    19.05.2019 02:30:53  1  5
uid1    19.05.2019 11:10:25  1  5
uid1    19.05.2019 11:13:40  0  5
uid1    19.05.2019 11:13:50  0  5
uid1    19.05.2019 11:20:19  1  5
uid2    19.05.2019 15:01:44  0  2
uid2    19.05.2019 15:05:55  0  2
uid2    19.05.2019 17:19:35  1  2
uid2    19.05.2019 17:20:01  0  2
uid2    19.05.2019 17:20:35  0  2
uid2    19.05.2019 19:15:50  1  2

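In the result above, the res column holds the sum of the val column for each partition. However, if I query the table with both partition by and order by, I get these results:

Query: select *, sum(val) over (partition by user_id order by ts) as res from example_table;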

user_id ts                  val res
uid1    19.05.2019 01:49:50  0  0
uid1    19.05.2019 01:50:15  0  0
uid1    19.05.2019 01:50:20  0  0
uid1    19.05.2019 01:59:50  1  1
uid1    19.05.2019 02:20:10  1  2
uid1    19.05.2019 02:20:15  0  2
uid1    19.05.2019 02:20:19  0  2
uid1    19.05.2019 02:30:53  1  3
uid1    19.05.2019 11:10:25  1  4
uid1    19.05.2019 11:13:40  0  4
uid1    19.05.2019 11:13:50  0  4
uid1    19.05.2019 11:20:19  1  5
uid2    19.05.2019 15:01:44  0  0
uid2    19.05.2019 15:05:55  0  0
uid2    19.05.2019 17:19:35  1  1
uid2    19.05.2019 17:20:01  0  1
uid2    19.05.2019 17:20:35  0  1
uid2    19.05.2019 19:15:50  1  2

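With the order by clause, however, the res column holds the cumulative sum of the val column up to each row within its partition.

Why? I can't understand this.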

Answer 1

Pau*_*gel 5

Update

This behavior is documented here:

4.2.8. Window Function Calls

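[..] The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer. Without ORDER BY, this means all rows of the partition are included in the window frame, since all rows become peers of the current row.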
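
As a quick check of the quoted default, the sketch below puts the implicit frame next to the same frame written out in full; the two columns should agree on every row. It reads from an inline VALUES list rather than example_table so that it runs on its own. The sample rows and the integer ts values are made up for illustration.

```sql
-- Hypothetical inline sample; timestamps are simplified to integers.
select user_id, ts, val
     -- frame left implicit: the default applies
     , sum(val) over (partition by user_id order by ts) as res_default
     -- the same frame, written out explicitly
     , sum(val) over (
         partition by user_id
         order by ts
         range between unbounded preceding and current row
       ) as res_explicit_range
from (values
    ('uid1', 1, 0),
    ('uid1', 2, 1),
    ('uid1', 3, 1),
    ('uid2', 1, 0),
    ('uid2', 2, 1)
) as t(user_id, ts, val)
order by user_id, ts;
```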

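That means:

When no frame_clause is given, RANGE UNBOUNDED PRECEDING is used by default. The frame then includes:

All rows that come "before" the current row according to the ORDER BY clause
The current row
All rows that have the same values in the ORDER BY columns as the current row

When there is no ORDER BY clause, ORDER BY NULL is assumed (though I am guessing again). The frame therefore includes all rows of the partition, because the value in the ORDER BY column is the same in every row (it is always NULL).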
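
The role of peers is easiest to see when the ORDER BY column contains ties, which the question's data does not. The following is a minimal sketch on made-up values where ts = 2 appears twice. Under the default RANGE frame both tied rows should get the same sum, because each one counts the other as a peer. Under an explicit ROWS frame the sum grows row by row, and which tied row comes first is not guaranteed. With no ORDER BY at all, every row is a peer of every other row, so each row gets the partition total.

```sql
-- Hypothetical data: two rows share ts = 2, so they are ORDER BY peers.
select ts, val
     -- default frame (RANGE): tied rows are summed together
     , sum(val) over (order by ts) as res_default_range
     -- ROWS frame: strictly row by row, tie order is arbitrary
     , sum(val) over (
         order by ts
         rows between unbounded preceding and current row
       ) as res_rows
     -- no ORDER BY: the whole partition is the frame
     , sum(val) over () as res_no_order_by
from (values (1, 1), (2, 1), (2, 1), (3, 1)) as t(ts, val)
order by ts;
```

On this sample, res_default_range comes out as 1, 3, 3, 4, res_rows as 1, 2, 3, 4, and res_no_order_by as 4 on every row.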

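Original answer:

Disclaimer: The following is more of a guess than a qualified answer. I have not found any documentation that confirms what I wrote. At the same time, I don't think the answers given so far explain the behavior correctly.

The difference in the results is not caused directly by the ORDER BY clause, since a + b + c is the same as c + b + a. The reason is (and this is my guess) that the ORDER BY clause implicitly defines the frame_clause as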

rows between unbounded preceding and current row

Try the following query:

select *
, sum(val) over (partition by user_id) as res
, sum(val) over (partition by user_id order by ts) as res_order_by
, sum(val) over (
    partition by user_id
    order by ts
    rows between unbounded preceding and current row
  ) as res_order_by_unbounded_preceding
, sum(val) over (
    partition by user_id
    -- order by ts
    rows between unbounded preceding and current row
  ) as res_preceding
, sum(val) over (
    partition by user_id
    -- order by ts
    rows between current row and unbounded following
  ) as res_following
, sum(val) over (
    partition by user_id
    order by ts
    rows between unbounded preceding and unbounded following
  ) as res_orderby_preceding_following

from example_table;

db<>fiddle

You will see that you can get a cumulative sum without an ORDER BY clause, and that you can get the "full" sum with an ORDER BY clause.
