tes*_*big 3 sql postgresql sql-order-by
I have a table containing these values:
user_id ts val
uid1 19.05.2019 01:49:50 0
uid1 19.05.2019 01:50:15 0
uid1 19.05.2019 01:50:20 0
uid1 19.05.2019 01:59:50 1
uid1 19.05.2019 02:20:10 1
uid1 19.05.2019 02:20:15 0
uid1 19.05.2019 02:20:19 0
uid1 19.05.2019 02:30:53 1
uid1 19.05.2019 11:10:25 1
uid1 19.05.2019 11:13:40 0
uid1 19.05.2019 11:13:50 0
uid1 19.05.2019 11:20:19 1
uid2 19.05.2019 15:01:44 0
uid2 19.05.2019 15:05:55 0
uid2 19.05.2019 17:19:35 1
uid2 19.05.2019 17:20:01 0
uid2 19.05.2019 17:20:35 0
uid2 19.05.2019 19:15:50 1
When I query this table with only a partition by clause, the result looks like this:
Query: select *, sum(val) over (partition by user_id) as res from example_table;
user_id ts val res
uid1 19.05.2019 01:49:50 0 5
uid1 19.05.2019 01:50:15 0 5
uid1 19.05.2019 01:50:20 0 5
uid1 19.05.2019 01:59:50 1 5
uid1 19.05.2019 02:20:10 1 5
uid1 19.05.2019 02:20:15 0 5
uid1 19.05.2019 02:20:19 0 5
uid1 19.05.2019 02:30:53 1 5
uid1 19.05.2019 11:10:25 1 5
uid1 19.05.2019 11:13:40 0 5
uid1 19.05.2019 11:13:50 0 5
uid1 19.05.2019 11:20:19 1 5
uid2 19.05.2019 15:01:44 0 2
uid2 19.05.2019 15:05:55 0 2
uid2 19.05.2019 17:19:35 1 2
uid2 19.05.2019 17:20:01 0 2
uid2 19.05.2019 17:20:35 0 2
uid2 19.05.2019 19:15:50 1 2
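In the result above, the res column holds the sum of the val column for each partition. However, if I query the table with both partition by and order by, I get these results:
Query: select *, sum(val) over (partition by user_id order by ts) as res from example_table;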
user_id ts val res
uid1 19.05.2019 01:49:50 0 0
uid1 19.05.2019 01:50:15 0 0
uid1 19.05.2019 01:50:20 0 0
uid1 19.05.2019 01:59:50 1 1
uid1 19.05.2019 02:20:10 1 2
uid1 19.05.2019 02:20:15 0 2
uid1 19.05.2019 02:20:19 0 2
uid1 19.05.2019 02:30:53 1 3
uid1 19.05.2019 11:10:25 1 4
uid1 19.05.2019 11:13:40 0 4
uid1 19.05.2019 11:13:50 0 4
uid1 19.05.2019 11:20:19 1 5
uid2 19.05.2019 15:01:44 0 0
uid2 19.05.2019 15:05:55 0 0
uid2 19.05.2019 17:19:35 1 1
uid2 19.05.2019 17:20:01 0 1
uid2 19.05.2019 17:20:35 0 1
uid2 19.05.2019 19:15:50 1 2
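With the order by clause, however, the res column holds the cumulative sum of the val column up to each row within each partition.
Why? I cannot understand this.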
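This behavior is documented here:
4.2.8. Window Function Calls
[..] The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer. Without ORDER BY, this means all rows of the partition are included in the window frame, since all rows become peers of the current row.
This means:
In the absence of a frame_clause, RANGE UNBOUNDED PRECEDING is used by default. That frame includes all rows that come "before" the current row according to the ORDER BY clause, plus all rows that have the same value in the ORDER BY columns as the current row.
In the absence of an ORDER BY clause, ORDER BY NULL is assumed (though I am guessing again here). The frame therefore includes all rows of the partition, because the value in the ORDER BY column is the same (always NULL) in every row.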
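The role of "peers" in the default frame is easiest to see when two rows share the same ORDER BY value. The following is a minimal sketch, not part of the original answer: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption; it needs SQLite 3.25 or later for window functions), and the table t with its four rows is made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL here; both default to
# RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when no frame is given.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (user_id text, ts integer, val integer)")
# Hypothetical data: two rows share ts = 2, so they are ORDER BY peers.
conn.executemany(
    "insert into t values (?, ?, ?)",
    [("uid1", 1, 1), ("uid1", 2, 1), ("uid1", 2, 1), ("uid1", 3, 1)],
)

query = """
select ts
     , sum(val) over (partition by user_id order by ts) as default_frame
     , sum(val) over (partition by user_id order by ts
                      range between unbounded preceding and current row) as range_frame
     , sum(val) over (partition by user_id order by ts
                      rows between unbounded preceding and current row) as rows_frame
     , sum(val) over (partition by user_id) as no_order_by
from t
"""
# Sort in Python so the output order does not depend on how ties are resolved.
rows = sorted(conn.execute(query).fetchall())
for row in rows:
    print(row)
```

Under these assumptions, default_frame and range_frame should agree on every row, and both rows with ts = 2 should report the same sum because each one's frame extends to its last peer. The rows_frame column counts physical rows instead, so the two tied rows get different values, and no_order_by is the full partition sum everywhere.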
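Disclaimer: the following is more of a guess than a qualified answer. I did not find any documentation that confirms what I have written. At the same time, I do not think the answers given so far explain the behavior correctly.
The difference in the results is not caused directly by the ORDER BY clause, because a + b + c is the same as c + b + a. The reason is (and this is my guess) that the ORDER BY clause implicitly defines the frame_clause as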
rows between unbounded preceding and current row
Try the following query:
select *
  , sum(val) over (partition by user_id) as res
  , sum(val) over (partition by user_id order by ts) as res_order_by
  , sum(val) over (
      partition by user_id
      order by ts
      rows between unbounded preceding and current row
    ) as res_order_by_unbounded_preceding
  , sum(val) over (
      partition by user_id
      -- order by ts
      rows between unbounded preceding and current row
    ) as res_preceding
  , sum(val) over (
      partition by user_id
      -- order by ts
      rows between current row and unbounded following
    ) as res_following
  , sum(val) over (
      partition by user_id
      order by ts
      rows between unbounded preceding and unbounded following
    ) as res_orderby_preceding_following
from example_table;
You will see that you can get a cumulative sum without an ORDER BY clause, and that you can also get the "full" sum with an ORDER BY clause.
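To check this claim without a PostgreSQL instance, here is a small sketch that runs a trimmed version of the query on the uid2 rows from the question. It relies on Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption; SQLite 3.25 or later is needed), and it stores ts as text, which sorts correctly here only because all rows share the same date.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; only the uid2 rows are loaded.
conn = sqlite3.connect(":memory:")
conn.execute("create table example_table (user_id text, ts text, val integer)")
conn.executemany(
    "insert into example_table values (?, ?, ?)",
    [
        ("uid2", "19.05.2019 15:01:44", 0),
        ("uid2", "19.05.2019 15:05:55", 0),
        ("uid2", "19.05.2019 17:19:35", 1),
        ("uid2", "19.05.2019 17:20:01", 0),
        ("uid2", "19.05.2019 17:20:35", 0),
        ("uid2", "19.05.2019 19:15:50", 1),
    ],
)

query = """
select ts
     , val
     , sum(val) over (partition by user_id order by ts) as res_order_by
     , sum(val) over (partition by user_id order by ts
                      rows between unbounded preceding and current row) as res_rows
     , sum(val) over (partition by user_id order by ts
                      rows between unbounded preceding and unbounded following) as res_full
     , sum(val) over (partition by user_id
                      rows between unbounded preceding and current row) as res_preceding
from example_table
order by ts
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With distinct ts values, res_order_by and res_rows should match, and res_full should equal the partition total on every row even though ORDER BY is present. The res_preceding column is a running sum too, but without ORDER BY the order in which rows enter the frame is unspecified, so its per-row values should not be relied on.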