How to add a running count to rows in a "streak" of consecutive days

Question

Ben*_*Ben 5 sql postgresql date-arithmetic window-functions gaps-and-islands

Thanks to Mike for the suggestion to add the create/insert statements.

create table test (
  pid integer not null,
  date date not null,
  primary key (pid, date)
);

insert into test values
  (1,'2014-10-1')
, (1,'2014-10-2')
, (1,'2014-10-3')
, (1,'2014-10-5')
, (1,'2014-10-7')
, (2,'2014-10-1')
, (2,'2014-10-2')
, (2,'2014-10-3')
, (2,'2014-10-5')
, (2,'2014-10-7');

I would like to add a new column holding the number of days in the current streak, so that the result looks like this:

pid    | date      | in_streak
-------|-----------|----------
1      | 2014-10-1 | 1
1      | 2014-10-2 | 2
1      | 2014-10-3 | 3
1      | 2014-10-5 | 1
1      | 2014-10-7 | 1
2      | 2014-10-1 | 1
2      | 2014-10-2 | 2
2      | 2014-10-3 | 3
2      | 2014-10-5 | 1
2      | 2014-10-7 | 1

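I have been trying to use existing answers, but I cannot figure out how to use the dense_rank() trick with other window functions to get the right result.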

Answer 1

Erw*_*ter 10

Building on this table (not using the SQL keyword "date" as a column name):

CREATE TABLE tbl(
  pid int
, the_date date
, PRIMARY KEY (pid, the_date)
);

Query:

SELECT pid, the_date
     , row_number() OVER (PARTITION BY pid, grp ORDER BY the_date) AS in_streak
FROM  (
   SELECT *
        , the_date - '2000-01-01'::date
        - row_number() OVER (PARTITION BY pid ORDER BY the_date) AS grp
   FROM   tbl
) sub
ORDER  BY pid, the_date;

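Subtracting one date from another yields an integer. Since you are looking for consecutive days, that integer grows by exactly one from each row to the next within a streak. row_number() also grows by one per row, so subtracting it leaves a constant: the whole streak ends up in the same group (grp) per pid. After that it is simple to number the rows within each group.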
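To make the arithmetic visible, here is a minimal Python sketch that mirrors the two steps of the query on the sample data from the question. It is only an illustration of the idea, not a replacement for the SQL; the names EPOCH and add_streaks are made up for this example.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question, as (pid, the_date) pairs.
rows = [(pid, date(2014, 10, d)) for pid in (1, 2) for d in (1, 2, 3, 5, 7)]

EPOCH = date(2000, 1, 1)  # arbitrary anchor date, as in the query

def add_streaks(rows):
    """Return (pid, the_date, grp, in_streak) tuples, mirroring the query."""
    out = []
    # Inner step: PARTITION BY pid ORDER BY the_date
    for pid, part in groupby(sorted(rows), key=lambda r: r[0]):
        counts = {}  # running count per grp within this pid
        for rn, (_, d) in enumerate(part, start=1):
            grp = (d - EPOCH).days - rn  # day number minus row number
            # Outer step: row_number() per (pid, grp), in date order
            counts[grp] = counts.get(grp, 0) + 1
            out.append((pid, d, grp, counts[grp]))
    return out

result = add_streaks(rows)
for row in result:
    print(row)
```

Within a streak the day number and the row number rise together, so grp stays constant; each gap in the dates makes the day number jump ahead of the row number and starts a new grp.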

grp is computed with two subtractions, which should be fastest. An equally fast alternative could be:

the_date - row_number() OVER (PARTITION BY pid ORDER BY the_date) * interval '1d' AS grp

One multiplication, one subtraction. String concatenation and casting are more expensive. Test with EXPLAIN ANALYZE.
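The interval variant produces a date-like key instead of an integer, but it groups the rows the same way. A small Python sketch of that arithmetic for a single pid, with timedelta standing in for the SQL interval (an analogy assumed for illustration only):

```python
from datetime import date, timedelta

# Dates of one pid, already sorted (ORDER BY the_date).
dates = [date(2014, 10, d) for d in (1, 2, 3, 5, 7)]

# the_date - row_number() * interval '1d'
keys = [d - rn * timedelta(days=1) for rn, d in enumerate(dates, start=1)]

for d, key in zip(dates, keys):
    print(d, "->", key)
```

The first three dates share one key, and each of the two isolated dates gets a key of its own, which is the same grouping the integer version yields.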

Do not forget to partition by pid in both steps as well, or you will inadvertently mix groups that should stay separate.
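The sample data shows why: both pids have identical dates, so they also get identical grp values. The following Python sketch (helper name in_streak and flag outer_by_pid are hypothetical) compares the outer step with and without pid in the partition. Ties on the_date are broken by pid here; SQL would leave that order undefined.

```python
from datetime import date

EPOCH = date(2000, 1, 1)
rows = [(pid, date(2014, 10, d)) for pid in (1, 2) for d in (1, 2, 3, 5, 7)]

def in_streak(rows, outer_by_pid=True):
    # Inner step, partitioned by pid: grp = day number - row_number().
    rn, with_grp = {}, []
    for pid, d in sorted(rows):
        rn[pid] = rn.get(pid, 0) + 1
        with_grp.append((pid, d, (d - EPOCH).days - rn[pid]))
    # Outer step: running count per partition, rows visited in date order.
    counts, out = {}, {}
    for pid, d, grp in sorted(with_grp, key=lambda r: (r[1], r[0])):
        key = (pid, grp) if outer_by_pid else grp
        counts[key] = counts.get(key, 0) + 1
        out[(pid, d)] = counts[key]
    return out

good = in_streak(rows)                      # PARTITION BY pid, grp
bad = in_streak(rows, outer_by_pid=False)   # PARTITION BY grp only
print(max(good.values()), max(bad.values()))  # prints "3 6"
```

Without pid in the outer partition, the three-day streaks of both pids are numbered as one group of six rows.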

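A subquery is used here because it is typically faster than a CTE (before PostgreSQL 12, a CTE is always materialized and acts as an optimization fence). Nothing in this query requires more than a plain subquery.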

Since you mentioned it: dense_rank() is not needed here. Plain row_number() does the job.
