Calculating a cumulative sum in PostgreSQL

kha*_*rul 53 sql postgresql aggregate-functions

I'm using COUNT with GROUP BY to get the number of subscribers who registered each day:

  SELECT created_at, COUNT(email)
    FROM subscriptions
GROUP BY created_at;

Result:

created_at  count
-----------------
04-04-2011  100
05-04-2011   50
06-04-2011   50
07-04-2011  300

I want to get the cumulative total of subscribers for each day. How can I get this?

created_at  count
-----------------
04-04-2011  100
05-04-2011  150
06-04-2011  200
07-04-2011  500

int*_*tgr 89

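For larger data sets, window functions are the most efficient way to run this kind of query: the table is scanned only once, rather than once per date as a self-join would do. The query also looks much simpler. :) PostgreSQL 8.4 and later support window functions.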

This is what it looks like:

SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM subscriptions
GROUP BY created_at;

Here OVER creates the window; ORDER BY created_at means the counts are summed in created_at order.
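The nested sum(count(email)) can be hard to read at first, so here is a sketch of the same idea split into two steps. The CTE name `daily` and the column aliases `daily_count` and `cumulative_count` are illustrative names, not part of the original query. It relies on the default window frame: when OVER has an ORDER BY and no explicit frame, the sum runs from the first row up to the current row.

```sql
-- Step 1: per-day counts (the same GROUP BY query as in the question).
WITH daily AS (
    SELECT created_at, count(email) AS daily_count
    FROM subscriptions
    GROUP BY created_at
)
-- Step 2: running total over the per-day counts.
-- With ORDER BY and no explicit frame, the window covers
-- all rows from the start up to the current row.
SELECT created_at,
       daily_count,
       sum(daily_count) OVER (ORDER BY created_at) AS cumulative_count
FROM daily
ORDER BY created_at;
```

With the sample data from the question, this should give 100, 150, 200, 500 in the cumulative_count column.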


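Edit: If you want to remove duplicate emails within a single day, you can use sum(count(distinct email)). Unfortunately, this won't remove duplicates that span different dates.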
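Written out in full, the per-day de-duplication variant would look roughly like this (the alias `cumulative_count` and the trailing ORDER BY are additions for readability):

```sql
-- An email that appears several times on the same day is counted once for that day.
-- The same email on two different days is still counted twice.
SELECT created_at,
       sum(count(DISTINCT email)) OVER (ORDER BY created_at) AS cumulative_count
FROM subscriptions
GROUP BY created_at
ORDER BY created_at;
```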

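If you want to remove all duplicates, I think the simplest way is to use a subquery and DISTINCT ON. This attributes each email to its earliest date (because I sort by created_at in ascending order, it picks the earliest one):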

SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM (
    SELECT DISTINCT ON (email) created_at, email
    FROM subscriptions ORDER BY email, created_at
) AS subq
GROUP BY created_at;

If you create an index on (email, created_at), this query shouldn't be too slow either.


(If you want to test it, this is how I created the sample data set)

create table subscriptions as
   select date '2000-04-04' + (i/10000)::int as created_at,
          'foofoobar@foobar.com' || (i%700000)::text as email
   from generate_series(1,1000000) i;
create index on subscriptions (email, created_at);

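  • Note that `DISTINCT ON` can also be rewritten as an equivalent query with `GROUP BY`; in this case, `SELECT email, MIN(created_at) AS created_at FROM subscriptions GROUP BY email`. Which one is more efficient may vary, although the pre-sorted subquery from `DISTINCT ON` seems to give some benefit to the sort the window function needs. (2 upvotes)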
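As a sketch of what that comment describes, the full cumulative query with the `GROUP BY` subquery swapped in for `DISTINCT ON` might look like this. The subquery alias `first_seen` and the column alias `cumulative_count` are illustrative names; relative performance is not measured here and would need to be checked with EXPLAIN ANALYZE on real data.

```sql
-- Attribute each email to its earliest date, then accumulate per-day counts.
SELECT created_at,
       sum(count(email)) OVER (ORDER BY created_at) AS cumulative_count
FROM (
    SELECT email, min(created_at) AS created_at
    FROM subscriptions
    GROUP BY email
) AS first_seen
GROUP BY created_at
ORDER BY created_at;
```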

OMG*_*ies 7

Use:

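SELECT a.created_at,
       (SELECT COUNT(b.email)
          FROM SUBSCRIPTIONS b
         WHERE b.created_at <= a.created_at) AS count
  FROM SUBSCRIPTIONS a
 GROUP BY a.created_at   -- one row per date instead of one per subscription row
 ORDER BY a.created_at;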