PostgreSQL: GROUP BY with the latest date, filtered by a column that is not in the GROUP BY?

Chr*_*itt 1 postgresql

I currently have the following table:

CREATE TABLE demo (
  id SERIAL PRIMARY KEY,
  key TEXT NOT NULL,
  other_key TEXT NOT NULL,
  quantity BIGINT NOT NULL,
  date TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now()
);

Now I want to group with a query like this:

SELECT other_key, SUM(quantity) FROM demo GROUP BY other_key;       

So far this works well, but now I also want to filter by key and return the latest date in the table. Is there a good way to do that?

Pseudocode (this fails, since key is not in the GROUP BY):

SELECT other_key, SUM(quantity), MAX(date) FROM demo GROUP BY other_key WHERE key = ?;    

My first idea was a subquery:

SELECT other_key, SUM(quantity), MAX(date) FROM (SELECT * FROM demo WHERE key = ?) AS d GROUP BY other_key;

Is there a better way? And what would be a good index for this table?

My current index is:

CREATE INDEX demo_all_idx ON demo (key, other_key, quantity);

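Bonus (edited):

  • Is there any way to order by MAX(date)?
  • Is there a way to create an aggregate function that returns the earliest date at which the quantity is greater than zero? Think of an inventory event store, where the date of interest is not simply the newest entry but the latest entry at which the quantity had not been subtracted down to zero. For example, consider the following table: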

    id | key     | other_key | quantity | date
    ---+---------+-----------+----------+-----------------------
     6 | 0A19882 | 01/01     |      100 | 2016-08-30 00:00:00+02
     7 | 0A19882 | 01/02     |      -50 | 2016-09-01 00:00:00+02
     8 | 0A19882 | 01/01     |      100 | 2016-09-02 00:00:00+02
     9 | 0A19882 | 01/02     |      100 | 2016-08-31 00:00:00+02
    11 | 0A19882 | 01/03     |      100 | 2016-08-31 00:00:00+02
    12 | 0A19882 | 01/03     |     -100 | 2016-09-02 00:00:00+02
    13 | 0A19882 | 01/03     |      100 | 2016-09-04 00:00:00+02
    

The date for 01/01 should be 2016-08-30 00:00:00+02, and for 01/03 it should be 2016-09-04 00:00:00+02, because the event with id 12 brought the quantity down to zero.

ype*_*eᵀᴹ 5

The WHERE clause goes before GROUP BY:

SELECT 
    other_key, 
    SUM(quantity) AS sum_quantity,
    MAX(date)     AS max_date 
FROM demo 
WHERE key = ?
GROUP BY other_key 
ORDER BY max_date ; 

By the way, key is a reserved keyword in SQL, although not in Postgres. It is best avoided as a column or table name.
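To try this without a client-side placeholder, here is a minimal sketch. It assumes the demo table from the question exists and is empty, and it loads the question's sample rows. The literal key value and the descending sort are illustrative choices, not part of the original answer.

```sql
-- Load the sample rows from the question (assumes demo exists and is empty).
INSERT INTO demo (id, key, other_key, quantity, date) VALUES
    ( 6, '0A19882', '01/01',  100, '2016-08-30 00:00:00+02'),
    ( 7, '0A19882', '01/02',  -50, '2016-09-01 00:00:00+02'),
    ( 8, '0A19882', '01/01',  100, '2016-09-02 00:00:00+02'),
    ( 9, '0A19882', '01/02',  100, '2016-08-31 00:00:00+02'),
    (11, '0A19882', '01/03',  100, '2016-08-31 00:00:00+02'),
    (12, '0A19882', '01/03', -100, '2016-09-02 00:00:00+02'),
    (13, '0A19882', '01/03',  100, '2016-09-04 00:00:00+02');

-- Filter first (WHERE), then group, then sort by the aggregate's alias.
SELECT
    other_key,
    SUM(quantity) AS sum_quantity,
    MAX(date)     AS max_date
FROM demo
WHERE key = '0A19882'
GROUP BY other_key
ORDER BY max_date DESC;   -- most recently active group first

-- For the sample rows this should return, in this order
-- (dates shown at the +02 offset used by the sample data):
--   01/03 | 100 | 2016-09-04
--   01/01 | 200 | 2016-09-02
--   01/02 |  50 | 2016-09-01
```

Sorting by the alias max_date answers the first bonus question: an output column name can be used directly in ORDER BY.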

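For the bonus question, you need to compute cumulative sums (ordered by date) and then find the (oldest) date from which these sums become positive and stay positive. This is easier to do with some window functions: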

SELECT 
    other_key,  
    total_sum_quantity, max_date,
    CASE WHEN cumulative_sum > 0 THEN cumulative_sum END AS cumulative_sum,
    CASE WHEN cumulative_sum > 0 THEN date END AS oldest_positive_strike_date
FROM
  ( SELECT 
        *,
        ROW_NUMBER()
            OVER (PARTITION BY other_key
                  ORDER BY date DESC)  AS rn   
    FROM 
      ( SELECT 
            other_key, quantity, date,
            SUM(quantity) OVER (PARTITION BY key, other_key) AS total_sum_quantity,
            MAX(date) OVER (PARTITION BY key, other_key)     AS max_date,
            SUM(quantity) OVER (PARTITION BY key, other_key
                                ORDER BY date)               AS cumulative_sum,
            LAG(quantity) OVER (PARTITION BY key, other_key
                                ORDER BY date)               AS prev_quantity
        FROM demo 
        WHERE key = '0A19882'
      ) AS t
    WHERE (cumulative_sum  > 0 AND cumulative_sum-quantity <= 0)
       OR (cumulative_sum <= 0 AND cumulative_sum-quantity  > 0)
       OR (prev_quantity IS NULL)
  ) AS t2
WHERE rn = 1 ;

Tested at rextester.com.
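To see why the filter in that query works, it can help to look at the running total on its own. The sketch below is only the innermost step, run against the sample rows from the question (assumed to be loaded into demo). The values in the comment were worked out by hand from those rows.

```sql
-- Running total of quantity per other_key, oldest row first.
SELECT
    id, other_key, quantity, date,
    SUM(quantity) OVER (PARTITION BY other_key
                        ORDER BY date) AS cumulative_sum
FROM demo
WHERE key = '0A19882'
ORDER BY other_key, date;

--  id | other_key | quantity | cumulative_sum
--   6 | 01/01     |      100 |            100
--   8 | 01/01     |      100 |            200
--   9 | 01/02     |      100 |            100
--   7 | 01/02     |      -50 |             50
--  11 | 01/03     |      100 |            100
--  12 | 01/03     |     -100 |              0
--  13 | 01/03     |      100 |            100
```

The outer WHERE keeps the first row of each group plus every row at which the running total crosses zero in either direction, here ids 6, 9, 11, 12 and 13. ROW_NUMBER() with ORDER BY date DESC then picks the latest of those per other_key (ids 6, 9 and 13), which gives 2016-08-30 for 01/01 and 2016-09-04 for 01/03, as the question expects.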

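Some notes:

  • The cumulative_sum returned is the cumulative sum at the oldest_positive_strike_date. Both columns will show NULL if the total cumulative sum is not positive.
  • PARTITION BY key, other_key can be replaced with PARTITION BY other_key. I kept it as it is in case you need to run the query for more than one key value, e.g. for the whole table or with WHERE key IN (...).
  • ORDER BY date is deterministic if there is a UNIQUE constraint/index on (key, other_key, date). If two rows can have the same key, other_key and date, replace it with something that identifies a row, e.g. ORDER BY date, id.
  • The "obvious" index to help the query would be on (key, other_key, date, quantity). Postgres may choose a different plan, though, scanning the table or using an index and checking values against the table. That depends on many factors. Try it with different table sizes and the workload you expect.
  • Since the initial WHERE key = ? condition restricts the rows to about 100 (from a 100K table), it may be more efficient to use a CTE that fetches those rows first, with something like the following. You may get good performance with a simple index on (key):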

    WITH a AS
      ( SELECT * 
        FROM demo
        WHERE key = ?
      ) 
    SELECT ... ;          --- the query as it is, without the `WHERE`
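One way the elided SELECT could be filled in is sketched below: the window-function query from earlier, reading from the CTE instead of from demo. The MATERIALIZED keyword is an addition that is not in the original answer. It requires PostgreSQL 12 or later, where a plain CTE like this one may otherwise be inlined into the outer query. On older versions, omit it, since CTEs there are always computed separately. The literal key value is for illustration only.

```sql
WITH a AS MATERIALIZED        -- PostgreSQL 12+; on older versions write just "WITH a AS"
  ( SELECT *
    FROM demo
    WHERE key = '0A19882'     -- literal used for illustration
  )
SELECT
    other_key,
    total_sum_quantity, max_date,
    CASE WHEN cumulative_sum > 0 THEN cumulative_sum END AS cumulative_sum,
    CASE WHEN cumulative_sum > 0 THEN date END AS oldest_positive_strike_date
FROM
  ( SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY other_key ORDER BY date DESC) AS rn
    FROM
      ( SELECT
            other_key, quantity, date,
            SUM(quantity) OVER (PARTITION BY other_key) AS total_sum_quantity,
            MAX(date)     OVER (PARTITION BY other_key) AS max_date,
            SUM(quantity) OVER (PARTITION BY other_key ORDER BY date) AS cumulative_sum,
            LAG(quantity) OVER (PARTITION BY other_key ORDER BY date) AS prev_quantity
        FROM a                -- read from the CTE; the key filter already happened there
      ) AS t
    WHERE (cumulative_sum  > 0 AND cumulative_sum - quantity <= 0)
       OR (cumulative_sum <= 0 AND cumulative_sum - quantity  > 0)
       OR (prev_quantity IS NULL)
  ) AS t2
WHERE rn = 1;
```

Because the CTE holds rows for a single key, partitioning by other_key alone is enough here. Keep PARTITION BY key, other_key throughout if the CTE is changed to select several keys.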
    
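The index and tie-breaker notes above could be tried out along these lines. The index name is made up for illustration, and whether the planner actually uses the index depends on table size and statistics, so the EXPLAIN output has to be checked against realistic data.

```sql
-- Hypothetical index name; columns as suggested in the notes above.
CREATE INDEX demo_key_other_key_date_quantity_idx
    ON demo (key, other_key, date, quantity);

-- Refresh planner statistics after bulk changes.
ANALYZE demo;

-- Inspect the plan actually chosen (note: ANALYZE here executes the query).
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    other_key,
    SUM(quantity) AS sum_quantity,
    MAX(date)     AS max_date
FROM demo
WHERE key = '0A19882'
GROUP BY other_key
ORDER BY max_date;

-- Deterministic running total when (key, other_key, date) is not unique:
-- break ties on the primary key.
SELECT
    id, other_key, quantity, date,
    SUM(quantity) OVER (PARTITION BY key, other_key
                        ORDER BY date, id) AS cumulative_sum
FROM demo
WHERE key = '0A19882';
```

Comparing the plan with and without the index, at a few table sizes, is the experiment the answer recommends.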