I currently have the following table:
CREATE TABLE demo (
    id SERIAL PRIMARY KEY,
    key TEXT NOT NULL,
    other_key TEXT NOT NULL,
    quantity BIGINT NOT NULL,
    date TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now()
);
Now I want to group the rows with a query like this:
SELECT other_key, SUM(quantity) FROM demo GROUP BY other_key;
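So far this works well, but now I also want to filter by key and return the latest date from the table. Is there a good way to do that?
Pseudocode (this will fail because key is not in the GROUP BY):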
SELECT other_key, SUM(quantity), MAX(date) FROM demo GROUP BY other_key WHERE key = ?;
My first idea was a subquery:
SELECT other_key, SUM(quantity), MAX(date) FROM (SELECT * FROM demo WHERE key = ?) AS filtered GROUP BY other_key;
Is there a better way? And what would be a good index for this table?
My current index is:
CREATE INDEX demo_all_idx ON demo (key, other_key, quantity);
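Bonus (edited):
Is there a way to create an aggregate function that returns the oldest date at which the quantity is greater than zero? Think of a kind of inventory event store, where the date of interest is not simply that of the newest entry, but that of the newest entry at which the quantity was not subtracted or brought to zero. For example, consider the following table: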
 id | key     | other_key | quantity | date
----+---------+-----------+----------+------------------------
  6 | 0A19882 | 01/01     |      100 | 2016-08-30 00:00:00+02
  7 | 0A19882 | 01/02     |      -50 | 2016-09-01 00:00:00+02
  8 | 0A19882 | 01/01     |      100 | 2016-09-02 00:00:00+02
  9 | 0A19882 | 01/02     |      100 | 2016-08-31 00:00:00+02
 11 | 0A19882 | 01/03     |      100 | 2016-08-31 00:00:00+02
 12 | 0A19882 | 01/03     |     -100 | 2016-09-02 00:00:00+02
 13 | 0A19882 | 01/03     |      100 | 2016-09-04 00:00:00+02
The date for 01/01 should be 2016-08-30 00:00:00+02, and for 01/03 it should be 2016-09-04 00:00:00+02, because the event with id 12 brought the quantity down to zero.
The WHERE clause goes before GROUP BY:
SELECT
    other_key,
    SUM(quantity) AS sum_quantity,
    MAX(date) AS max_date
FROM demo
WHERE key = ?
GROUP BY other_key
ORDER BY max_date ;
By the way, key is a reserved keyword in SQL, although not in Postgres. It is best avoided as a column or table name.
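To try the grouped query against the sample rows from the question, something like the following minimal sketch should do. It is only an illustration under a few assumptions: the table is created as a temporary copy (which shadows any existing demo table for the session), the ids are inserted explicitly to match the sample, and the literal '0A19882' stands in for the ? placeholder.

```sql
-- Temporary scratch copy of the table from the question; it disappears
-- when the session ends and shadows a permanent "demo" table, if any.
CREATE TEMPORARY TABLE demo (
    id        SERIAL PRIMARY KEY,
    key       TEXT NOT NULL,
    other_key TEXT NOT NULL,
    quantity  BIGINT NOT NULL,
    date      TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now()
);

-- Sample rows copied from the question.
INSERT INTO demo (id, key, other_key, quantity, date) VALUES
    ( 6, '0A19882', '01/01',  100, '2016-08-30 00:00:00+02'),
    ( 7, '0A19882', '01/02',  -50, '2016-09-01 00:00:00+02'),
    ( 8, '0A19882', '01/01',  100, '2016-09-02 00:00:00+02'),
    ( 9, '0A19882', '01/02',  100, '2016-08-31 00:00:00+02'),
    (11, '0A19882', '01/03',  100, '2016-08-31 00:00:00+02'),
    (12, '0A19882', '01/03', -100, '2016-09-02 00:00:00+02'),
    (13, '0A19882', '01/03',  100, '2016-09-04 00:00:00+02');

-- Rows are filtered by WHERE first, then the remaining rows are grouped.
SELECT
    other_key,
    SUM(quantity) AS sum_quantity,
    MAX(date)     AS max_date
FROM demo
WHERE key = '0A19882'
GROUP BY other_key
ORDER BY other_key;
```

With these rows, sum_quantity comes out as 200 for 01/01, 50 for 01/02 and 100 for 01/03.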
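For the bonus question, compute the cumulative sums (ordered by date) and then find the (oldest) date from which those sums turn positive and stay positive. Window functions make this easier: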
SELECT
    other_key,
    total_sum_quantity, max_date,
    CASE WHEN cumulative_sum > 0 THEN cumulative_sum END AS cumulative_sum,
    CASE WHEN cumulative_sum > 0 THEN date END AS oldest_positive_strike_date
FROM
  ( SELECT
        *,
        ROW_NUMBER()
            OVER (PARTITION BY other_key
                  ORDER BY date DESC) AS rn
    FROM
      ( SELECT
            other_key, quantity, date,
            SUM(quantity) OVER (PARTITION BY key, other_key) AS total_sum_quantity,
            MAX(date) OVER (PARTITION BY key, other_key) AS max_date,
            SUM(quantity) OVER (PARTITION BY key, other_key
                                ORDER BY date) AS cumulative_sum,
            LAG(quantity) OVER (PARTITION BY key, other_key
                                ORDER BY date) AS prev_quantity
        FROM demo
        WHERE key = '0A19882'
      ) AS t
    WHERE (cumulative_sum > 0 AND cumulative_sum-quantity <= 0)
       OR (cumulative_sum <= 0 AND cumulative_sum-quantity > 0)
       OR (prev_quantity IS NULL)
  ) AS t2
WHERE rn = 1 ;
Tested at rextester.com.
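To see what the innermost subquery works with, the running total can be inspected on its own. This is a simplified sketch rather than a part of the query above: it assumes the demo table and the sample rows from the question, uses the literal '0A19882' as the key, and adds id as a tie-breaker in the ordering.

```sql
-- Running total of quantity per other_key, in date order.
-- id is added to the ORDER BY so that rows with equal dates
-- are still processed in a well-defined order.
SELECT
    id,
    other_key,
    quantity,
    date,
    SUM(quantity) OVER (PARTITION BY other_key
                        ORDER BY date, id) AS cumulative_sum
FROM demo
WHERE key = '0A19882'
ORDER BY other_key, date, id;
```

For 01/03 the running total goes 100, 0, 100, so the latest row at which it crosses from non-positive to positive is the one dated 2016-09-04, which matches the date the question expects.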
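Some notes:
- cumulative_sum is the cumulative sum at oldest_positive_strike_date. Both columns show NULL if the total cumulative sum is not positive.
- PARTITION BY key, other_key can be replaced with PARTITION BY other_key. I left it as it is in case you need to run the query for more than one key value, e.g. for the whole table or with WHERE key IN (...).
- ORDER BY date is deterministic if (key, other_key, date) has a UNIQUE constraint/index. If two rows can share the same key, other_key and date, replace it with something that identifies rows, e.g. ORDER BY date, id.
- A suitable index for this query would be on (key, other_key, date, quantity). Postgres may still choose a different plan, though, scanning the table or using the index and checking the values against the table. That depends on several factors. Experiment with different table sizes and with the workload you expect.
Since the initial WHERE key = ? condition limits the rows to about 100 (out of a 100K-row table), it may be more efficient to use a CTE that fetches those rows first, along the lines of the following. A simple index on (key) could then give good performance: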
WITH a AS
  ( SELECT *
    FROM demo
    WHERE key = ?
  )
SELECT ... ; --- the query as it is, without the `WHERE`
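The two index options mentioned in the notes, plus a way to check which plan Postgres actually picks, could be sketched as follows. The index names are hypothetical, the literal '0A19882' stands in for the ? placeholder, and the simple grouped query is used here only as a stand-in for the elided SELECT. One thing to keep in mind: since PostgreSQL 12, a CTE like this is normally inlined into the outer query unless it is marked MATERIALIZED, so on newer versions the plain CTE no longer forces the filtered rows to be fetched first.

```sql
-- Option 1: wide index covering the filter, the partitioning columns,
-- the sort column and the aggregated column (hypothetical name).
CREATE INDEX demo_key_other_key_date_quantity_idx
    ON demo (key, other_key, date, quantity);

-- Option 2: narrow index that only serves the initial key filter
-- (hypothetical name).
CREATE INDEX demo_key_idx
    ON demo (key);

-- Inspect the plan and the actual run time for a representative key value.
-- AS MATERIALIZED requires PostgreSQL 12 or later; on older versions
-- plain "WITH a AS (...)" already behaves this way.
EXPLAIN (ANALYZE, BUFFERS)
WITH a AS MATERIALIZED
  ( SELECT *
    FROM demo
    WHERE key = '0A19882'
  )
SELECT
    other_key,
    SUM(quantity) AS sum_quantity,
    MAX(date)     AS max_date
FROM a
GROUP BY other_key;
```

Running the EXPLAIN with and without MATERIALIZED, and with each index in place, is a reasonable way to compare the alternatives on realistic table sizes before settling on one.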