Ali*_*lin · 13 · tags: postgresql, greatest-n-per-group
Suppose this is sample data from a join of 2 tables. The database is Postgres 9.6.
id product_id invoice_id amount date
1 PROD1 INV01 2 01-01-2018
2 PROD2 INV02 3 01-01-2018
3 PROD1 INV01 2 05-01-2018
4 PROD1 INV03 1 05-01-2018
5 PROD2 INV02 3 08-01-2018
6 PROD2 INV04 4 08-01-2018
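To try the queries in this thread, here is a minimal setup that assumes the joined result is flattened into one temporary table. The table name `product` and the date column `dt` follow the answers below; the column types are a guess. The question's dates are day-month-year (the answer output shows 05-01-2018 as 2018-01-05), so they are written as ISO dates here:

```sql
-- Hypothetical setup: the sample rows flattened into a single temp table.
-- Column types are assumptions; the real data comes from a two-table join.
create temp table product (
    id         int  primary key,
    product_id text not null,
    invoice_id text not null,
    amount     int  not null,
    dt         date not null
);

insert into product (id, product_id, invoice_id, amount, dt) values
    (1, 'PROD1', 'INV01', 2, '2018-01-01'),
    (2, 'PROD2', 'INV02', 3, '2018-01-01'),
    (3, 'PROD1', 'INV01', 2, '2018-01-05'),
    (4, 'PROD1', 'INV03', 1, '2018-01-05'),
    (5, 'PROD2', 'INV02', 3, '2018-01-08'),
    (6, 'PROD2', 'INV04', 4, '2018-01-08');
```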
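I'd like to know whether the following is possible, in an optimized way:
1. For each product, get all rows that have its latest date:
id product_id invoice_id amount date
3 PROD1 INV01 2 05-01-2018
4 PROD1 INV03 1 05-01-2018
5 PROD2 INV02 3 08-01-2018
6 PROD2 INV04 4 08-01-2018
2. For every day in the range, get each product's total amount on its most recent date up to that day, i.e.: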
product_id amount date
PROD1 2 01-01-2018
PROD2 3 01-01-2018
PROD1 2 02-01-2018
PROD2 3 02-01-2018
PROD1 2 03-01-2018
PROD2 3 03-01-2018
PROD1 2 04-01-2018
PROD2 3 04-01-2018
PROD1 3 05-01-2018
PROD2 3 05-01-2018
PROD1 3 06-01-2018
PROD2 3 06-01-2018
PROD1 3 07-01-2018
PROD2 3 07-01-2018
PROD1 3 08-01-2018
PROD2 7 08-01-2018
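Some thoughts:
For the first question, I can get max(date) for each PRODx and then, for each PRODx, select the rows where date equals that max(date).
But I wonder whether there is a faster way to get this when the database holds a large number of records.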
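A minimal sketch of that first idea, assuming the joined rows are available as a relation named `product` with a `dt` date column (names borrowed from the answers below). The inline VALUES list is the sample data standing in for the real two-table join. The row-value `IN` keeps every row that ties on the latest date. Whether this beats other forms on a large table is not something the sketch shows; that depends on indexes and the planner.

```sql
-- Sample data standing in for the real join (hypothetical CTE named product).
with product (id, product_id, invoice_id, amount, dt) as (
    values (1, 'PROD1', 'INV01', 2, date '2018-01-01'),
           (2, 'PROD2', 'INV02', 3, date '2018-01-01'),
           (3, 'PROD1', 'INV01', 2, date '2018-01-05'),
           (4, 'PROD1', 'INV03', 1, date '2018-01-05'),
           (5, 'PROD2', 'INV02', 3, date '2018-01-08'),
           (6, 'PROD2', 'INV04', 4, date '2018-01-08')
)
select id, product_id, invoice_id, amount, dt
from product
where (product_id, dt) in (select product_id, max(dt)  -- latest date per product
                           from product
                           group by product_id)
order by id;
```

On the sample data this returns ids 3, 4, 5 and 6, i.e. both PROD1 rows of 2018-01-05 and both PROD2 rows of 2018-01-08.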
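For the second question, I can generate the series of dates for the required interval, then use WITH rows AS to run a query grouped by product_id with sum(amount), and then, for each date, select the latest earlier value from rows with LIMIT 1.
But that is neither clean nor optimized.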
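That second idea can be sketched with LEFT JOIN LATERAL (available since PostgreSQL 9.3, so usable on 9.6): for every (product, day) pair it picks the latest daily total on or before that day with ORDER BY ... LIMIT 1. The CTE names `daily` and `series` are hypothetical, and the VALUES list is the sample data standing in for the real join. A product with no row on or before a given day would get NULL for that day.

```sql
-- Hypothetical sample data standing in for the real two-table join.
with product (product_id, amount, dt) as (
    values ('PROD1', 2, date '2018-01-01'),
           ('PROD2', 3, date '2018-01-01'),
           ('PROD1', 2, date '2018-01-05'),
           ('PROD1', 1, date '2018-01-05'),
           ('PROD2', 3, date '2018-01-08'),
           ('PROD2', 4, date '2018-01-08')
),
daily as (                 -- one summed amount per product per day
    select product_id, dt, sum(amount) as amount
    from product
    group by product_id, dt
),
series as (                -- every day from the first to the last date
    select g.d::date as gdt
    from generate_series((select min(dt) from product),
                         (select max(dt) from product),
                         interval '1 day') as g(d)
)
select p.product_id, last_val.amount, series.gdt
from series
cross join (select distinct product_id from product) as p
left join lateral (        -- latest daily total on or before this day
    select daily.amount
    from daily
    where daily.product_id = p.product_id
      and daily.dt <= series.gdt
    order by daily.dt desc
    limit 1
) as last_val on true
order by series.gdt, p.product_id;
```

This yields the 16 rows listed in the question, e.g. PROD1 = 3 from 2018-01-05 onward and PROD2 = 7 on 2018-01-08.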
Looking forward to any input. Thank you.
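Later edit: I tried DISTINCT ON ().
With distinct on (product_id, invoice_id) I don't get only the most recent rows from the latest date: if there are older invoice_ids, other than those on the latest date, they are returned as well.
With distinct on (product_id) the result does come from the latest date but, as expected, only one row per product, even though on the last day I have two PROD1 rows. Basically I need something like "give me the latest date, all the product_ids and their invoice_ids, keeping in mind that a product_id can have several invoice_ids".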
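The DISTINCT ON (product_id) behavior described in this edit can be reproduced on the sample data (the VALUES list stands in for the real join). DISTINCT ON keeps only the first row of each product_id group in ORDER BY order, so one of the two PROD1 rows dated 2018-01-05 is discarded. The `id desc` tiebreak is an assumption added only to make the pick deterministic.

```sql
-- Hypothetical sample data standing in for the real two-table join.
with product (id, product_id, invoice_id, amount, dt) as (
    values (1, 'PROD1', 'INV01', 2, date '2018-01-01'),
           (2, 'PROD2', 'INV02', 3, date '2018-01-01'),
           (3, 'PROD1', 'INV01', 2, date '2018-01-05'),
           (4, 'PROD1', 'INV03', 1, date '2018-01-05'),
           (5, 'PROD2', 'INV02', 3, date '2018-01-08'),
           (6, 'PROD2', 'INV04', 4, date '2018-01-08')
)
select distinct on (product_id) id, product_id, invoice_id, amount, dt
from product
order by product_id, dt desc, id desc;  -- one row per product: the first in this order
```

Only ids 4 and 6 come back, which is why DISTINCT ON alone cannot return ties on the latest date.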
Later edit 2:
Running a query like this for the first question seems quite fast:
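select mytable.product_id, invoice_id, amount
from mytable inner join myOtherTable on...
inner join (select max(date) as last_date, product_id
            from mytable
            group by product_id) sub on mytable.date = sub.last_date
                                    -- also match the product; otherwise rows dated on
                                    -- another product's latest date would match too
                                    and mytable.product_id = sub.product_id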
Here's an independent take on Q#1, slightly different from @ypercube's:
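-- note: partitioning by (product_id, invoice_id) keeps the latest row of every
-- invoice, including invoices that do not appear on the product's latest date
with cte as (select row_number() over (partition by product_id, invoice_id
                                       order by dt desc) as rn,
                    product_id,
                    invoice_id,
                    amount,
                    dt
             from product)
select product_id, invoice_id, amount, dt
from cte
where rn = 1
order by product_id, invoice_id;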
product_id | invoice_id | amount | dt
------------+------------+--------+------------
PROD1 | INV01 | 2 | 2018-01-05
PROD1 | INV03 | 1 | 2018-01-05
PROD2 | INV02 | 3 | 2018-01-08
PROD2 | INV04 | 4 | 2018-01-08
(4 rows)
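The caveat the question's first edit raises for DISTINCT ON (product_id, invoice_id) also applies to row_number() partitioned by (product_id, invoice_id). A sketch with one hypothetical extra row (id 7, PROD1/INV05, dated only 2018-01-03) makes it visible by computing both row_number() per invoice and rank() per product side by side. The VALUES list stands in for the real join.

```sql
-- Sample data plus a hypothetical row 7 that exists only before PROD1's latest date.
with product (id, product_id, invoice_id, amount, dt) as (
    values (1, 'PROD1', 'INV01', 2, date '2018-01-01'),
           (2, 'PROD2', 'INV02', 3, date '2018-01-01'),
           (3, 'PROD1', 'INV01', 2, date '2018-01-05'),
           (4, 'PROD1', 'INV03', 1, date '2018-01-05'),
           (5, 'PROD2', 'INV02', 3, date '2018-01-08'),
           (6, 'PROD2', 'INV04', 4, date '2018-01-08'),
           (7, 'PROD1', 'INV05', 5, date '2018-01-03')
),
ranked as (
    select p.*,
           row_number() over (partition by product_id, invoice_id
                              order by dt desc) as rn,   -- latest row per invoice
           rank()       over (partition by product_id
                              order by dt desc) as rnk   -- latest date per product, ties kept
    from product as p
)
select id, product_id, invoice_id, dt, rn, rnk
from ranked
where rn = 1
order by id;
```

Row 7 comes back with rn = 1 but rnk = 3: it is the latest row of INV05, yet not on PROD1's latest date. Filtering on rnk = 1 instead, as in the rank() answer, would drop it.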
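For Q#2, you're on the right track, but the SQL will have a cross join (gasp!).
I think a function with a loop/cursor would be more efficient (I'll try it in my next block of free time).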
-- the cte gives us the real (summed) values per product per day
with cte as (select product_id,
                    sum(amount) as amount,
                    dt
             from product
             group by product_id, dt)
select p.product_id,
       (select cte.amount              -- choose the amount
        from cte
        where cte.product_id = p.product_id
          and cte.dt <= d.gdt          -- from the same day or earlier
        order by cte.dt desc
        limit 1) as finamt,
       d.gdt
from (select generate_series((select min(dt)
                              from product),  -- add a where clause if some products
                                              -- don't have an amount
                             (select max(dt)
                              from product),
                             '1 day'
                            )::date as gdt) d
cross join  -- assuming each listed product has an amount on the min date
     (select distinct product_id
      from product) p
-- no join back to cte is needed: the subquery above already fills the gaps
order by d.gdt, p.product_id
;
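An alternative to the per-row correlated subquery, swapped in here and not part of the answer above, is the common "count the non-NULLs, then take the max within each group" fill-forward trick: build the full (product, day) grid, left join the daily totals, let a running count(amount) start a new group at each day that has a total, and spread that total over its group. The CTE names (`daily`, `grid`, `grouped`) are hypothetical and the VALUES list stands in for the real join. This sketch does not show that it is faster; that would need EXPLAIN ANALYZE on real data.

```sql
-- Hypothetical sample data standing in for the real two-table join.
with product (product_id, amount, dt) as (
    values ('PROD1', 2, date '2018-01-01'),
           ('PROD2', 3, date '2018-01-01'),
           ('PROD1', 2, date '2018-01-05'),
           ('PROD1', 1, date '2018-01-05'),
           ('PROD2', 3, date '2018-01-08'),
           ('PROD2', 4, date '2018-01-08')
),
daily as (                 -- one summed amount per product per day
    select product_id, dt, sum(amount) as amount
    from product
    group by product_id, dt
),
grid as (                  -- every (product, day) pair; amount is NULL on days without rows
    select p.product_id, s.gdt, daily.amount
    from (select g.d::date as gdt
          from generate_series((select min(dt) from product),
                               (select max(dt) from product),
                               interval '1 day') as g(d)) as s
    cross join (select distinct product_id from product) as p
    left join daily on daily.product_id = p.product_id
                   and daily.dt = s.gdt
),
grouped as (               -- grp goes up by one on each day that has a real total
    select product_id, gdt, amount,
           count(amount) over (partition by product_id order by gdt) as grp
    from grid
)
select product_id,
       max(amount) over (partition by product_id, grp) as finamt,  -- carry the total forward
       gdt
from grouped
order by gdt, product_id;
```

Each group holds one non-NULL total followed by the NULL days after it, so max() over the group repeats that total until the next real one.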
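I understand you want, for each product, all rows with the latest date (including ties, i.e. all rows that have the last date). This can be done with the rank() function: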
select id, product_id, invoice_id, amount, date
from
( select id, product_id, invoice_id, amount, date,
rank() over (partition by product_id
order by date desc) as rnk
from
-- your joins
) as t
where rnk = 1 ;
Views: 57929