Eri*_*low 10 sql postgresql greatest-n-per-group
I have an updates table in Postgres 9.4.5 like this:
goal_id | created_at | status
1 | 2016-01-01 | green
1 | 2016-01-02 | red
2 | 2016-01-02 | amber
and a goals table like this:
id | company_id
1 | 1
2 | 2
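I want to create a chart for each company that shows the status of all of its goals, per week.
I imagine this requires generating a series of the past 8 weeks, finding the most recent update for each goal that came before that week, then counting the different statuses of the updates found.
What I have so far: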
SELECT EXTRACT(year from generate_series) AS year,
EXTRACT(week from generate_series) AS week,
u.company_id,
COUNT(*) FILTER (WHERE u.status = 'green') AS green_count,
COUNT(*) FILTER (WHERE u.status = 'amber') AS amber_count,
COUNT(*) FILTER (WHERE u.status = 'red') AS red_count
FROM generate_series(NOW() - INTERVAL '2 MONTHS', NOW(), '1 week')
LEFT OUTER JOIN (
SELECT DISTINCT ON(year, week)
goals.company_id,
updates.status,
EXTRACT(week from updates.created_at) week,
EXTRACT(year from updates.created_at) AS year,
updates.created_at
FROM updates
JOIN goals ON goals.id = updates.goal_id
ORDER BY year, week, updates.created_at DESC
) u ON u.week = week AND u.year = year
GROUP BY 1,2,3
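But this has two problems. The join on u does not seem to work as I thought it would. It seems to join on every row (?) returned from the inner query, and it only selects the most recent update that happened within that week. It should grab the most recent update from before that week if it needs to.
This is some pretty complicated SQL, and I would love some input on how to pull it off.
The goals table has around 1000 goals at the moment and is growing by about 100 a week: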
Table "goals"
Column | Type | Modifiers
-----------------+-----------------------------+-----------------------------------------------------------
id | integer | not null default nextval('goals_id_seq'::regclass)
company_id | integer | not null
name | text | not null
created_at | timestamp without time zone | not null default timezone('utc'::text, now())
updated_at | timestamp without time zone | not null default timezone('utc'::text, now())
Indexes:
"goals_pkey" PRIMARY KEY, btree (id)
"entity_goals_company_id_fkey" btree (company_id)
Foreign-key constraints:
"goals_company_id_fkey" FOREIGN KEY (company_id) REFERENCES companies(id) ON DELETE RESTRICT
The updates table has around 1000 rows and is growing by around 100 a week:
Table "updates"
Column | Type | Modifiers
------------+-----------------------------+------------------------------------------------------------------
id | integer | not null default nextval('updates_id_seq'::regclass)
status | entity.goalstatus | not null
goal_id | integer | not null
created_at | timestamp without time zone | not null default timezone('utc'::text, now())
updated_at | timestamp without time zone | not null default timezone('utc'::text, now())
Indexes:
"goal_updates_pkey" PRIMARY KEY, btree (id)
"entity_goal_updates_goal_id_fkey" btree (goal_id)
Foreign-key constraints:
"updates_goal_id_fkey" FOREIGN KEY (goal_id) REFERENCES goals(id) ON DELETE CASCADE
Schema | Name | Internal name | Size | Elements | Access privileges | Description
--------+-------------------+---------------+------+----------+-------------------+-------------
entity | entity.goalstatus | goalstatus | 4 | green +| |
| | | | amber +| |
| | | | red | |
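You need one data item per week and goal (before aggregating counts per company). That is a plain CROSS JOIN between generate_series() and goals. The (possibly) expensive part is getting the current state from updates for each. Like @Paul already suggested, a LATERAL join seems like the best tool. Do it only for updates, though, and use a faster technique with LIMIT 1.
And simplify date handling with date_trunc().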
SELECT w_start
, g.company_id
, count(*) FILTER (WHERE u.status = 'green') AS green_count
, count(*) FILTER (WHERE u.status = 'amber') AS amber_count
, count(*) FILTER (WHERE u.status = 'red') AS red_count
FROM generate_series(date_trunc('week', NOW() - interval '2 months')
, date_trunc('week', NOW())
, interval '1 week') w_start
CROSS JOIN goals g
LEFT JOIN LATERAL (
SELECT status
FROM updates
WHERE goal_id = g.id
AND created_at < w_start
ORDER BY created_at DESC
LIMIT 1
) u ON true
GROUP BY w_start, g.company_id
ORDER BY w_start, g.company_id;
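To see what the LATERAL subquery returns before the counts are aggregated, here is a minimal sketch against the three sample rows from the question. The temporary tables, the text type for status (the real column is the enum entity.goalstatus), and the fixed week boundaries in place of NOW() are assumptions made only to keep the example reproducible.

```sql
-- Hypothetical scratch tables mirroring the question's sample rows.
CREATE TEMP TABLE goals (
    id         int PRIMARY KEY,
    company_id int NOT NULL
);
CREATE TEMP TABLE updates (
    goal_id    int       NOT NULL REFERENCES goals(id),
    created_at timestamp NOT NULL,
    status     text      NOT NULL  -- assumption: text instead of the enum
);

INSERT INTO goals VALUES (1, 1), (2, 2);
INSERT INTO updates VALUES
    (1, '2016-01-01', 'green'),
    (1, '2016-01-02', 'red'),
    (2, '2016-01-02', 'amber');

-- Three fixed Mondays instead of NOW(), so the output does not change over time.
SELECT w_start, g.company_id, u.status
FROM   generate_series(timestamp '2015-12-28'
                     , timestamp '2016-01-11'
                     , interval '1 week') w_start
CROSS  JOIN goals g
LEFT   JOIN LATERAL (
   SELECT status
   FROM   updates
   WHERE  goal_id = g.id
   AND    created_at < w_start      -- only updates before the week starts
   ORDER  BY created_at DESC
   LIMIT  1                         -- the most recent one
   ) u ON true
ORDER  BY w_start, g.company_id;
```

For the week starting 2015-12-28 both goals come back with a NULL status, because no update predates it; the LEFT JOIN keeps those rows instead of dropping them. For the two later weeks, goal 1 shows red (its latest update) and goal 2 shows amber.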
To make this fast, you need a multicolumn index:
CREATE INDEX updates_special_idx ON updates (goal_id, created_at DESC, status);
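Descending order for created_at is best, but not strictly necessary. Postgres can scan indexes backwards almost as fast. (That does not apply to inverted sort order across multiple columns, though.)
Index the columns in that order: goal_id first, because it is filtered by equality, then created_at, which supplies the sort order.
The third column status is only appended to allow fast index-only scans on updates.
1k goals for 9 weeks (your interval of 2 months overlaps with at least 9 weeks) only require 9k index look-ups into the second table of only 1k rows. For small tables like this, performance should not be much of a problem. But once you have a couple of thousand more rows in each table, performance will deteriorate with sequential scans.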
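One way to check whether the index is actually used is to look at the plan of a single look-up. This is only a sketch: the goal_id value 1 is an arbitrary example, and on tables this small the planner may still prefer a sequential scan regardless of the index.

```sql
-- Index-only scans depend on the visibility map, which VACUUM maintains.
-- Note: VACUUM cannot run inside a transaction block.
VACUUM ANALYZE updates;

EXPLAIN (ANALYZE, BUFFERS)
SELECT status
FROM   updates
WHERE  goal_id = 1                               -- arbitrary example goal
AND    created_at < date_trunc('week', now())
ORDER  BY created_at DESC
LIMIT  1;
```

If the plan shows an Index Only Scan on updates_special_idx with few heap fetches, the look-up is served from the index alone.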
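w_start represents the start of each week. Consequently, the counts are as of the start of the week. You can still extract year and week (or any other details that represent your week), if you insist: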
EXTRACT(isoyear from w_start) AS year
, EXTRACT(week from w_start) AS week
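Best with ISOYEAR, like @Paul explained. EXTRACT(week ...) returns the ISO week number, which belongs to the ISO year rather than the calendar year.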
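The difference shows up around New Year. A small illustration, using 2016-01-01 (a Friday) as an example date:

```sql
SELECT EXTRACT(year    FROM date '2016-01-01') AS year     -- 2016
     , EXTRACT(isoyear FROM date '2016-01-01') AS isoyear  -- 2015
     , EXTRACT(week    FROM date '2016-01-01') AS week;    -- 53
```

Pairing year with week here would label the row as week 53 of 2016, while isoyear and week agree on week 53 of 2015.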