m0m*_*eni 7 sql postgresql analytics aggregate query-optimization
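I currently have this fairly large query that:
count()s events grouped by event name and date, aggregating daily, weekly, and monthly counts into intermediate tables;
avg()s the counts in each intermediate table grouped by event, unions the results, and, because I want a separate column each for the daily, weekly, and monthly averages, puts a filler value of 0 into the empty columns.
The query is large, and I feel like I'm doing a lot of repeated work. Is there any way to make this query perform better or make it smaller? I haven't really written queries like this before, so I'm not sure.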
WITH monthly_counts as (
    SELECT
        event,
        count(*) as count
    FROM tracking_stuff
    WHERE
        event = 'thing'
        OR event = 'thing2'
        OR event = 'thing3'
    GROUP BY event, date_trunc('month', created_at)
),
weekly_counts as (
    SELECT
        event,
        count(*) as count
    FROM tracking_stuff
    WHERE
        event = 'thing'
        OR event = 'thing2'
        OR event = 'thing3'
    GROUP BY event, date_trunc('week', created_at)
),
daily_counts as (
    SELECT
        event,
        count(*) as count
    FROM tracking_stuff
    WHERE
        event = 'thing'
        OR event = 'thing2'
        OR event = 'thing3'
    GROUP BY event, date_trunc('day', created_at)
),
query as (
    SELECT
        event,
        0 as daily_avg,
        0 as weekly_avg,
        avg(count) as monthly_avg
    FROM monthly_counts
    GROUP BY event
    UNION
    SELECT
        event,
        0 as daily_avg,
        avg(count) as weekly_avg,
        0 as monthly_avg
    FROM weekly_counts
    GROUP BY event
    UNION
    SELECT
        event,
        avg(count) as daily_avg,
        0 as weekly_avg,
        0 as monthly_avg
    FROM daily_counts
    GROUP BY event
)
SELECT
    event,
    sum(daily_avg) as daily_avg,
    sum(weekly_avg) as weekly_avg,
    sum(monthly_avg) as monthly_avg
FROM query
GROUP BY event;
I would write the query this way:
select event, daily_avg, weekly_avg, monthly_avg
from (
    select event, avg(count) monthly_avg
    from (
        select event, count(*)
        from tracking_stuff
        where event in ('thing1', 'thing2', 'thing3')
        group by event, date_trunc('month', created_at)
    ) s
    group by 1
) monthly
join (
    select event, avg(count) weekly_avg
    from (
        select event, count(*)
        from tracking_stuff
        where event in ('thing1', 'thing2', 'thing3')
        group by event, date_trunc('week', created_at)
    ) s
    group by 1
) weekly using(event)
join (
    select event, avg(count) daily_avg
    from (
        select event, count(*)
        from tracking_stuff
        where event in ('thing1', 'thing2', 'thing3')
        group by event, date_trunc('day', created_at)
    ) s
    group by 1
) daily using(event)
order by 1;
If the WHERE condition eliminates a large part of the data (say, more than half), using a CTE can slightly speed up query execution:
with the_data as (
    select event, created_at
    from tracking_stuff
    where event in ('thing1', 'thing2', 'thing3')
)
select event, daily_avg, weekly_avg, monthly_avg
from (
    select event, avg(count) monthly_avg
    from (
        select event, count(*)
        from the_data
        group by event, date_trunc('month', created_at)
    ) s
    group by 1
) monthly
-- etc ...
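For reference, here is one way the abbreviated CTE query might look when written out in full. This is only a sketch that repeats the pattern of the non-CTE version against the_data; the cnt alias and the MATERIALIZED keyword are my additions. MATERIALIZED needs PostgreSQL 12 or later, where it makes explicit that the CTE is computed once and reused. On older versions, drop the keyword, since CTEs are always materialized there.

```sql
-- Sketch: the CTE variant spelled out (PostgreSQL 12+ because of MATERIALIZED).
with the_data as materialized (
    -- Filter once, then reuse the reduced row set three times.
    select event, created_at
    from tracking_stuff
    where event in ('thing1', 'thing2', 'thing3')
)
select event, daily_avg, weekly_avg, monthly_avg
from (
    select event, avg(cnt) as monthly_avg
    from (
        select event, count(*) as cnt
        from the_data
        group by event, date_trunc('month', created_at)
    ) s
    group by event
) monthly
join (
    select event, avg(cnt) as weekly_avg
    from (
        select event, count(*) as cnt
        from the_data
        group by event, date_trunc('week', created_at)
    ) s
    group by event
) weekly using (event)
join (
    select event, avg(cnt) as daily_avg
    from (
        select event, count(*) as cnt
        from the_data
        group by event, date_trunc('day', created_at)
    ) s
    group by event
) daily using (event)
order by event;
```

Because the_data is referenced three times, PostgreSQL 12+ would materialize it by default anyway; the keyword just documents the intent.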
Just out of curiosity, I ran a test on some generated data:
create table tracking_stuff (event text, created_at timestamp);
insert into tracking_stuff
select 'thing' || random_int(9), '2016-01-01'::date+ random_int(365)
from generate_series(1, 1000000);
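Note that random_int() is not a built-in PostgreSQL function, so it is presumably a helper defined elsewhere. A minimal sketch of what such a helper could look like follows, assuming it returns a random integer from 1 to n. That range is my guess, chosen because it yields events thing1 to thing9, which matches the filter on three events eliminating 2/3 of the rows.

```sql
-- Hypothetical helper (not from the original post):
-- returns a random integer in the assumed range 1..n.
create function random_int(n int) returns int
language sql volatile
as $$
    -- random() is in [0, 1), so floor(random() * n) is in 0..n-1.
    select floor(random() * n)::int + 1;
$$;

-- Sanity check: both values should stay within 1..9.
select min(random_int(9)), max(random_int(9))
from generate_series(1, 10000);
```

With this definition in place, the create table and insert statements above run as written.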
In each query I replaced thing with thing1, so the queries eliminate 2/3 of the rows.
Average execution time over 10 tests:
Original query          1106 ms
My query without cte    1077 ms
My query with cte        902 ms
Clodoaldo's query       5187 ms
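The answer does not say how these timings were collected. One common way to measure a query in PostgreSQL is EXPLAIN with the ANALYZE option, which actually executes the statement and reports the measured run time. A sketch, using only the monthly part of the query as a stand-in (the cnt alias is mine):

```sql
-- Sketch: run the statement and report the actual timing and buffer usage.
explain (analyze, buffers)
select event, avg(cnt) as monthly_avg
from (
    select event, count(*) as cnt
    from tracking_stuff
    where event in ('thing1', 'thing2', 'thing3')
    group by event, date_trunc('month', created_at)
) s
group by event;
```

Running each query several times and averaging, as was done above, helps smooth out caching effects between the first and later runs.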
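In PostgreSQL 9.5+, use grouping sets. As the PostgreSQL documentation puts it:
"The data selected by the FROM and WHERE clauses is grouped separately by each specified grouping set, aggregates computed for each group just as for simple GROUP BY clauses, and then the results returned."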
select event,
    avg(total) filter (where day is not null) as avg_day,
    avg(total) filter (where week is not null) as avg_week,
    avg(total) filter (where month is not null) as avg_month
from (
    select
        event,
        date_trunc('day', created_at) as day,
        date_trunc('week', created_at) as week,
        date_trunc('month', created_at) as month,
        count(*) as total
    from tracking_stuff
    where event in ('thing','thing2','thing3')
    group by grouping sets ((event, 2), (event, 3), (event, 4))
) s
group by event
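Within each grouping set, the columns that are not part of that set come back as NULL, which is what the filter (where ... is not null) clauses rely on to pick out the daily, weekly, and monthly rows. The following is a sketch of the same idea with the grouping sets spelled out by column name instead of by position. The extra subquery t is my addition, not part of the answer above, and the sketch assumes created_at is never NULL.

```sql
-- Sketch: same technique, grouping sets referenced by name (PostgreSQL 9.5+).
select event,
    avg(total) filter (where day is not null) as avg_day,
    avg(total) filter (where week is not null) as avg_week,
    avg(total) filter (where month is not null) as avg_month
from (
    -- One pass over the data produces three kinds of rows:
    -- (event, day), (event, week) and (event, month) counts.
    -- Columns outside the current grouping set are returned as NULL.
    select event, day, week, month, count(*) as total
    from (
        select
            event,
            date_trunc('day', created_at) as day,
            date_trunc('week', created_at) as week,
            date_trunc('month', created_at) as month
        from tracking_stuff
        where event in ('thing', 'thing2', 'thing3')
    ) t
    group by grouping sets ((event, day), (event, week), (event, month))
) s
group by event;
```

Naming the columns makes it easier to see which set each filter corresponds to, at the cost of one more level of nesting.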