Web*_*rer 2 postgresql date-histogram
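I wrote a query with the Postgres CLI that returns a bar chart in the terminal. The query is slow and inefficient, and I would like to change that.
At the base, we have a pretty simple query. We want each row to be a fraction of the total number of rows in the table. Let's say our hard-coded number of rows is N_ROWS and our table is my_table.
Also, let's say N_ROWS equals 8.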
select
(select count(id) from my_table) / N_ROWS * (N_ROWS - num) as level
from (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8)) as t (num)
In my case, this returns the Y axis of the chart:
level
-------
71760
62790
53820
44850
35880
26910
17940
8970
0
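You can already see the problems with this query.
Can I generate the N_ROWS rows programmatically instead of hard-coding each row value in VALUES? I also, obviously, don't like running a new count over the whole table for every row.
We now need the X axis. This is what I came up with: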
select
r.level,
case
when (
select count(id) from my_table where created_at_utc<= '2019-01-01 00:00:00'::timestamp without time zone
) >= r.level then true
end as "2019-01-01"
from (
select (select count(id) from my_table) / N_ROWS * (N_ROWS - num) as level from (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8)) as t (num)
) as r;
This returns our first bucket:
level | 2019-01-01
-------+------------
71760 |
62790 |
53820 |
44850 |
35880 |
26910 | t
17940 | t
8970 | t
0 | t
I would rather not hard-code a case statement for every bucket but, of course, that is what I did. The result is what I was looking for.
level | 2019-01-01 | 2019-02-01 | 2019-03-01 | 2019-04-01 | 2019-05-01 | 2019-06-01 | 2019-07-01 | 2019-08-01 | 2019-09-01 | 2019-10-01 | 2019-11-01 | 2019-12-01
-------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------
71760 | | | | | | | | | | | | t
62790 | | | | | t | t | t | t | t | t | t | t
53820 | | | | t | t | t | t | t | t | t | t | t
44850 | | | t | t | t | t | t | t | t | t | t | t
35880 | | t | t | t | t | t | t | t | t | t | t | t
26910 | t | t | t | t | t | t | t | t | t | t | t | t
17940 | t | t | t | t | t | t | t | t | t | t | t | t
8970 | t | t | t | t | t | t | t | t | t | t | t | t
0 | t | t | t | t | t | t | t | t | t | t | t | t
We can certainly make a few improvements.
First, let's make a test table with some data:
CREATE TABLE test (id bigint, dt date);
-- Add 100,000 rows
insert into test select generate_series(1,100000, 1);
-- Add dates from 2019-01-01 to 2019-01-11
update test set dt='2019-01-01'::date + (id/10000)::int;
We can almost replace your first query for finding the levels with this faster one:
SELECT unnest(percentile_disc(
(
SELECT array_agg(x)
FROM generate_series(0, 1, (1::numeric)/8) as g(x))
) WITHIN GROUP (ORDER BY id)
) as l
FROM test;
l
--------
1
12500
25000
37500
50000
62500
75000
87500
100000
(9 rows)
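Note that the first level is 1 rather than 0, but the rest should be the same. This works because id runs contiguously from 1 to the row count in the test table: percentile_disc returns actual id values, not counts.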
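If the original arithmetic levels (including the 0 row) are preferred, or if id is not contiguous, one possible alternative is to count the table once and generate the multipliers with generate_series. This is only a sketch against the test table above; the literal 8 stands in for N_ROWS, and the names total, n, and num are illustrative.

```sql
-- Sketch: count the table a single time, then derive N_ROWS + 1 levels from it.
-- The literal 8 plays the role of N_ROWS; "total", "n" and "num" are made-up names.
WITH total AS (
    SELECT count(*) AS n          -- one scan of the table, not one per level
    FROM test
)
SELECT total.n / 8 * (8 - g.num) AS level   -- integer division, as in the question
FROM total
CROSS JOIN generate_series(0, 8) AS g(num)  -- replaces the hard-coded VALUES list
ORDER BY level DESC;
```

With the 100,000-row test table, total.n / 8 is 12500, so the levels run from 100000 down to 0 in steps of 12500.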
We can also use a few other tricks:
WITH num_levels AS (
SELECT 8 as num_levels
), levels as (
SELECT unnest(percentile_disc(
(
SELECT array_agg(x)
FROM num_levels
CROSS JOIN LATERAL generate_series(0, 1, (1::numeric)/num_levels.num_levels) as g(x))
) WITHIN GROUP (ORDER BY id)
) as l
FROM test
), dates as (
SELECT d
FROM generate_series('2019-01-01T00:00:00'::timestamp, '2019-01-11T00:00:00'::timestamp, '1 day') as g(d)
), counts_per_day AS (
SELECT dt,
sum(counts) OVER (ORDER BY dt) as cum_sum -- the cumulative count
FROM (
SELECT dt,
count(id) as counts -- The count per day
FROM test
GROUP BY dt
) sub
)
SELECT l, dt, CASE WHEN cum_sum >= l THEN true ELSE null END
FROM levels, dates
LEFT JOIN counts_per_day ON dt = d
ORDER BY l DESC, d asc
\crosstabview
l | 2019-01-01 | 2019-01-02 | 2019-01-03 | 2019-01-04 | 2019-01-05 | 2019-01-06 | 2019-01-07 | 2019-01-08 | 2019-01-09 | 2019-01-10 | 2019-01-11
--------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------
100000 | | | | | | | | | | | t
87500 | | | | | | | | | t | t | t
75000 | | | | | | | | t | t | t | t
62500 | | | | | | | t | t | t | t | t
50000 | | | | | | t | t | t | t | t | t
37500 | | | | t | t | t | t | t | t | t | t
25000 | | | t | t | t | t | t | t | t | t | t
12500 | | t | t | t | t | t | t | t | t | t | t
1 | t | t | t | t | t | t | t | t | t | t | t
(9 rows)
This query ran in 40 ms on my laptop.
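Most of the saving comes from the counts_per_day step: the table is aggregated once per day and a window function turns those counts into a running total, instead of recounting the table for every bucket. In isolation, that step could be sketched as follows, assuming the same test table; per_day is an illustrative column name, and the nested subquery from the full query is folded into a window over the aggregate.

```sql
-- Sketch of the running-total step on its own.
-- count(*) is evaluated per dt group first; the window function then
-- accumulates those per-day counts in date order.
SELECT dt,
       count(*)                         AS per_day,  -- rows created on this day
       sum(count(*)) OVER (ORDER BY dt) AS cum_sum   -- rows created up to and including this day
FROM test
GROUP BY dt
ORDER BY dt;
```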
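The dates could be selected from the maximum and minimum dates in the test table, and the interval could be changed from 1 day depending on how many columns are wanted between the maximum and minimum.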
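As a rough illustration of that idea, the dates CTE could be derived from the data rather than from literals. This is a sketch under the assumption that the test table is not empty; the names bounds, lo, hi, and bucket are illustrative, and the '1 day' step is the knob to widen when fewer columns are wanted.

```sql
-- Sketch: build the X-axis buckets from the table's own date range.
-- "bounds", "lo", "hi" and "bucket" are made-up names for this example.
SELECT g.d::date AS bucket
FROM (
    SELECT min(dt) AS lo, max(dt) AS hi   -- one pass to find the date range
    FROM test
) AS bounds
CROSS JOIN LATERAL generate_series(
    bounds.lo::timestamp,
    bounds.hi::timestamp,
    interval '1 day'                      -- e.g. '1 week' or '1 month' for fewer columns
) AS g(d)
ORDER BY bucket;
```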