Ana*_*ant 2 sql postgresql window-functions postgresql-9.0
I have a table like this.
ID (integer)
event_name(varchar(20))
event_date(timestamp)
Some sample data is given below.
ID event_date event_name
101 2013-04-24 18:33:37.694818 event_A
102 2013-04-24 20:34:37.000000 event_B
103 2013-04-24 20:40:37.000000 event_A
104 2013-04-25 01:00:00.694818 event_A
105 2013-04-25 12:00:15.694818 event_A
106 2013-04-26 00:56:10.800000 event_A
107 2013-04-27 12:00:15.694818 event_A
108 2013-04-27 12:00:15.694818 event_B
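I need to generate a window-based report. Here a window means a group of rows. For example, if I choose a window size of 2, I need to show the total count of each event over two consecutive days, i.e. the same day and the previous day. If I choose a window size of 3, I need the count of each event over three consecutive days.
So if a 2-day window is chosen, the result should look like this.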
Date Count_eventA Count_eventB
2013-04-27 (this counts sum of 27th, 26th) 2 1
2013-04-26 (this counts sum of 26th, 25th) 3 0
2013-04-25 (this counts sum of 25th, 24th) 4 1
2013-04-24 (this counts sum of 24th ) 2 1
I have read about window functions in Postgres. Can someone guide me on how to write the SQL query for this report?
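You want to use the count aggregate as a window function, e.g. count(id) over (partition by event_date rows 3 preceding)... but the nature of your data complicates this considerably. You're storing timestamps, not just dates, and you want to group by day rather than by number of previous events. To top it all off, you want to cross-tabulate the results.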
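If PostgreSQL supported RANGE with an offset in window functions (it does not in 9.0, the version tagged here), this would be considerably simpler than it is. As it is, you have to do it the hard way.
First you count the events per day. You can then filter that through a window to get the lagged counts per day for each event... except that your event days aren't contiguous, and since the window frame can only be expressed in ROWS, not as a RANGE of days, you have to join against a generated series of dates first.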
WITH
/* First, get a listing of event counts by day */
event_days(event_name, event_day, event_day_count) AS (
    SELECT event_name, date_trunc('day', event_date), count(id)
    FROM Table1
    GROUP BY event_name, date_trunc('day', event_date)
    ORDER BY date_trunc('day', event_date), event_name
),
/*
 * Then fill in zeros for any days within the range that didn't have any events.
 * If PostgreSQL supported RANGE windows, not just ROWS, we could get rid of this.
 */
event_days_contiguous(event_name, event_day, event_day_count) AS (
    SELECT event_names.event_name, gen_day, COALESCE(event_days.event_day_count, 0)
    FROM generate_series(
             (SELECT min(event_day)::date FROM event_days),
             (SELECT max(event_day)::date FROM event_days),
             INTERVAL '1' DAY
         ) gen_day
    CROSS JOIN (SELECT DISTINCT event_name FROM event_days) event_names(event_name)
    LEFT OUTER JOIN event_days
        ON (gen_day = event_days.event_day AND event_names.event_name = event_days.event_name)
),
/*
 * Get the lagged counts by using the sum() function over a row window.
 * Note: despite its name, event_day_first is the current (reported) day of
 * each row; event_day_last is first_value() over the window, i.e. the
 * earliest day the window covers.
 */
lagged_days(event_name, event_day_first, event_day_last, event_days_count) AS (
    SELECT event_name, event_day, first_value(event_day) OVER w, sum(event_day_count) OVER w
    FROM event_days_contiguous
    WINDOW w AS (PARTITION BY event_name ORDER BY event_day ROWS 1 PRECEDING)
)
/*
 * Now do a manual pivot. For arbitrary column counts use an external tool
 * or check out the 'crosstab' function in the 'tablefunc' contrib module.
 */
SELECT d1.event_day_first, d1.event_days_count AS "Event_A", d2.event_days_count AS "Event_B"
FROM lagged_days d1
INNER JOIN lagged_days d2
    ON (d1.event_day_first = d2.event_day_first AND d1.event_name = 'event_A' AND d2.event_name = 'event_B')
ORDER BY d1.event_day_first;
Output for the sample data:
event_day_first | Event_A | Event_B
------------------------+---------+---------
2013-04-24 00:00:00+08 | 2 | 1
2013-04-25 00:00:00+08 | 4 | 1
2013-04-26 00:00:00+08 | 3 | 0
2013-04-27 00:00:00+08 | 2 | 1
(4 rows)
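For readers on a newer server: PostgreSQL 11 and later accept RANGE frames with an interval offset, so the generated date series is not needed there. The following is only a sketch under that assumption (it will not run on 9.0, and count(*) FILTER needs 9.4 or later). The names daily, count_a and count_b are made up for illustration. One caveat: a day with no events at all produces no output row, whereas the generate_series approach above would report it with zeros.

```sql
-- Sketch for PostgreSQL 11+ only (assumption: RANGE with an offset is available).
WITH daily AS (
    -- One row per calendar day, pivoted by hand into one column per event.
    SELECT event_date::date AS event_day,
           count(*) FILTER (WHERE event_name = 'event_A') AS count_a,
           count(*) FILTER (WHERE event_name = 'event_B') AS count_b
    FROM Table1
    GROUP BY event_date::date
)
SELECT event_day,
       sum(count_a) OVER w AS "Event_A",
       sum(count_b) OVER w AS "Event_B"
FROM daily
-- The frame covers the current day and the calendar day before it,
-- regardless of how many rows fall in between.
WINDOW w AS (ORDER BY event_day
             RANGE BETWEEN INTERVAL '1 day' PRECEDING AND CURRENT ROW)
ORDER BY event_day DESC;
```

On the sample data this should give the same counts as the table in the question. For a window of N days, the offset would be N - 1 days, e.g. INTERVAL '2 days' for a 3-day window.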
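You can make the query faster, though uglier, by combining the three CTE clauses into a nested query using FROM (SELECT...) and wrapping that in a view, rather than CTEs, for use from the outer query. This allows Pg to "push down" predicates into the query, greatly reducing the data it has to work with when you query subsets of the data.
SQLFiddle doesn't seem to be working at the moment, but here's the demo setup I used: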
CREATE TABLE Table1
(id integer primary key, "event_date" timestamp not null, "event_name" text);
INSERT INTO Table1
("id", "event_date", "event_name")
VALUES
(101, '2013-04-24 18:33:37', 'event_A'),
(102, '2013-04-24 20:34:37', 'event_B'),
(103, '2013-04-24 20:40:37', 'event_A'),
(104, '2013-04-25 01:00:00', 'event_A'),
(105, '2013-04-25 12:00:15', 'event_A'),
(106, '2013-04-26 00:56:10', 'event_A'),
(107, '2013-04-27 12:00:15', 'event_A'),
(108, '2013-04-27 12:00:15', 'event_B');
I changed the ID of the last entry from 107 to 108, as I suspect that was just an error in manual editing.
Here's how to express it as a view:
CREATE VIEW lagged_days AS
SELECT event_name, event_day AS event_day_first, sum(event_day_count) OVER w AS event_days_count
FROM (
    SELECT event_names.event_name, gen_day, COALESCE(event_days.event_day_count, 0)
    FROM generate_series(
             (SELECT min(event_date)::date FROM Table1),
             (SELECT max(event_date)::date FROM Table1),
             INTERVAL '1' DAY
         ) gen_day
    CROSS JOIN (SELECT DISTINCT event_name FROM Table1) event_names(event_name)
    LEFT OUTER JOIN (
        SELECT event_name, date_trunc('day', event_date), count(id)
        FROM Table1
        GROUP BY event_name, date_trunc('day', event_date)
        ORDER BY date_trunc('day', event_date), event_name
    ) event_days(event_name, event_day, event_day_count)
        ON (gen_day = event_days.event_day AND event_names.event_name = event_days.event_name)
) event_days_contiguous(event_name, event_day, event_day_count)
WINDOW w AS (PARTITION BY event_name ORDER BY event_day ROWS 1 PRECEDING);
You can then use the view in whatever crosstab queries you want to write. It'll work with the earlier hand-crosstab query:
SELECT d1.event_day_first, d1.event_days_count AS "Event_A", d2.event_days_count AS "Event_B"
FROM lagged_days d1
INNER JOIN lagged_days d2 ON (d1.event_day_first = d2.event_day_first AND d1.event_name = 'event_A' AND d2.event_name = 'event_B')
ORDER BY d1.event_day_first;
...or with crosstab from the tablefunc extension, which I'll leave for you to study.
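As a starting point for that homework, here is a rough sketch of the crosstab route. It assumes the lagged_days view above exists and that tablefunc is installed. CREATE EXTENSION is only available from 9.1 onward; on 9.0 the contrib module has to be installed by running its SQL script instead. The column types in the AS clause (timestamptz, numeric) are my assumption about what the view produces, and ct is an arbitrary alias.

```sql
-- Assumption: PostgreSQL 9.1+ with the tablefunc contrib module available.
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM crosstab(
    -- Source query: row name, category, value; ordered by row name.
    $$ SELECT event_day_first, event_name, event_days_count
       FROM lagged_days
       ORDER BY 1, 2 $$,
    -- Category query: one output column per event, in this order.
    $$ VALUES ('event_A'), ('event_B') $$
) AS ct(event_day timestamptz, "Event_A" numeric, "Event_B" numeric)
ORDER BY event_day;
```

The two-argument form is used here so that an event missing from the source for some day leaves a NULL in its own column instead of shifting later values to the left.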
For a laugh, here's the explain output for the view-based hand-crosstab query: http://explain.depesz.com/s/nvUq
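The predicate push-down mentioned earlier can be checked the same way. A minimal sketch, assuming the lagged_days view defined above exists; the plan you get will depend on your server version and data, so treat it as something to inspect rather than a guaranteed result.

```sql
-- Filter on event_name, the PARTITION BY column of the view's window.
-- Look at the plan to see whether the filter is applied to the scans of
-- Table1 below the WindowAgg node, or only after the window is computed.
EXPLAIN
SELECT event_day_first, event_days_count
FROM lagged_days
WHERE event_name = 'event_A'
ORDER BY event_day_first;
```

A filter on event_day_first is a different matter: it is the window's ORDER BY column rather than its PARTITION BY column, so applying it before the window is computed could change the sums.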