Sum by month and put months as columns

Pet*_*ton 4 sql postgresql pivot case crosstab

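Background

I have monthly time series data. I want to sum the values per ID, grouped by month, and then have the month names as columns rather than rows.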

+----+------------+-------+-------+
| id | extra_info | month | value |
+----+------------+-------+-------+
| 1  | abc        | jan   | 10    |
| 1  | abc        | feb   | 20    |
| 2  | def        | jan   | 10    |
| 2  | def        | feb   | 5     |
| 1  | abc        | jan   | 15    |
| 3  | ghi        | mar   | 15    |

Desired result

+----+------------+-----+-----+-----+
| id | extra_info | jan | feb | mar |
+----+------------+-----+-----+-----+
| 1  | abc        | 25  | 20  | 0   |
| 2  | def        | 10  | 5   | 0   |
| 3  | ghi        | 0   | 0   | 15  |

Current approach

I can easily group by month and sum the values. That gets me:

+----+------------+-------+-------+
| id | extra_info | month | value |
+----+------------+-------+-------+
| 1  | abc        | jan   | 25    |
| 1  | abc        | feb   | 20    |
| 2  | def        | jan   | 10    |
| 2  | def        | feb   | 5     |
| 3  | ghi        | mar   | 15    |
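
The grouping step behind this intermediate result could look like the sketch below. The table name yt is an assumption, since the question does not name the table; the column names follow the sample data.

```sql
-- Assumed table: yt(id, extra_info, month, value); the table name is hypothetical.
-- One output row per id and month, with the values summed.
SELECT id, extra_info, month, sum(value) AS value
FROM   yt
GROUP  BY id, extra_info, month
ORDER  BY id;
```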

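But now I need those months as column names. Not sure where to go from here.

Additional info

  • Language-wise, this query will run in Postgres.
  • The months above are only examples; the real data set is obviously much larger, covering all 12 months across thousands of IDs.

Any ideas from an SQL guru are much appreciated!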

Tar*_*ryn 6

You can use an aggregate function with a CASE expression to turn the rows into columns:

select id,
  extra_info,
  sum(case when month = 'jan' then value else 0 end) jan,
  sum(case when month = 'feb' then value else 0 end) feb,
  sum(case when month = 'mar' then value else 0 end) mar,
  sum(case when month = 'apr' then value else 0 end) apr,
  sum(case when month = 'may' then value else 0 end) may,
  sum(case when month = 'jun' then value else 0 end) jun,
  sum(case when month = 'jul' then value else 0 end) jul,
  sum(case when month = 'aug' then value else 0 end) aug,
  sum(case when month = 'sep' then value else 0 end) sep,
  sum(case when month = 'oct' then value else 0 end) oct,
  sum(case when month = 'nov' then value else 0 end) nov,
  sum(case when month = 'dec' then value else 0 end) "dec"
from yt
group by id, extra_info

See SQL Fiddle with Demo


Erw*_*ter 6

Setup

CREATE TABLE tbl (
  id int
, extra_info varchar(3)
, month text  -- short month name such as 'jan', as in the question and the queries below
, value int
);

INSERT INTO tbl VALUES
  (1, 'abc', 'jan', 10)
, (1, 'abc', 'feb', 20)
, (2, 'def', 'jan', 10)
, (2, 'def', 'feb',  5)
, (1, 'abc', 'jan', 15)
, (3, 'ghi', 'mar', 15)
;

crosstab()

I would use crosstab() from the additional module tablefunc. Install it once per database:

CREATE EXTENSION tablefunc;
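
If it is unclear whether the module is already installed, a slightly more defensive variant can be used. This is a sketch rather than part of the original answer; it relies only on the standard IF NOT EXISTS option and the pg_extension system catalog.

```sql
-- Install tablefunc only if it is not already present in this database.
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Confirm that the extension is registered in the current database.
SELECT extname, extversion
FROM   pg_extension
WHERE  extname = 'tablefunc';
```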

Related topics: the basics, how to handle "extra" columns, and advanced usage.

Query

SELECT * FROM crosstab(
   $$
   SELECT id, min(extra_info), month, sum(value) AS value
   FROM   tbl
   GROUP  BY id, month
   ORDER  BY id, month
   $$
 , $$
   VALUES
     ('jan'::text), ('feb'), ('mar'), ('apr'), ('may'), ('jun')
   , ('jul'),       ('aug'), ('sep'), ('oct'), ('nov'), ('dec')
   $$
   ) AS ct (id  int, extra text
          , jan int, feb int, mar int, apr int, may int, jun int
          , jul int, aug int, sep int, oct int, nov int, dec int);

Obviously, you can only output one extra_info per id. I chose min(extra_info), since you did not specify. If extra_info is the same for every row of an id, you could instead add it to the GROUP BY.

Result:

 id | extra | jan | feb | mar | apr | may | jun | jul | aug | sep | oct | nov | dec
----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----
  1 | abc   |  25 |  20 |     |     |     |     |     |     |     |     |     |
  2 | def   |  10 |   5 |     |     |     |     |     |     |     |     |     |
  3 | ghi   |     |     |  15 |     |     |     |     |     |     |     |     |

db<>fiddle here

Installing the tablefunc module (once per database) involves some overhead, but the queries are typically faster and shorter.

Plain SQL

If you can't or won't install the additional module, plain SQL is a bit faster with the aggregate FILTER clause, added in Postgres 9.4:

SELECT id, min(extra_info) AS extra
     , sum(value) FILTER (WHERE month = 'jan') AS jan
     , sum(value) FILTER (WHERE month = 'feb') AS feb
     , sum(value) FILTER (WHERE month = 'mar') AS mar
     , sum(value) FILTER (WHERE month = 'apr') AS apr
     , sum(value) FILTER (WHERE month = 'may') AS may
     , sum(value) FILTER (WHERE month = 'jun') AS jun
     , sum(value) FILTER (WHERE month = 'jul') AS jul
     , sum(value) FILTER (WHERE month = 'aug') AS aug
     , sum(value) FILTER (WHERE month = 'sep') AS sep
     , sum(value) FILTER (WHERE month = 'oct') AS oct
     , sum(value) FILTER (WHERE month = 'nov') AS nov
     , sum(value) FILTER (WHERE month = 'dec') AS dec
FROM   tbl
GROUP  BY id
ORDER  BY id;

0 instead of NULL

To output 0 instead of NULL for missing values, use COALESCE in either query.
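
As a minimal sketch of the COALESCE idea, the FILTER variant might be written as follows. The sample rows from the question are inlined in a CTE named tbl so the statement runs on its own; storing month as a short text name such as 'jan' is an assumption here, and only three months are shown for brevity.

```sql
-- Sample rows from the question, inlined so no table is required.
-- Assumption: month is stored as a short text name ('jan', 'feb', ...).
WITH tbl (id, extra_info, month, value) AS (
   VALUES
     (1, 'abc', 'jan', 10)
   , (1, 'abc', 'feb', 20)
   , (2, 'def', 'jan', 10)
   , (2, 'def', 'feb',  5)
   , (1, 'abc', 'jan', 15)
   , (3, 'ghi', 'mar', 15)
)
SELECT id, min(extra_info) AS extra
       -- sum() over zero rows yields NULL; COALESCE replaces that with 0.
     , COALESCE(sum(value) FILTER (WHERE month = 'jan'), 0) AS jan
     , COALESCE(sum(value) FILTER (WHERE month = 'feb'), 0) AS feb
     , COALESCE(sum(value) FILTER (WHERE month = 'mar'), 0) AS mar
FROM   tbl
GROUP  BY id
ORDER  BY id;
--  id | extra | jan | feb | mar
--   1 | abc   |  25 |  20 |   0
--   2 | def   |  10 |   5 |   0
--   3 | ghi   |   0 |   0 |  15
```

For the crosstab() query, the same idea applies in the outer SELECT list: name the columns explicitly instead of using SELECT *, for example COALESCE(jan, 0) AS jan.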

db<>fiddle here