Group by year ranges

Question

I have a large table (about 9 million rows) and want to group its rows on a field that contains a year. So far, that is easy:

-- greatly simplified:
SELECT count(*), year FROM dataset GROUP BY year ORDER BY 2;

We have defined a number of irregular periods, each spanning several years:

<1945, 1946-1964, 1965-1974, 1975-1991, 1992-2005 and >2005

I don't know how to group the results by these periods in the GROUP BY clause. I could create a subquery for each period:

SELECT
  ( SELECT count(*) FROM dataset WHERE year <= 1945 AND ...... ) AS pre1945,
  ( ....) AS period2,
  ....
FROM dataset

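But that feels wrong, and I wonder whether PostgreSQL can do this for me. This matters all the more because the query above is a great simplification of the real one: the real query has multiple conditions, including an ST_within clause that spans four tables. Going the subquery route would therefore make the query bloated.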

Is there a better way to produce this result?

Answer 1

a_h*_*ame 8

Use conditional counting:

select count(case when year <= 1945 then 1 end) as pre1945,
       count(case when year between 1946 and 1964 then 1 end) as period2,
       count(case when year between 1965 and 1974 then 1 end) as period3,
       ...
from ...
where ...;

This works because count() ignores NULL values, and the CASE expression returns NULL for any value outside the range it tests (an ELSE NULL is implicit).
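To see the NULL handling in isolation, here is a minimal sketch that runs without any table. The CTE name sample and its five rows are made up for illustration; they stand in for the real dataset table.

```sql
-- Hypothetical inline data: "sample" is an assumption, not the real table.
with sample(year) as (
    values (1930), (1945), (1950), (1970), (NULL)
)
select count(*)                                                as all_rows,       -- counts every row: 5
       count(year)                                             as non_null_years, -- skips the NULL year: 4
       count(case when year <= 1945 then 1 end)                as pre1945,        -- 1930 and 1945: 2
       count(case when year between 1946 and 1964 then 1 end)  as period2         -- 1950 only: 1
from sample;
```

Rows that fall outside a CASE condition yield NULL and are simply not counted, so each column counts only its own period while the table is scanned once.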

With version 9.4 (upcoming at the time of writing), you can rewrite this using the FILTER clause:

select count(*) filter (where year <= 1945) as pre1945,
       count(*) filter (where year between 1946 and 1964) as period2,
       count(*) filter (where year between 1965 and 1974) as period3,
       ...
from ...
where ...;

Answer 2

Clo*_*ldo 8

If you want the result as rows rather than as columns (as in @a_horse's answer), create the year ranges in a CTE and join the table to it:

with years(year_range) as ( values
    (int4range(1900, 1945, '[]')),
    (int4range(1946, 1964, '[]')),
    (int4range(1965, 1974, '[]')),
    (int4range(1975, 1991, '[]')),
    (int4range(1992, 2005, '[]')),
    (int4range(2006, 2014, '[]'))  -- starts at 2006 so that 2005 is not counted in two ranges
)
select year_range, count(*)
from
    dataset d
    left join
    years y on d.year <@ y.year_range
group by 1 
order by 1

http://www.postgresql.org/docs/current/static/rangetypes.html
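As a variation on the range join, the sketch below uses unbounded ranges so that the open-ended periods from the question (everything up to 1945, everything after 2005) are covered without picking arbitrary limits. It also joins from the periods to the data, so a period with no rows still appears with a count of 0. The CTE names sample and periods and the sample rows are assumptions for illustration only.

```sql
-- Hypothetical inline data standing in for the real dataset table.
with sample(year) as (
    values (1890), (1945), (1950), (2005), (2020)
),
periods(year_range) as ( values
    (int4range(NULL, 1945, '[]')),   -- no lower bound: everything up to 1945
    (int4range(1946, 1964, '[]')),
    (int4range(1965, 1974, '[]')),
    (int4range(1975, 1991, '[]')),
    (int4range(1992, 2005, '[]')),
    (int4range(2006, NULL))          -- no upper bound: 2006 and later
)
select p.year_range,
       count(s.year) as n            -- counts matched rows only, so empty periods give 0
from periods p
left join sample s on s.year <@ p.year_range
group by p.year_range
order by p.year_range;
```

With this sample data the six periods come back with counts 2, 1, 0, 0, 1 and 1, in chronological order. Whether 1945 itself belongs to the first period is an assumption taken from the `year <= 1945` condition in the question.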
