Wil*_*son 4 postgresql oracle aggregate group-by greatest-n-per-group
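Oracle SQL has a technique that can simplify aggregate queries:
aggregate on one column, but pull information from a different column using a simple calculated column in the SELECT list.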
--Oracle
--For a given country, what city has the highest population? (where the country has more than one city)
--Include the city name as a column.
select
    country,
    count(*),
    max(population),
    any_value(city) keep (dense_rank first order by population desc)  --<<--
from
    cities
group by
    country
having
    count(*) > 1
As shown above, the following column brings in the city name, even though city is not in the GROUP BY:
any_value(city) keep (dense_rank first order by population desc)
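There are several ways to do this kind of thing in SQL. I am looking for a solution in PostgreSQL that lets me do it in a calculated column, all within a single SELECT query (no subqueries, joins, WITH, etc.).
Question: Is there equivalent functionality in PostgreSQL for Oracle's ANY_VALUE(...) KEEP (DENSE_RANK FIRST/LAST ORDER BY ...)?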
Edit:
I changed MAX() to ANY_VALUE() because I think ANY_VALUE() is easier to read.
Ties can be broken by adding , city desc to the order by, which makes the result deterministic:
any_value(city) keep (dense_rank first order by population desc, city desc)
The additional module first_last_agg makes this simple. It is available from apt.postgresql.org (among others). Read the instructions in the Postgres Wiki. Install it once per database:
CREATE EXTENSION first_last_agg;
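It provides two aggregate functions: first() and last().
Most hosted services do not offer the module. If you cannot install it, the next best option is to create the aggregate functions yourself, as shown in the Postgres Wiki.
But the C implementation in the module first_last_agg is faster.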
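For readers who cannot install the module, here is a minimal sketch of such hand-rolled aggregates, along the lines of the Postgres Wiki approach. The helper names first_agg and last_agg and the public schema are assumptions for illustration. Do not create these in a database where the first_last_agg extension is installed, since the names would collide.

```sql
-- Sketch only: plain-SQL stand-ins for first() and last().
-- Helper names (first_agg, last_agg) and the public schema are assumptions.

-- Transition function: keep the state seen so far (the first value).
CREATE FUNCTION public.first_agg (anyelement, anyelement)
  RETURNS anyelement
  LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE
AS 'SELECT $1';

CREATE AGGREGATE public.first (anyelement) (
  SFUNC    = public.first_agg
, STYPE    = anyelement
, PARALLEL = safe
);

-- Transition function: always replace the state with the newest value.
CREATE FUNCTION public.last_agg (anyelement, anyelement)
  RETURNS anyelement
  LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE
AS 'SELECT $2';

CREATE AGGREGATE public.last (anyelement) (
  SFUNC    = public.last_agg
, STYPE    = anyelement
, PARALLEL = safe
);
```

Because the transition functions are STRICT and there is no initial condition, NULL inputs are skipped: first() returns the first non-NULL value in the given sort order, last() the last non-NULL one.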
Then:
SELECT country
, count(*) AS ct_cities
, max(population) AS highest_population
, last(city ORDER BY population, city) AS biggest_city -- !
FROM cities
GROUP BY country
HAVING count(*) > 1;
Same as the following, as long as population and city contain no NULL values:
, first(city ORDER BY population DESC NULLS LAST, city DESC NULLS LAST) AS biggest_city
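Why NULLS LAST? In descending order, PostgreSQL sorts NULL values first by default, so a city with an unknown population would otherwise come out on top.
Either variant reports the city with the highest population, with the alphabetically last name breaking ties, like your original.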
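To try the queries in this answer, a small test table helps. The following is a sketch in PostgreSQL syntax: the column names come from the queries in this thread, while the column types and all row values are made-up placeholders.

```sql
-- Hypothetical test table: column names are from the thread,
-- types and data are assumptions for illustration only.
CREATE TABLE cities (
  country    text
, city       text
, population integer
);

INSERT INTO cities (country, city, population) VALUES
  ('A', 'a1', 100)
, ('A', 'a2', 300)
, ('A', 'a3', 300)    -- ties with a2 on population: exercises the tie-breaker
, ('B', 'b1', 500)    -- single city: exercises the count(*) > 1 filter
, ('C', 'c1', 200)
, ('C', 'c2', NULL);  -- unknown population: exercises NULLS LAST
```

The rows are chosen so that each detail discussed here (tie-breaking on city, the more-than-one-city filter, and NULL handling) changes the result if it is left out.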
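If you cannot install the additional module, and you still insist on:
all within a single SELECT query (no subqueries, joins, WITH, etc.).
DISTINCT ON combined with window functions can do that, too: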
SELECT DISTINCT ON (country)
country
, count(*) OVER (PARTITION BY country) AS ct_cities
, population AS highest_population
, city AS biggest_city
FROM cities c
ORDER BY country, population DESC NULLS LAST, city DESC NULLS LAST;
To also eliminate countries with only a single entry:
SELECT DISTINCT ON (country)
country
, count(*) OVER (PARTITION BY country) AS ct_cities
, population AS highest_population
, city AS biggest_city
FROM cities c
WHERE EXISTS (SELECT FROM cities c1 WHERE c1.country = c.country AND c1.ctid <> c.ctid)
ORDER BY country, population DESC NULLS LAST, city DESC NULLS LAST;
Use your PK instead of ctid if you have one.
If a subquery is allowed, then:
SELECT *
FROM (
SELECT DISTINCT ON (country)
country
, count(*) OVER (PARTITION BY country) AS ct_cities
, population AS highest_population
, city AS biggest_city
FROM cities c
ORDER BY country, population DESC NULLS LAST, city DESC NULLS LAST
) sub
WHERE ct_cities > 1;
(array_agg(city ORDER BY population DESC NULLS LAST))[1] typically performs poorly with more than a few rows per country. Aggregating a large array only to take its first element is expensive.
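For comparison, here is a sketch of how the array_agg variant would look as a complete query against the same cities table. It needs no extension or custom aggregate and stays within a single SELECT, but it builds a full array per country just to read one element. The city tie-breaker is added here to match the other queries.

```sql
-- Sketch: array_agg variant, works without extensions,
-- but materializes one array per country.
SELECT country
     , count(*)        AS ct_cities
     , max(population) AS highest_population
     , (array_agg(city ORDER BY population DESC NULLS LAST
                              , city DESC NULLS LAST))[1] AS biggest_city
FROM   cities
GROUP  BY country
HAVING count(*) > 1;
```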