PostgreSQL equivalent of Oracle's ANY_VALUE(...) KEEP (DENSE_RANK FIRST/LAST ORDER BY ...)

Wil*_*son 4 postgresql oracle aggregate group-by greatest-n-per-group

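There is a technique in Oracle SQL that can be used to simplify aggregation queries:

Aggregate on one column, but pull information from a different column using a simple calculated column in the SELECT list.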

--Oracle
--For a given country, what city has the highest population? (where the country has more than one city)
--Include the city name as a column.
select
    country,
    count(*),
    max(population),
    any_value(city) keep (dense_rank first order by population desc)   --<<--
from
    cities
group by
    country
having
    count(*) > 1

db<>fiddle

As shown above, the following column can bring in the city name, even though the city name is not in the GROUP BY:

 any_value(city) keep (dense_rank first order by population desc)

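There are a number of ways to achieve this kind of thing in SQL. I'm looking for a solution in PostgreSQL that lets me do it in a calculated column, all within a single SELECT query (no subqueries, joins, WITH, etc.).

Question: Is there equivalent functionality to Oracle's ANY_VALUE(...) KEEP (DENSE_RANK FIRST/LAST ORDER BY ...) in PostgreSQL?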


Edit:

I changed MAX() to ANY_VALUE() because I think ANY_VALUE() is easier to read.

Ties can be broken by adding , city desc to the order by, which makes the result deterministic:

any_value(city) keep (dense_rank first order by population desc, city desc)

Erw*_*ter 7

first_last_agg

The additional module first_last_agg makes this simple. It is available from apt.postgresql.org (among others). Read the instructions in the Postgres Wiki. Install it once per database:

CREATE EXTENSION first_last_agg;

It provides two aggregate functions: first() and last().

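Most hosted services do not offer the module. If you cannot install it, the next best option is to create the aggregate functions yourself, as shown in the Postgres Wiki and in my fiddle below.

But the C implementation in the module first_last_agg is faster.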
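For readers who cannot install the module, here is a minimal sketch of what such hand-rolled aggregates might look like in plain SQL. The helper function names first_agg and last_agg are chosen here for illustration, and the definitions are an assumption modeled on the common pattern, not a copy of the module's code. Do not create them in a database where the extension is already installed, since the aggregate names would conflict.

```sql
-- Transition function for first(): always keep the state seen so far.
-- STRICT means NULL inputs are skipped and the first non-NULL input
-- becomes the initial state.
CREATE FUNCTION first_agg (anyelement, anyelement)
  RETURNS anyelement
  LANGUAGE sql IMMUTABLE STRICT AS
'SELECT $1';

CREATE AGGREGATE first (anyelement) (
  SFUNC = first_agg
, STYPE = anyelement
);

-- Transition function for last(): always replace the state with the
-- newest non-NULL input.
CREATE FUNCTION last_agg (anyelement, anyelement)
  RETURNS anyelement
  LANGUAGE sql IMMUTABLE STRICT AS
'SELECT $2';

CREATE AGGREGATE last (anyelement) (
  SFUNC = last_agg
, STYPE = anyelement
);
```

Because both transition functions are strict, these sketches return the first or last non-NULL value in the aggregate's ORDER BY order.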

Then:

SELECT country
     , count(*) AS ct_cities
     , max(population) AS highest_population
     , last(city ORDER BY population, city) AS biggest_city  -- !
FROM   cities
GROUP  BY country
HAVING count(*) > 1;

fiddle

Same as:

 , first(city ORDER BY population DESC NULLS LAST, city DESC NULLS LAST) AS biggest_city 

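Why NULLS LAST? Because NULL values sort first in descending order by default, so without it a NULL population or city would be picked ahead of any real value.

Either variant reports the city with the highest population, with the alphabetically last name as the tiebreaker, like your original.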
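The effect of NULLS LAST on a descending sort can be checked with a throwaway query. This is a small illustration on made-up values, not on the cities table:

```sql
-- Default for DESC is NULLS FIRST: rows come back as NULL, 3, 1
SELECT x
FROM  (VALUES (1), (NULL), (3)) AS t(x)
ORDER  BY x DESC;

-- With NULLS LAST the NULL moves to the end: 3, 1, NULL
SELECT x
FROM  (VALUES (1), (NULL), (3)) AS t(x)
ORDER  BY x DESC NULLS LAST;
```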

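Without the additional module

If you cannot install the additional module, and you still insist on:

all within a single SELECT query (no subqueries, joins, WITH, etc.)

DISTINCT ON combined with a window function can do that as well: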

SELECT DISTINCT ON (country)
       country
     , count(*) OVER (PARTITION BY country) AS ct_cities
     , population AS highest_population
     , city AS biggest_city
FROM   cities c
ORDER  BY country, population DESC NULLS LAST, city DESC NULLS LAST;

To also eliminate countries with only a single entry:

SELECT DISTINCT ON (country)
       country
     , count(*) OVER (PARTITION BY country) AS ct_cities
     , population AS highest_population
     , city AS biggest_city
FROM   cities c
WHERE  EXISTS (SELECT FROM cities c1 WHERE c1.country = c.country AND c1.ctid <> c.ctid)
ORDER  BY country, population DESC NULLS LAST, city DESC NULLS LAST;

Use your PK instead of ctid if you have one.

If subqueries are allowed, then:

SELECT *
FROM  (
   SELECT DISTINCT ON (country)
          country
        , count(*) OVER (PARTITION BY country) AS ct_cities
        , population AS highest_population
        , city AS biggest_city
   FROM   cities c
   ORDER  BY country, population DESC NULLS LAST, city DESC NULLS LAST
   ) sub
WHERE  ct_cities > 1;

(array_agg(city ORDER BY population DESC NULLS LAST))[1] typically performs poorly with more than a few rows per country. Aggregating a large array only to take its first element is expensive.
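For completeness, the array_agg variant would look roughly like this when written against the same cities table. It needs no extra module and stays within a single SELECT, but it carries the performance caveat above. The city tiebreaker is added here to match the earlier queries:

```sql
SELECT country
     , count(*) AS ct_cities
     , max(population) AS highest_population
       -- build the sorted array per country, then keep only element 1
     , (array_agg(city ORDER BY population DESC NULLS LAST
                              , city DESC NULLS LAST))[1] AS biggest_city
FROM   cities
GROUP  BY country
HAVING count(*) > 1;
```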