SELECT DISTINCT ON, ordered by another column

Question

Lui*_*uis 8 postgresql order-by greatest-n-per-group distinct postgresql-9.6

Consider the following table test:

CREATE TABLE test(col1 int, col2 varchar, col3 date);
INSERT INTO test VALUES
  (1,'abc','2015-09-10')
, (1,'abc','2015-09-11')
, (2,'xyz','2015-09-12')
, (2,'xyz','2015-09-13')
, (3,'tcs','2015-01-15')
, (3,'tcs','2015-01-18');

postgres=# select * from test;
  col1 | col2 |    col3    
 ------+------+------------
     1 | abc  | 2015-09-10
     1 | abc  | 2015-09-11
     2 | xyz  | 2015-09-12
     2 | xyz  | 2015-09-13
     3 | tcs  | 2015-01-15
     3 | tcs  | 2015-01-18

I want a result set with one row per col1 (the latest date), ordered by date descending:

 col1 | col2 |    col3    
------+------+------------
    2 | xyz  | 2015-09-13
    1 | abc  | 2015-09-11
    3 | tcs  | 2015-01-18

This is what I managed to get with distinct on:

select distinct on (col1) col1, col2, col3 from test order by col1, col3 desc;
 col1 | col2 |    col3    
------+------+------------
    1 | abc  | 2015-09-11
    2 | xyz  | 2015-09-13
    3 | tcs  | 2015-01-18

But I could not get what I need with having:

select distinct on (col1) col1, col2, col3 from test group by col1, col2, col3 having col3 = max(col3)
 col1 | col2 |    col3    
------+------+------------
    1 | abc  | 2015-09-10
    2 | xyz  | 2015-09-13
    3 | tcs  | 2015-01-18

Answer 1

Erw*_*ter 7

You can still use DISTINCT ON. Just wrap it in an outer query to sort the result the way you need:

SELECT *
FROM  (
   SELECT DISTINCT ON (col1)
          col1, col2, col3
   FROM   test
   ORDER  BY col1, col3 DESC
   ) sub
ORDER  BY col3 DESC, col2;

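This assumes that col2 functionally depends on col1, so we can ignore it in DISTINCT ON and in the ORDER BY of the inner query. I added it to the outer ORDER BY as a meaningful tiebreaker. If col2 is not unique without col1, you might append col1 as well.

This also assumes that col3 is defined NOT NULL. Otherwise, append NULLS LAST. See:

PostgreSQL sort by datetime asc, null first?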
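
As a sketch of the nullable case, assuming col3 may contain NULL in the test table: in PostgreSQL, DESC sorts NULL values first by default, so NULLS LAST is added to both ORDER BY clauses to keep the inner pick and the outer sort consistent.

```sql
-- Sketch only: assumes col3 is nullable (not the case stated in the answer)
SELECT *
FROM  (
   SELECT DISTINCT ON (col1)
          col1, col2, col3
   FROM   test
   ORDER  BY col1, col3 DESC NULLS LAST   -- latest non-null date wins per col1
   ) sub
ORDER  BY col3 DESC NULLS LAST, col2;     -- rows whose only date is NULL go last
```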

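With only a few rows per (col1), this is typically the fastest solution. See:

Select first row in each GROUP BY group?

db<>fiddle here

A subquery with the window function row_number() (as Vérace suggested) is a valid alternative, but typically slower. I ran many tests, but try it yourself. It has to sort twice, just like DISTINCT ON (which may switch to a hash algorithm internally if that is expected to be faster), but it keeps all rows after the inner query, which adds needless cost. Either way, you don't need ORDER BY in the inner query: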

SELECT col1, col2, col3
FROM  (
   SELECT col1, col2, col3
       ,  row_number() OVER (PARTITION BY col1 ORDER BY col3 DESC) AS rn
   FROM   test
   ) sub
WHERE  rn = 1
ORDER  BY col3 DESC, col2;
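
To try the comparison yourself, one option is to run both variants under EXPLAIN (ANALYZE, BUFFERS) and compare plans and timings. This is only a sketch: with the six sample rows the numbers are meaningless, so it assumes test has been loaded with a realistic data volume.

```sql
-- Variant 1: DISTINCT ON wrapped in an outer query
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM  (
   SELECT DISTINCT ON (col1)
          col1, col2, col3
   FROM   test
   ORDER  BY col1, col3 DESC
   ) sub
ORDER  BY col3 DESC, col2;

-- Variant 2: row_number() in a subquery
EXPLAIN (ANALYZE, BUFFERS)
SELECT col1, col2, col3
FROM  (
   SELECT col1, col2, col3
        , row_number() OVER (PARTITION BY col1 ORDER BY col3 DESC) AS rn
   FROM   test
   ) sub
WHERE  rn = 1
ORDER  BY col3 DESC, col2;
```

EXPLAIN ANALYZE actually executes the query, so run each variant a few times to reduce caching effects before comparing.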

Don't use a CTE if you don't need one. It is typically substantially more expensive (until Postgres 12, which mostly fixed this).
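
For illustration, a sketch of the Postgres 12 behavior: before version 12 a CTE is always materialized, while from version 12 on a non-recursive, side-effect-free CTE that is referenced once can be inlined into the outer query. The NOT MATERIALIZED keyword below requests inlining explicitly; it does not exist in Postgres 9.6 and raises a syntax error there.

```sql
-- Postgres 12 or later only: NOT MATERIALIZED asks the planner to inline the CTE
WITH cte AS NOT MATERIALIZED (
   SELECT col1, col2, col3
        , row_number() OVER (PARTITION BY col1 ORDER BY col3 DESC) AS rn
   FROM   test
   )
SELECT col1, col2, col3
FROM   cte
WHERE  rn = 1
ORDER  BY col3 DESC, col2;
```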

With many rows per col1, indexes become more important, and there are typically faster alternatives. See:

Optimize GROUP BY query to retrieve latest row per user
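
As a rough sketch of what such an alternative can look like: a multicolumn index matching the sort order, plus a LATERAL subquery that fetches only the top row per group. The index name and the lookup table grp (one row per distinct col1) are hypothetical and not part of the question; the pattern only pays off when such a table exists or can be derived cheaply.

```sql
-- Hypothetical index name; column order matches ORDER BY col1, col3 DESC
CREATE INDEX test_col1_col3_idx ON test (col1, col3 DESC);

-- Hypothetical table "grp" holds one row per distinct col1 value
SELECT t.col1, t.col2, t.col3
FROM   grp g
CROSS  JOIN LATERAL (
   SELECT col1, col2, col3
   FROM   test
   WHERE  col1 = g.col1
   ORDER  BY col3 DESC      -- can be served by the index above
   LIMIT  1                 -- only the latest row per col1
   ) t
ORDER  BY t.col3 DESC, t.col2;
```

LATERAL requires Postgres 9.3 or later, so it is available on the 9.6 version tagged in the question.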

As an aside, unlike Oracle or SQL Server, PostgreSQL does not use the term "analytic functions" for window functions. (What is "analytic" about these functions?)

Answer 2

Vér*_*ace 6

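This is a classic greatest-n-per-group problem. These come up frequently in many areas, and Analytic functions (see below) are well worth investigating.

Nowadays it is usually solved with analytic (aka window) functions - see the fiddle here.

You can use this query -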

WITH cte AS
(
  SELECT 
    ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3 DESC) AS rn,
    col1, col2, col3 
  FROM test
)
SELECT * FROM cte 
WHERE rn = 1
ORDER BY col3 DESC;  -- sort in the outer query: an ORDER BY inside the CTE does not guarantee the final order

Result -

rn  col1    col2    col3
1   2   xyz     2015-09-13
1   1   abc     2015-09-11
1   3   tcs     2015-01-18

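Analytic functions are well worth getting to know - they are very powerful and will repay any effort you put into learning them many times over. Run the inner query on its own and experiment; that is how I learned. By the way, it is always worth tagging your question with the PostgreSQL version you are using!

A more traditional approach is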

SELECT x, y, mc FROM
(
  SELECT col1 AS x, col2 AS y, MAX(col3) AS mc
  FROM test
  GROUP BY col1, col2
) AS tab
ORDER BY mc DESC  -- descending, to match the requested order (latest date first)

Same result - also on the fiddle.

Asked:	6 years, 1 month ago
Viewed:	3867 times
Active:	6 years, 1 month ago