Multiple DISTINCT ON clauses in PostgreSQL

Jiv*_*van 6 sql postgresql duplicates distinct-on

Is it possible to select rows that are DISTINCT ON several separate, independent sets of columns?

Suppose I want all rows that are:

  • distinct on (name, birth)
  • distinct on (name, height)

So, in the table below, the rows marked with a cross would not be distinct (each mark names the clause that fails):

name      birth    height
--------------------------
William    1976      1.82
James      1981      1.68
Mike       1976      1.68
Tom        1967      1.79
William    1976      1.74   ✗ (name, birth)
William    1981      1.82   ✗ (name, height)
Tom        1978      1.92
Mike       1963      1.68   ✗ (name, height)
Tom        1971      1.86
James      1981      1.77   ✗ (name, birth)
Tom        1971      1.89   ✗ (name, birth)

In the example above, if the clause were simply DISTINCT ON (name, birth, height), all rows would be considered distinct.
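One possible reading of the requirement can be sketched in Python (used here only for illustration, not SQL): walk the rows in the order listed and flag any row whose (name, birth) or (name, height) pair is already taken by a kept row. The helper name `flag_duplicates` and the "first row wins" rule are assumptions, since a table has no inherent row order.

```python
# Sample rows from the table above, in the order shown: (name, birth, height).
rows = [
    ("William", 1976, 1.82),
    ("James",   1981, 1.68),
    ("Mike",    1976, 1.68),
    ("Tom",     1967, 1.79),
    ("William", 1976, 1.74),
    ("William", 1981, 1.82),
    ("Tom",     1978, 1.92),
    ("Mike",    1963, 1.68),
    ("Tom",     1971, 1.86),
    ("James",   1981, 1.77),
    ("Tom",     1971, 1.89),
]


def flag_duplicates(rows):
    """Hypothetical helper: the first row seen wins; later rows that repeat
    a (name, birth) or (name, height) pair of a kept row are flagged."""
    seen_birth, seen_height = set(), set()
    kept, flagged = [], []
    for name, birth, height in rows:
        if (name, birth) in seen_birth:
            flagged.append((name, birth, height, "(name, birth)"))
        elif (name, height) in seen_height:
            flagged.append((name, birth, height, "(name, height)"))
        else:
            kept.append((name, birth, height))
            seen_birth.add((name, birth))
            seen_height.add((name, height))
    return kept, flagged


kept, flagged = flag_duplicates(rows)
for row in flagged:
    print(*row)
print(len(kept), "rows kept,", len(flagged), "rows flagged")
# 6 rows kept, 5 rows flagged
```

Under this reading the five flagged rows are the same five that carry a mark in the table, each with the same failing clause.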

What I tried, without success:

  • SELECT DISTINCT ON (name, birth) (name, height) ...
  • SELECT DISTINCT ON (name, birth), (name, height) ...
  • SELECT DISTINCT ON ((name, birth), (name, height)) ...
  • SELECT DISTINCT ON (name, birth) AND (name, height) ...
  • SELECT DISTINCT ON (name, birth) AND ON (name, height) ...
  • SELECT DISTINCT ON (name, birth) DISTINCT ON (name, height) ...
  • SELECT DISTINCT ON (name, birth), DISTINCT ON (name, height) ...

Erw*_*ter 9

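As has been commented, the question is ambiguous. The number of result rows can differ from one call to the next. If you are happy with arbitrary results, @klin's solution is good enough. Otherwise, you need to define the requirements more closely. For example: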

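  • distinct on (name, birth), picking the smallest height first, then the smallest ID as tiebreaker
  • distinct on (name, height), picking the earliest birth first, then the smallest ID as tiebreaker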

Your table should have a primary key (or some other way to identify rows uniquely):

CREATE TEMP TABLE tbl (
  tbl_id serial PRIMARY KEY
, name text
, birth int
, height numeric);

INSERT INTO tbl (name, birth, height)
VALUES
  ('William', 1976, 1.82)
, ('James',   1981, 1.68)
, ('Mike',    1976, 1.68)
, ('Tom',     1967, 1.79)
, ('William', 1976, 1.74)
, ('William', 1981, 1.82)
, ('Tom',     1978, 1.92)
, ('Mike',    1963, 1.68)
, ('Tom',     1971, 1.86)
, ('James',   1981, 1.77)
, ('Tom',     1971, 1.89);

Query:

SELECT DISTINCT ON (name, height) *
FROM  (
   SELECT DISTINCT ON (name, birth) *
   FROM   tbl
   ORDER  BY name, birth, height, tbl_id  -- pick smallest height, ID as tiebreaker
   ) sub
ORDER  BY name, height, birth, tbl_id;    -- pick earliest birth, ID as tiebreaker
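To see what the two passes do on the sample data, here is a rough Python emulation of DISTINCT ON (sort, then keep the first row of each group). The helper `distinct_on` is hypothetical, not a PostgreSQL API, and floats stand in for `numeric`. It also shows that applying the two passes in the opposite order yields a different number of rows.

```python
from itertools import groupby

# (tbl_id, name, birth, height), with IDs as the serial column would assign them.
rows = [
    (1, "William", 1976, 1.82), (2, "James", 1981, 1.68),
    (3, "Mike", 1976, 1.68),    (4, "Tom", 1967, 1.79),
    (5, "William", 1976, 1.74), (6, "William", 1981, 1.82),
    (7, "Tom", 1978, 1.92),     (8, "Mike", 1963, 1.68),
    (9, "Tom", 1971, 1.86),     (10, "James", 1981, 1.77),
    (11, "Tom", 1971, 1.89),
]


def distinct_on(rows, key, order):
    """Emulate SELECT DISTINCT ON (key) ... ORDER BY order.
    `order` must start with the `key` columns, as in PostgreSQL."""
    ordered = sorted(rows, key=order)
    return [next(group) for _, group in groupby(ordered, key=key)]


def by_birth(r):
    return (r[1], r[2])               # (name, birth)


def by_height(r):
    return (r[1], r[3])               # (name, height)


def order_birth(r):
    return (r[1], r[2], r[3], r[0])   # name, birth, height, tbl_id


def order_height(r):
    return (r[1], r[3], r[2], r[0])   # name, height, birth, tbl_id


# Same order as the query: (name, birth) first, then (name, height).
inner = distinct_on(rows, by_birth, order_birth)
result = distinct_on(inner, by_height, order_height)
print([r[0] for r in result])    # [2, 8, 4, 9, 7, 5, 6]

# Opposite order: (name, height) first, then (name, birth).
first = distinct_on(rows, by_height, order_height)
swapped = distinct_on(first, by_birth, order_birth)
print([r[0] for r in swapped])   # [2, 8, 4, 9, 7, 5]
```

With the query's order, seven rows survive; with the passes swapped, only six do, because row 6 loses to row 1 in the first pass and row 1 then loses to row 5 in the second. This is why the pick rules and the order of application both need to be pinned down.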

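A DISTINCT ON query without a deterministic ORDER BY can return an arbitrary row from each set of duplicates. Applied once, you still get a deterministic number of rows (with arbitrary picks). Applied repeatedly, the resulting number of rows is arbitrary as well. Related: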