Select rows for which at least one row per group meets a condition

use*_*521 9 postgresql postgresql-9.4

I have the following table:

create table test (
  company_id integer not null, 
  client_id integer not null, 
  client_status text,
  unique (company_id, client_id)
);

insert into test values
  (1, 1, 'y'),    -- company1

  (2, 2, null),   -- company2

  (3, 3, 'n'),    -- company3

  (4, 4, 'y'),    -- company4
  (4, 5, 'n'),

  (5, 6, null),   -- company5
  (5, 7, 'n')
;

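Basically, there are 5 different companies. Each one has one or more clients, and each client has a status: 'y' or 'n' (it may also be null).

What I have to do is select all (company_id, client_id) pairs for every company that has at least one client whose status is not 'n' (that is, 'y' or null). So for the example data above, the output should be: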

company_id;client_id
1;1
2;2
4;4
4;5
5;6
5;7

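I tried something with window functions, but I can't figure out how to compare the number of all clients with the number of clients with STATUS = 'n'.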

select company_id,
count(*) over (partition by company_id) as all_clients_count
from test
-- where all_clients_count != ... ?

I figured out one way to do this, but I am not sure it is the right way:

select sub.company_id, unnest(sub.client_ids)
from (
  select company_id, array_agg(client_id) as client_ids
  from test
  group by company_id
  having count(*) != count( (case when client_status = 'n' then 1 else null end) )
) sub

Erw*_*ter 6

Basically, you are looking for the expression:

client_status IS DISTINCT FROM 'n'
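To illustrate why a plain `<>` is not enough here, a minimal sketch. The inline VALUES list is only a stand-in for the three possible statuses, not the real table. For the NULL status, `<>` yields NULL (so a WHERE clause would drop the row), while IS DISTINCT FROM yields true:

```sql
-- Stand-in rows for the three possible statuses (illustration only).
SELECT status
     , status <> 'n'               AS plain_not_equal     -- NULL when status is NULL
     , status IS DISTINCT FROM 'n' AS null_safe_not_equal -- always true or false
FROM  (VALUES ('y'), ('n'), (NULL)) AS v(status);
```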

The column client_status should really be data type boolean, not text, which would allow the simpler expression:

client_status IS NOT FALSE

The manual has details in the chapter Comparison Operators.

Assuming your actual table has a UNIQUE or PK constraint, we arrive at:

CREATE TABLE test (
  company_id    integer NOT NULL, 
  client_id     integer NOT NULL, 
  client_status boolean,
  PRIMARY KEY (company_id, client_id)
);
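If the table from the question already exists, the column could also be converted in place rather than recreating the table. This is only a sketch, and it assumes the column never holds anything other than 'y', 'n' or NULL (any other value would silently become NULL):

```sql
-- Convert the existing text column to boolean in place.
-- Assumption: only 'y', 'n' or NULL occur; other values map to NULL.
ALTER TABLE test
  ALTER COLUMN client_status TYPE boolean
  USING CASE client_status
           WHEN 'y' THEN true
           WHEN 'n' THEN false
        END;
```

The existing UNIQUE constraint on (company_id, client_id) is left untouched by this statement.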

Queries

All of these do the same thing (what you asked for). Which one is fastest depends on data distribution:

SELECT company_id, client_id
FROM   test t
WHERE  EXISTS (
   SELECT FROM test
   WHERE  company_id = t.company_id
   AND    client_status IS NOT FALSE
   );

Or:

SELECT company_id, client_id
FROM   test t
JOIN  (
   SELECT company_id
   FROM   test t
   GROUP  BY 1
   HAVING bool_or(client_status IS NOT FALSE)
   ) c USING (company_id);

Or:

SELECT company_id, client_id
FROM   test t
JOIN  (
   SELECT DISTINCT ON (company_id) company_id, client_status
   FROM   test t
   ORDER  BY company_id, client_status DESC
   ) c USING (company_id)
WHERE  c.client_status IS NOT FALSE;

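Boolean values sort FALSE -> TRUE -> NULL in ascending order, so FALSE comes last in descending order. If any other value is available for a company, that one is picked first ...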
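The sort order can be checked with a quick standalone query. The VALUES list is just illustrative data:

```sql
-- Descending sort of the three possible boolean states.
SELECT b
FROM  (VALUES (true), (false), (NULL::boolean)) AS v(b)
ORDER  BY b DESC;  -- NULL first (default NULLS FIRST for DESC), then true, then false
```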

The added PK is implemented with an index that is useful for these queries. If you want it faster still, add a partial index for query 1:

CREATE INDEX test_special_idx ON test (company_id, client_id)
WHERE  client_status IS NOT FALSE;

This is also possible with window functions, but that would be slower. Example with first_value():

SELECT company_id, client_id
FROM  (
   SELECT company_id, client_id
        , first_value(client_status) OVER (PARTITION BY company_id
                                           ORDER BY client_status DESC) AS stat
   FROM   test t
   ) sub
WHERE  stat IS NOT FALSE;
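As a variation closer to the window-function attempt in the question, the aggregate bool_or() can itself be used as a window function. This is a sketch against the boolean version of the table, with the hypothetical column alias any_not_false, and it makes no claim about speed relative to the queries above:

```sql
-- Flag every row with whether its company has at least one non-false status.
SELECT company_id, client_id
FROM  (
   SELECT company_id, client_id
        , bool_or(client_status IS NOT FALSE)
             OVER (PARTITION BY company_id) AS any_not_false  -- alias is illustrative
   FROM   test
   ) sub
WHERE  any_not_false;
```

Because `client_status IS NOT FALSE` never evaluates to NULL, the flag is always true or false for every row.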

For many rows per company_id, one of these techniques may be faster, though: