use*_*521 · 9 · tags: postgresql, postgresql-9.4
I have the following table:
create table test (
  company_id    integer not null,
  client_id     integer not null,
  client_status text,
  unique (company_id, client_id)
);

insert into test values
  (1, 1, 'y'),   -- company1
  (2, 2, null),  -- company2
  (3, 3, 'n'),   -- company3
  (4, 4, 'y'),   -- company4
  (4, 5, 'n'),
  (5, 6, null),  -- company5
  (5, 7, 'n')
;
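Basically, there are 5 different companies; each company has one or more clients, and each client has the status 'y' or 'n' (it can also be null).
What I need to do is select all (company_id, client_id) pairs for every company that has at least one client whose status is not 'n' (that is, 'y' or null). So for the sample data above, the output should be: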
company_id;client_id
1;1
2;2
4;4
4;5
5;6
5;7
I tried something with window functions, but I can't figure out how to compare the count of all clients with the count of clients with STATUS = 'n'.
select company_id,
       count(*) over (partition by company_id) as all_clients_count
from test
-- where all_clients_count != ... ?
I figured out a way to do it, but I'm not sure whether this is the right approach:
select sub.company_id, unnest(sub.client_ids)
from (
  select company_id, array_agg(client_id) as client_ids
  from test
  group by company_id
  having count(*) != count( (case when client_status = 'n' then 1 else null end) )
) sub
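For comparison, the window-function attempt from the question can also be finished: compute the number of all clients and the number of 'n' clients per company over the same window, then keep the rows where the two counts differ. A minimal sketch against the question's text column; so that it runs on its own, the sample data is inlined as a CTE named test (drop the WITH clause to query the real table), and the aliases all_clients and n_clients are illustrative names:

```sql
-- Sample data from the question, inlined so the statement is self-contained.
WITH test (company_id, client_id, client_status) AS (
   VALUES (1, 1, 'y'), (2, 2, NULL), (3, 3, 'n'),
          (4, 4, 'y'), (4, 5, 'n'), (5, 6, NULL), (5, 7, 'n')
   )
SELECT company_id, client_id
FROM  (
   SELECT company_id, client_id
        , count(*) OVER w AS all_clients                                     -- all clients of the company
        , count(CASE WHEN client_status = 'n' THEN 1 END) OVER w AS n_clients -- clients with status 'n'
   FROM   test
   WINDOW w AS (PARTITION BY company_id)
   ) sub
WHERE  all_clients <> n_clients  -- at least one client is 'y' or NULL
ORDER  BY company_id, client_id;
```

Like the array_agg() query, this still has to count every client of every company before it can filter anything.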
Basically, you are looking for the expression:
client_status IS DISTINCT FROM 'n'
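The null-safe comparison matters because a plain client_status <> 'n' returns NULL for a NULL status, and WHERE drops such rows, whereas IS DISTINCT FROM always returns true or false. A small sketch that evaluates both forms for each possible status value:

```sql
-- One row per possible status: 'y', 'n' and NULL.
SELECT s                        AS client_status
     , s <> 'n'                 AS plain_not_equal   -- NULL when s is NULL
     , s IS DISTINCT FROM 'n'   AS is_distinct_from  -- true or false, never NULL
FROM  (VALUES ('y'), ('n'), (NULL)) v(s);
```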
The column client_status should really be of data type boolean instead of text, which would allow the even simpler expression:
client_status IS NOT FALSE
The manual has the details in the chapter on comparison operators.
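If the existing column can be migrated, PostgreSQL's boolean input accepts 'y' and 'n' (unique prefixes of 'yes' and 'no') as true and false, so a plain cast converts it, and NULL stays NULL. A sketch, assuming the column holds nothing but 'y', 'n' and NULL (any other value makes the ALTER fail); it recreates the question's table as a temporary table so it runs standalone, and that temp table shadows a permanent test table for the rest of the session:

```sql
-- The question's table, as a temporary table for this demo.
CREATE TEMP TABLE test (
   company_id    integer NOT NULL,
   client_id     integer NOT NULL,
   client_status text,
   UNIQUE (company_id, client_id)
);
INSERT INTO test VALUES
   (1, 1, 'y'), (2, 2, NULL), (3, 3, 'n'),
   (4, 4, 'y'), (4, 5, 'n'), (5, 6, NULL), (5, 7, 'n');

-- Convert text -> boolean in place: 'y' -> true, 'n' -> false, NULL -> NULL.
ALTER TABLE test ALTER COLUMN client_status TYPE boolean USING client_status::boolean;

SELECT company_id, client_id, client_status
FROM   test
ORDER  BY company_id, client_id;
```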
Assuming your actual table has a UNIQUE or PK constraint, we arrive at:
CREATE TABLE test (
   company_id    integer NOT NULL,
   client_id     integer NOT NULL,
   client_status boolean,
   PRIMARY KEY (company_id, client_id)
);
All of these do the same thing (what you asked for); which one is fastest depends on data distribution:
SELECT company_id, client_id
FROM   test t
WHERE  EXISTS (
   SELECT FROM test
   WHERE  company_id = t.company_id
   AND    client_status IS NOT FALSE
   );
Or:
SELECT company_id, client_id
FROM   test t
JOIN  (
   SELECT company_id
   FROM   test t
   GROUP  BY 1
   HAVING bool_or(client_status IS NOT FALSE)
   ) c USING (company_id);
Or:
SELECT company_id, client_id
FROM   test t
JOIN  (
   SELECT DISTINCT ON (company_id)
          company_id, client_status
   FROM   test t
   ORDER  BY company_id, client_status DESC
   ) c USING (company_id)
WHERE  c.client_status IS NOT FALSE;
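Boolean values sort FALSE -> TRUE -> NULL in ascending order, so FALSE comes last in descending order. If any other value is available for a company, that one is picked first ...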
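The sort order is easy to check with a throwaway VALUES list; by default PostgreSQL puts NULLs last in ascending order and first in descending order:

```sql
SELECT b FROM (VALUES (true), (false), (NULL::boolean)) v(b) ORDER BY b;       -- false, true, NULL
SELECT b FROM (VALUES (true), (false), (NULL::boolean)) v(b) ORDER BY b DESC;  -- NULL, true, false
```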
The added PK is implemented with an index that is useful for these queries. If you want it faster still, add a partial index for query 1:
CREATE INDEX test_special_idx ON test (company_id, client_id)
WHERE client_status IS NOT FALSE;
You could also use window functions, but that would be slower. Example with first_value():
SELECT company_id, client_id
FROM  (
   SELECT company_id, client_id
        , first_value(client_status) OVER (PARTITION BY company_id
                                           ORDER BY client_status DESC) AS stat
   FROM   test t
   ) sub
WHERE  stat IS NOT FALSE;
For many rows per company_id, one of these techniques may be faster still: