Finding statistics of data in a PostgreSQL table: unique count and highest frequency of every column

Question

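I need to know a few values for each column of a table, and I would like to get them in a single query.

Let's assume we have a table with the columns A, B, and C: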

A     B      C
--------------------
Red   Red    Red
Red   Blue   Red
Blue  Green  Red
Blue  Green  Red
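
To try queries against this sample, the table could be set up roughly as follows. This is only a sketch: the name `yourtable` is borrowed from the answer below, and the `text` column type is an assumption, since the question does not state the types.

```sql
-- Hypothetical setup for the sample data shown above.
-- Table name taken from the answer; column types are assumed.
create table yourtable (
  a text,
  b text,
  c text
);

insert into yourtable (a, b, c) values
  ('Red',  'Red',   'Red'),
  ('Red',  'Blue',  'Red'),
  ('Blue', 'Green', 'Red'),
  ('Blue', 'Green', 'Red');
```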

I want an output that says how many unique values A, B, and C each have, as separate columns. So it would give:

2, 3, 1

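2 unique values for A (Red and Blue)
3 unique values for B (Red, Blue, and Green)
1 unique value for C (Red)

Is there any way to get this in just one call?

Also, I would like to get the frequency of the most common value in each column: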

2, 2, 4

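2 because there are 2 Reds (or 2 Blues, which tie),
2 because there are 2 Greens,
4 because there are 4 Reds

either in the same query or in a separate one.

I don't want to run a separate query for each column, because in theory there could be many columns.

Is there an efficient way to do this?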

Answer 1

Kam*_*ski 5

How many unique values each column has, using aggregate functions and DISTINCT:

select
  count(distinct a) as cnt_a,
  count(distinct b) as cnt_b,
  count(distinct c) as cnt_c
from yourtable

Returns:

2,3,1

Frequency of the most common value, using window functions and aggregate functions:

select 
  max(cnt_a) as fr_a,
  max(cnt_b) as fr_b,
  max(cnt_c) as fr_c
from (
  select
    count(*) over (partition by a) as cnt_a,
    count(*) over (partition by b) as cnt_b,
    count(*) over (partition by c) as cnt_c
  from yourtable
) t

Returns:

2,2,4
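
One detail worth checking on real data is NULL handling: count(distinct a) skips NULLs, whereas count(*) over (partition by a) puts all NULL rows into one partition and counts them, so a mostly-NULL column could report the NULL count as its top frequency. If that is not wanted, a possible variant (a sketch, not part of the original query) counts the column itself instead of `*`:

```sql
-- Variant that ignores NULLs when looking for the highest frequency.
-- count(a) only counts non-NULL values of a, so the NULL partition
-- contributes 0 instead of its row count.
select
  max(cnt_a) as fr_a,
  max(cnt_b) as fr_b,
  max(cnt_c) as fr_c
from (
  select
    count(a) over (partition by a) as cnt_a,
    count(b) over (partition by b) as cnt_b,
    count(c) over (partition by c) as cnt_c
  from yourtable
) t;
```

On the sample table, which contains no NULLs, this should behave the same as the count(*) version.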

Both combined with UNION ALL:

select
  'unique values' as description,
  count(distinct a) as cnt_a,
  count(distinct b) as cnt_b,
  count(distinct c) as cnt_c
from yourtable
union all
select
  'freq of most common value',
  max(cnt_a),
  max(cnt_b),
  max(cnt_c)
from (
  select
    count(*) over (partition by a) as cnt_a,
    count(*) over (partition by b) as cnt_b,
    count(*) over (partition by c) as cnt_c
  from yourtable
) t

Returns:

        description        | cnt_a | cnt_b | cnt_c
---------------------------+-------+-------+-------
 unique values             |     2 |     3 |     1
 freq of most common value |     2 |     2 |     4
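
Since the question mentions that there could be many columns, writing the column list by hand may not scale. One option is to let PostgreSQL generate the statement text from information_schema.columns. The following is a minimal sketch under assumptions: the schema `public` and the table name `yourtable` are placeholders, and the `cnt_` prefix for the output aliases is simply carried over from the query above.

```sql
-- Sketch: generate the combined statistics query for every column of a table.
-- Assumes the table is public.yourtable; change both literals in the WHERE clause.
select format(
  $q$
select 'unique values' as description, %s
from %I.%I
union all
select 'freq of most common value', %s
from (select %s from %I.%I) t
  $q$,
  -- count(distinct col) as cnt_col, ...
  string_agg(
    format('count(distinct %I) as %I', column_name, 'cnt_' || column_name::text),
    ', ' order by ordinal_position),
  table_schema, table_name,
  -- max(cnt_col), ...
  string_agg(
    format('max(%I)', 'cnt_' || column_name::text),
    ', ' order by ordinal_position),
  -- count(*) over (partition by col) as cnt_col, ...
  string_agg(
    format('count(*) over (partition by %I) as %I', column_name, 'cnt_' || column_name::text),
    ', ' order by ordinal_position),
  table_schema, table_name
) as generated_sql
from information_schema.columns
where table_schema = 'public'
  and table_name = 'yourtable'
group by table_schema, table_name;
```

This only returns the SQL text. It can be copied and run, or, in psql, the generating statement can be ended with \gexec instead of a semicolon so that the result is executed directly. Note that each window partition generally needs its own sort or hash of the table, so on wide or large tables the generated query may be expensive.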
