使用group by查找数组中最常见的元素

mhk*_*ler 4 sql postgresql aggregate-functions lateral postgresql-9.3

我有一个具有以下结构的行表,name TEXT, favorite_colors TEXT[], group_name INTEGER其中每一行都有每个人喜欢的颜色和该人所属组的列表。如何GROUP BY group_name获得每个组中最常见的颜色的列表?

您可以组合int[] && int[]设置重叠,int[] & int[]获取交点然后进行其他计数和排名吗?

Erw*_*ter 5

快速又肮脏:

SELECT group_name, color, count(*) AS ct
FROM (
   SELECT group_name, unnest(favorite_colors) AS color
   FROM   tbl
   ) sub
GROUP  BY 1,2
ORDER  BY 1,3 DESC;
Run Code Online (Sandbox Code Playgroud)

一个更好 LATERAL JOIN

在Postgres 9.3或更高版本中,这是更简洁的形式:

SELECT group_name, color, count(*) AS ct
FROM   tbl t, unnest(t.favorite_colors) AS color
GROUP  BY 1,2
ORDER  BY 1,3 DESC;
Run Code Online (Sandbox Code Playgroud)

以上是的简写

...
FROM tbl t
JOIN LATERAL unnest(t.favorite_colors) AS color ON TRUE
...
Run Code Online (Sandbox Code Playgroud)

像其他查询一样INNER JOIN,它会排除没有颜色(favorite_colors IS NULL)的行-与第一个查询一样。

要在结果中包括此类行,请改用:

SELECT group_name, color, count(*) AS ct
FROM   tbl t
LEFT   JOIN LATERAL unnest(t.favorite_colors) AS color ON TRUE
GROUP  BY 1,2
ORDER  BY 1,3 DESC;
Run Code Online (Sandbox Code Playgroud)

您可以在下一步中轻松汇总每个组的“最常用颜色”,但是您需要首先定义“最常用颜色” ...

最常见的颜色

根据评论,选择> 3次出现的颜色。

SELECT t.group_name, color, count(*) AS ct
FROM   tbl t, unnest(t.favorite_colors) AS color
GROUP  BY 1,2
HAVING count(*) > 3
ORDER  BY 1,3 DESC;
Run Code Online (Sandbox Code Playgroud)

汇总数组中的顶部颜色(降序排列):

SELECT group_name, array_agg(color) AS top_colors
FROM  (
   SELECT group_name, color
   FROM   tbl t, unnest(t.favorite_colors) AS color
   GROUP  BY 1,2
   HAVING count(*) > 3
   ORDER  BY 1, count(*) DESC
   ) sub
GROUP BY 1;
Run Code Online (Sandbox Code Playgroud)

-> SQLfiddle演示所有内容。