Calculating percentages with a GROUP BY query

Dea*_*key 30 sql postgresql group-by

I have a table with 3 columns that looks like this:

File    User     Rating (1-5)
------------------------------
00001    1        3
00002    1        4
00003    2        2
00004    3        5
00005    4        3
00005    3        2
00006    2        3
Etc.

I want to write a query that outputs the following (for each user and rating, the number of files and the percentage of files):

User    Rating   Count   Percentage
-----------------------------------
1       1         3      .18
1       2         6      .35
1       3         8      .47
2       5         12     .75
2       3         4      .25

Using PostgreSQL, I know how to produce the first 3 columns with the query below, but I can't figure out how to calculate the percentage within the GROUP BY:

SELECT
    User,
    Rating,
    Count(*)
FROM
    Results
GROUP BY
    User, Rating
ORDER BY
    User, Rating

Here, I want the percentage calculation to apply to each user/rating group.

And*_*rus 29

WITH t1 AS 
 (SELECT User, Rating, Count(*) AS n 
  FROM your_table
  GROUP BY User, Rating)
SELECT User, Rating, n, 
       (0.0+n)/(SUM(n) OVER (PARTITION BY User)) AS pct -- SUM(n), not COUNT(*): t1 has one row per rating; 0.0+ avoids integer divide
FROM t1;

Or

SELECT DISTINCT User, Rating, Count(*) OVER w_user_rating AS n, -- DISTINCT: window functions return one row per input row
        (0.0+Count(*) OVER w_user_rating)/(Count(*) OVER (PARTITION BY User)) AS pct
FROM your_table
WINDOW w_user_rating AS (PARTITION BY User, Rating);

I would check whether one or the other gives a better query plan on your RDBMS.

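  • Shouldn't both examples use SUM(COUNT(*))? For instance, `(SUM(COUNT(*)) OVER (PARTITION BY User))` in the first example. With SUM I get the expected values; otherwise I am dividing by the number of ratings rather than by the sum of their counts. (5 upvotes)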
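  • Thanks Andrew! I used a slightly modified version of your second query: `select user, rating, cnt, cnt::float * 100 / (sum(cnt) over (partition by user)) from (select user, rating, count(*) as cnt from tbl group by user, rating) order by user, rating` (2 upvotes)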
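
The point raised in the comments can be checked on a small data set. The following is a minimal sketch, not the answerer's code: it runs the CTE form against an in-memory SQLite database (assumed to be version 3.25 or later, which is needed for window functions) rather than PostgreSQL. The table `results`, the column `user_id` (used because `user` is a reserved word in PostgreSQL) and the sample rows are all hypothetical. It computes the percentage twice, once dividing by `SUM(n)` and once by `COUNT(*)`, so the difference is visible side by side.

```python
import sqlite3

# Hypothetical sample rows (file, user_id, rating); not taken from the question.
SHARE_ROWS = [
    ("00001", 1, 3),
    ("00002", 1, 3),
    ("00003", 1, 4),
    ("00004", 2, 2),
]

SHARE_QUERY = """
WITH t1 AS (
    SELECT user_id, rating, COUNT(*) AS n
    FROM results
    GROUP BY user_id, rating
)
SELECT user_id,
       rating,
       n,
       -- divides by the user's total number of files
       1.0 * n / SUM(n) OVER (PARTITION BY user_id) AS pct,
       -- divides by the number of distinct ratings the user gave
       1.0 * n / COUNT(*) OVER (PARTITION BY user_id) AS pct_by_groups
FROM t1
ORDER BY user_id, rating
"""


def rating_shares(rows):
    """Return (user_id, rating, n, pct, pct_by_groups) tuples for the given rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE results (file TEXT, user_id INTEGER, rating INTEGER)"
        )
        conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
        return conn.execute(SHARE_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in rating_shares(SHARE_ROWS):
        print(row)
```

For user 1 (two files rated 3, one rated 4), `pct` comes out as 2/3 and 1/3, while `pct_by_groups` comes out as 1.0 and 0.5, because the window over `t1` sees two grouped rows rather than three files.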

Nic*_*rey 9

Alternatively, you can do it the old-school way, which is arguably easier to understand:

select usr.User                   as User   ,
       usr.Rating                 as Rating ,
       usr.N                      as N      ,
       (100.0 * usr.N) / total.N  as Pct
from ( select User, Rating , count(*) as N
       from Results
       group by User , Rating
     ) usr
join ( select User , count(*) as N
       from Results
       group by User
     ) total on total.User = usr.User
order by usr.User, usr.Rating

Cheers!


mik*_*obi 5

The best way is to use window functions.

  • Could you elaborate? (3 upvotes)

Jam*_*and 5

In T-SQL this should work:

SELECT
    User,
    Rating,
    Count(*),
    -- partition by User only: partitioning by (User, Rating) would make Total equal to Count(*)
    SUM(COUNT(*)) OVER (PARTITION BY User) AS Total,
    -- 1.0 * forces non-integer division
    1.0 * Count(*) / (SUM(COUNT(*)) OVER (PARTITION BY User)) AS Percentage
FROM
    Results
GROUP BY
    User, Rating
ORDER BY
    User, Rating
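
The single-pass form, with a window aggregate wrapped around the group aggregate, can be tried out the same way. This is a rough sketch under stated assumptions: it uses an in-memory SQLite database (3.25 or later assumed) instead of SQL Server, and the table `results`, the column `user_id` and the sample rows are hypothetical. The window partitions by user only, and the query computes the ratio both with and without a `100.0 *` factor to show what integer division does to the result.

```python
import sqlite3

# Hypothetical sample rows (user_id, rating); not taken from the question.
PCT_ROWS = [(1, 3), (1, 3), (1, 4), (2, 2)]

PCT_QUERY = """
SELECT user_id,
       rating,
       COUNT(*) AS n,
       -- window SUM over the grouped counts: files per user
       SUM(COUNT(*)) OVER (PARTITION BY user_id) AS total,
       -- integer / integer truncates
       COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY user_id) AS int_pct,
       -- multiplying by 100.0 first makes the division floating point
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY user_id) AS pct
FROM results
GROUP BY user_id, rating
ORDER BY user_id, rating
"""


def per_user_percentages(rows):
    """Return (user_id, rating, n, total, int_pct, pct) tuples for the given rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE results (user_id INTEGER, rating INTEGER)")
        conn.executemany("INSERT INTO results VALUES (?, ?)", rows)
        return conn.execute(PCT_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in per_user_percentages(PCT_ROWS):
        print(row)
```

With these rows, user 1 has a `total` of 3, so `int_pct` truncates to 0 for both of that user's ratings while `pct` gives roughly 66.7 and 33.3.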