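I'm running a GROUP BY with COUNT(*) over a dataset, and I'd like to calculate each group's percentage of the total.
For example, in this query I'd like to know what share of the total each state's count(*) represents (the total being SELECT count(*) FROM publicdata:samples.natality):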
SELECT state, count(*)
FROM [publicdata:samples.natality]
GROUP by state
There are several ways to do this in SQL, but I haven't found a way to do it in BigQuery. Does anyone know how?
Thanks!
Fel*_*ffa 15
Take a look at RATIO_TO_REPORT, one of the recently announced window functions:
SELECT state, ratio * 100 AS percent FROM (
SELECT state, count(*) AS total, RATIO_TO_REPORT(total) OVER() AS ratio
FROM [publicdata:samples.natality]
GROUP by state
)
state percent
AL 1.4201828131159113
AK 0.23521048665998198
AZ 1.3332896746620975
AR 0.7709591206172346
CA 10.008298605982642
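To make the arithmetic behind RATIO_TO_REPORT concrete, here is a minimal Python sketch of what it computes: each value divided by the sum of all values in the window. The helper name `ratio_to_report` and the per-state counts are made up for illustration; this is not BigQuery code.

```python
def ratio_to_report(values):
    """Return each value divided by the sum of all values."""
    total = sum(values)
    return [v / total for v in values]

# Made-up per-state counts, for illustration only.
counts = {"CA": 3, "AL": 1, "AK": 1}

ratios = ratio_to_report(list(counts.values()))

# Multiply by 100, as the outer SELECT in the query does.
percents = {state: 100 * r for state, r in zip(counts, ratios)}
print(percents)
```

The percentages always add up to 100 (up to floating-point rounding), which is a quick way to sanity-check the query's output.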
eva*_*n_b 12
Felipe's answer, adapted for BigQuery's standard SQL dialect instead of legacy SQL, looks like this:
select state, 100*(state_count / total) as pct
from (
SELECT state, count(*) AS state_count, sum(count(*)) OVER() AS total
FROM `bigquery-public-data.samples.natality`
GROUP by state
) s
The documentation for analytic functions (a.k.a. "window functions") in BigQuery standard SQL is here: https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
Joh*_*y V 10
You can use a window function to get each group's percentage of the total without a subquery (an improvement on evan_b's solution):
SELECT
state
,count(*) / (sum(count(*)) OVER()) as pct
FROM
`bigquery-public-data.samples.natality`
GROUP BY
state
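If you want to check the pattern without touching BigQuery, the same `SUM(COUNT(*)) OVER ()` shape can be tried locally. The following is a rough sketch using Python's built-in sqlite3 module, assuming a bundled SQLite recent enough to support window functions (3.25 or later). The `births` table and its rows are hypothetical stand-ins for the natality sample. Note that the query above returns a fraction between 0 and 1; the sketch multiplies by 100.0 to get a percentage.

```python
import sqlite3

# Hypothetical toy table standing in for the natality sample; rows are made up.
rows = [("CA",), ("CA",), ("CA",), ("AL",), ("AK",)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE births (state TEXT)")
conn.executemany("INSERT INTO births VALUES (?)", rows)

# COUNT(*) is evaluated per group first; the window SUM then adds up
# those per-group counts across all result rows to give the grand total.
query = """
SELECT state,
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS pct
FROM births
GROUP BY state
ORDER BY state
"""
result = conn.execute(query).fetchall()
for state, pct in result:
    print(state, pct)
```

The window function runs after grouping, which is why it can see all the per-state counts at once and no subquery is needed.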
You can self-join against the total, using a dummy value as the join key. For example:
SELECT
t1.state AS state,
t1.cnt AS cnt,
100 * t1.cnt / t2.total as percent
FROM (
SELECT
state,
COUNT(*) AS cnt,
1 AS key
FROM
[publicdata:samples.natality]
WHERE state is not null
GROUP BY
state) AS t1
JOIN (
SELECT
COUNT(*) AS total,
1 AS key
FROM
[publicdata:samples.natality]) AS t2
ON t1.key = t2.key
ORDER BY percent DESC