Using percentile functions with GROUP BY in BigQuery

don*_*hcd 14 google-bigquery

In my CENSUS table, I want to group by state and get, for each state, the median county population and the number of counties.

In psql, Redshift and Snowflake I can do this:

psql=> SELECT state, count(county), PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY "population2000") AS median FROM CENSUS GROUP BY state;
        state         | count |  median
----------------------+-------+----------
 Alabama              |    67 |    36583
 Alaska               |    24 |   7296.5
 Arizona              |    15 |   116320
 Arkansas             |    75 |    20229
...
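For reference, PERCENTILE_CONT(0.5) is a continuous percentile. It sorts the values and, when the requested position falls between two rows, linearly interpolates between them. That is why Alaska, with 24 counties (an even count), gets a fractional median of 7296.5. The Python sketch below models the per-group computation. It assumes the PostgreSQL-style definition (0-based position p * (n - 1), NULLs ignored). It is an illustration of the semantics, not code that runs in BigQuery, and the function names and sample rows are hypothetical.

```python
import math


def percentile_cont(values, p):
    """Continuous percentile with linear interpolation (assumed PostgreSQL-style semantics)."""
    if not 0 <= p <= 1:
        raise ValueError("p must be in [0, 1]")
    ordered = sorted(values)
    if not ordered:
        return None  # no non-NULL input rows -> NULL result
    # Fractional 0-based position of the requested percentile.
    pos = p * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    # Interpolate between the two neighbouring rows (they coincide when pos is integral).
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def count_and_median_by_state(rows):
    """Rough model of: SELECT state, COUNT(county),
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY population2000)
    FROM census GROUP BY state.  rows: (state, county, population2000)."""
    counts, pops = {}, {}
    for state, county, population2000 in rows:
        counts.setdefault(state, 0)
        pops.setdefault(state, [])
        if county is not None:          # COUNT(county) skips NULLs
            counts[state] += 1
        if population2000 is not None:  # PERCENTILE_CONT ignores NULLs
            pops[state].append(population2000)
    return {s: (counts[s], percentile_cont(pops[s], 0.5)) for s in counts}


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    sample = [("Alaska", "A", 100), ("Alaska", "B", 200), ("Alaska", "C", 300),
              ("Alaska", "D", 1000), ("Alabama", "X", 5), ("Alabama", "Y", 7),
              ("Alabama", "Z", 6)]
    print(count_and_median_by_state(sample))
    # {'Alaska': (4, 250.0), 'Alabama': (3, 6.0)}
```

With an even number of rows (Alaska here), the result is the midpoint of the two middle values. With an odd number, it is simply the middle value.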

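I'm trying to find a good way to do this in BigQuery standard SQL. I noticed that there is an undocumented PERCENTILE_CONT analytic function available, but I have to resort to some major hacks to make it do what I want.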

I'd like to be able to do the same thing with what I gather are the correct arguments:

SELECT
  state,
  COUNT(county),
  PERCENTILE_CONT(population2000,
    0.5) OVER () AS `medPop`
FROM
  CENSUS
GROUP BY
  state;

But this query fails with the error:

SELECT list expression references column population2000 which is neither grouped nor aggregated at

I can get the answer I want, but I would be very disappointed if this were the recommended way to do it:

SELECT
  MAX(nCounties) AS nCounties,
  state,
  MAX(medPop) AS medPop
FROM (
  SELECT
    nCounties,
    T1.state,
    (PERCENTILE_CONT(population2000,
        0.5) OVER (PARTITION BY T1.state)) AS `medPop`
  FROM
    census T1
  LEFT OUTER JOIN (
    SELECT
      COUNT(county) AS `nCounties`,
      state
    FROM
      census
    GROUP BY
      state) T2
  ON
    T1.state = T2.state) T3
GROUP BY
  state

Is there a better way to do this? Also, will the PERCENTILE_CONT function be documented?

Thanks for reading!

Min*_*ong 17

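Thanks for your interest. PERCENTILE_CONT is still under development, and we will publish documentation once it reaches GA. We are supporting it as an analytic function first, and we plan to support it later as an aggregate function (which will allow GROUP BY). Until then, there is a simpler workaround: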

SELECT
  state,
  ANY_VALUE(nCounties) AS nCounties,
  ANY_VALUE(medPop) AS medPop
FROM (
  SELECT
    state,
    COUNT(county) OVER (PARTITION BY state) AS nCounties,
    PERCENTILE_CONT(population2000,
      0.5) OVER (PARTITION BY state) AS medPop
  FROM
    CENSUS)
GROUP BY
  state
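The workaround is correct because of how its two levels interact. The inner query attaches the per-state count and median to every row through the window (PARTITION BY state). The outer GROUP BY then collapses each state to one row, and since every row in a partition carries identical values, ANY_VALUE cannot pick a "wrong" one. The Python sketch below models those two steps. It is not BigQuery code, the function name and sample rows are hypothetical, and it assumes there are no NULLs. It uses statistics.median as a stand-in for PERCENTILE_CONT(x, 0.5), which is equivalent at p = 0.5 because both average the two middle values when the count is even.

```python
from collections import defaultdict
from statistics import median


def window_then_collapse(rows):
    """Model of the two-level workaround query (hypothetical helper, no NULL handling).

    rows: iterable of (state, county, population2000).
    """
    rows = list(rows)
    # PARTITION BY state: collect every partition's values.
    partitions = defaultdict(list)
    for state, _county, population2000 in rows:
        partitions[state].append(population2000)

    # Inner SELECT: one output row per input row, each carrying the
    # partition-level COUNT and median, so rows of one state are identical.
    windowed = [
        (state, len(partitions[state]), median(partitions[state]))
        for state, _county, _pop in rows
    ]

    # Outer GROUP BY state + ANY_VALUE: keep the first row seen per state;
    # any other row of the same state would give the same values.
    result = {}
    for state, n_counties, med_pop in windowed:
        result.setdefault(state, (n_counties, med_pop))
    return result


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    sample = [("Alaska", "A", 100), ("Alaska", "B", 200), ("Alaska", "C", 300),
              ("Alaska", "D", 1000), ("Alabama", "X", 5), ("Alabama", "Y", 7),
              ("Alabama", "Z", 6)]
    print(window_then_collapse(sample))
```

The cost of this pattern is that the inner query materializes one row per county before the outer GROUP BY discards the duplicates. That is acceptable for a table the size of CENSUS.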

  • Any update on adding PERCENTILE_CONT as an aggregate function? (21 upvotes)
  • Update: we have published the documentation at https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#percentile_cont (2 upvotes)