How to get the most frequent value in Google BigQuery

And*_*der 3 google-bigquery

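Postgres has a simple way to do this: with the mode() aggregate we can find the most frequent value. Is there something similar in Google BigQuery?

How would I write a query like this in BigQuery?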

select count(*),
       avg(vehicles)                                         as mean,
       percentile_cont(0.5) within group (order by vehicles) as median,
       mode() within group (order by vehicles)               as most_frequent_value
FROM "driver"
WHERE vehicles is not null;
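To have something to compare the BigQuery answers against, here is a small Python reference for the four statistics the Postgres query computes. It is a sketch on a hypothetical in-memory list; `summarize` and `sample` are illustrative names, not part of the question. Python's tie-breaking in `statistics.mode` is not guaranteed to match Postgres's `mode()`.

```python
import statistics

def summarize(vehicles):
    """Count, mean, median and mode of the non-null values
    (mirrors the query's WHERE vehicles IS NOT NULL)."""
    values = [v for v in vehicles if v is not None]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        # Like percentile_cont(0.5): averages the two middle values for even counts.
        "median": statistics.median(values),
        # On ties, statistics.mode returns the first mode encountered (Python 3.8+).
        "most_frequent_value": statistics.mode(values),
    }

# Hypothetical sample column; None stands in for SQL NULL.
sample = [1, 2, 2, 3, None, 2, 4, 1]
print(summarize(sample))
```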

Ell*_*ard 10

You can use APPROX_TOP_COUNT to get the most frequent values (as the name says, the counts are approximate), for example:

SELECT APPROX_TOP_COUNT(vehicles, 5) AS top_five_vehicles
FROM dataset.driver

If you only want the top value, you can select it from the array:

SELECT APPROX_TOP_COUNT(vehicles, 1)[OFFSET(0)] AS most_frequent_value
FROM dataset.driver

  • If you only need the value, append `.value`: each element of the returned array is a struct holding a value and its count. (2 upvotes)
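To make the shape of the result concrete, the Python sketch below builds the same kind of array of value/count records that APPROX_TOP_COUNT returns, then picks the first element (like `[OFFSET(0)]`) and its `value` field (like `.value`). `top_count` and the sample data are hypothetical, and the sketch counts exactly where BigQuery's function approximates. BigQuery's documentation notes that APPROX_TOP_COUNT does not ignore NULL inputs, so the sketch counts `None` as well. Filter with `WHERE vehicles IS NOT NULL`, as the question does, if NULL should not be a candidate.

```python
from collections import Counter

def top_count(values, number):
    """Exact stand-in for the shape of APPROX_TOP_COUNT(values, number):
    a list of {"value", "count"} records, most frequent first.
    None is counted like any other value (NULLs are not ignored)."""
    counts = Counter(values)
    return [{"value": v, "count": c} for v, c in counts.most_common(number)]

vehicles = [1, 2, None, 2, 3, None, 2, 4, 1]  # hypothetical sample data
top_five = top_count(vehicles, 5)
print(top_five)

# [OFFSET(0)] picks the first record; .value extracts just the value.
most_frequent_value = top_five[0]["value"]
print(most_frequent_value)
```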

Mik*_*ant 6

Below are options for BigQuery Standard SQL.

Option 1

#standardSQL
SELECT * FROM (
  SELECT COUNT(*) AS cnt,
    AVG(vehicles) AS mean,
    APPROX_TOP_COUNT(vehicles, 1)[OFFSET(0)].value AS most_frequent_value
  FROM `project.dataset.table`
  WHERE vehicles IS NOT NULL
) CROSS JOIN (
  SELECT PERCENTILE_CONT(vehicles, 0.5) OVER() AS median
  FROM `project.dataset.table`
  WHERE vehicles IS NOT NULL
  LIMIT 1
)
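In BigQuery, PERCENTILE_CONT is an analytic function rather than an aggregate. It therefore cannot sit next to COUNT and AVG in the same grouped SELECT, which is why the median is computed in a separate subquery with `OVER()`. Because every row gets the same median, `LIMIT 1` keeps one copy. The value itself comes from linear interpolation over the sorted non-null values. The Python sketch below shows that rule; `percentile_cont` here is a hypothetical helper, not a BigQuery API.

```python
import math

def percentile_cont(values, fraction):
    """Linear-interpolation percentile over the sorted non-null values:
    position = fraction * (n - 1) (0-based), interpolated between the
    two neighbouring elements. Returns None for an empty input."""
    xs = sorted(v for v in values if v is not None)
    if not xs:
        return None
    pos = fraction * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

print(percentile_cont([1, 2, 2, 3, None, 2, 4, 1], 0.5))  # odd count: middle value
print(percentile_cont([4, None, 1, 3, 2], 0.5))           # even count: interpolated
```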

Option 2

#standardSQL
SELECT * FROM (
  SELECT COUNT(*) cnt,
    AVG(vehicles) AS mean
  FROM `project.dataset.table`
  WHERE vehicles IS NOT NULL
) CROSS JOIN (
  SELECT PERCENTILE_CONT(vehicles, 0.5) OVER() AS median
  FROM `project.dataset.table`
  WHERE vehicles IS NOT NULL
  LIMIT 1
) CROSS JOIN (
  SELECT vehicles AS most_frequent_value
  FROM `project.dataset.table`
  WHERE vehicles IS NOT NULL
  GROUP BY vehicles
  ORDER BY COUNT(1) DESC
  LIMIT 1
)  
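The third subquery of Option 2 computes an exact mode with plain GROUP BY / ORDER BY COUNT(1) DESC / LIMIT 1, so the pattern can be tried locally. The sketch below runs the same subquery against an in-memory SQLite table through Python's `sqlite3` module. SQLite is only a stand-in for BigQuery here, and the table name `driver` and its rows are hypothetical.

```python
import sqlite3

# Hypothetical stand-in for `project.dataset.table`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE driver (vehicles INTEGER)")
conn.executemany(
    "INSERT INTO driver (vehicles) VALUES (?)",
    [(1,), (2,), (2,), (3,), (None,), (2,), (4,), (1,)],
)

# Same shape as Option 2's most_frequent_value subquery.
(most_frequent_value,) = conn.execute(
    """
    SELECT vehicles AS most_frequent_value
    FROM driver
    WHERE vehicles IS NOT NULL
    GROUP BY vehicles
    ORDER BY COUNT(1) DESC
    LIMIT 1
    """
).fetchone()
print(most_frequent_value)
conn.close()
```

If two values tie for the highest count, which one `LIMIT 1` returns is not defined. Adding a secondary sort key, for example `ORDER BY COUNT(1) DESC, vehicles`, makes the choice deterministic.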

Option 3

#standardSQL
CREATE TEMP FUNCTION median(arr ANY TYPE) AS ((
  SELECT PERCENTILE_CONT(x, 0.5) OVER() 
  FROM UNNEST(arr) x LIMIT 1 
));
CREATE TEMP FUNCTION most_frequent_value(arr ANY TYPE) AS ((
  SELECT x 
  FROM UNNEST(arr) x
  GROUP BY x
  ORDER BY COUNT(1) DESC
  LIMIT 1  
));
SELECT COUNT(*) cnt,
  AVG(vehicles) AS mean,
  median(ARRAY_AGG(vehicles)) AS median,
  most_frequent_value(ARRAY_AGG(vehicles)) AS most_frequent_value
FROM `project.dataset.table`
WHERE vehicles IS NOT NULL   
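Option 3 collects the column into an array with ARRAY_AGG and passes it to temporary SQL functions declared with `ANY TYPE`. Those functions therefore work for any groupable element type, not just numbers. The Python sketch below mirrors `most_frequent_value` on a plain list. It breaks ties by the smallest value, which is an added assumption: the SQL version orders only by count and leaves ties unresolved.

```python
from collections import Counter

def most_frequent_value(arr):
    """Mirror of Option 3's most_frequent_value(arr ANY TYPE): group the
    elements, order by count descending, keep the first. Ties are broken
    by the smallest value (an assumption not present in the SQL)."""
    if not arr:
        return None  # the SQL subquery yields NULL for an empty array
    counts = Counter(arr)
    value, _ = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return value

print(most_frequent_value([1, 2, 2, 3, 2, 4, 1]))
print(most_frequent_value(["car", "van", "car", "bus", "van"]))  # tie: "car" < "van"
```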

And so on ...