How to group by repeated fields in BigQuery

vai*_*war 2 google-bigquery bigquery-standard-sql

In BigQuery, I created a table with the following schema:

id                  INTEGER NULLABLE    
visits              INTEGER NULLABLE    
dimensions          RECORD  REPEATED    
dimensions.value    STRING  
dimensions.key      STRING  

How can I get sum(visits) grouped by the device and state values?

Sample data:

{"id": 1, "visits": 100, "dimensions": [{"key":"device","value":"mobile"}, {"key":"state","value":"CA"}]}
{"id": 1, "visits": 500, "dimensions": [{"key":"device","value":"desktop"}, {"key":"state","value":"CA"}]}
{"id": 1, "visits": 200, "dimensions": [{"key":"device","value":"mobile"}, {"key":"state","value":"NY"}]}
{"id": 2, "visits": 100, "dimensions": [{"key":"device","value":"mobile"}, {"key":"state","value":"CA"}]}
{"id": 2, "visits": 500, "dimensions": [{"key":"device","value":"desktop"}, {"key":"state","value":"CA"}]}
{"id": 2, "visits": 200, "dimensions": [{"key":"device","value":"mobile"}, {"key":"state","value":"NY"}]}
{"id": 2, "visits": 780, "dimensions": [{"key":"device","value":"desktop"}, {"key":"state","value":"NY"}]}

I want id, device, state, and sum(visits) in the output.

I can group by a single dimension with the query below, but I don't know how to do it for multiple dimensions.

SELECT id,d.value, sum(visits) FROM dataset.tabe_name,UNNEST(dimensions) as d where d.key = "device" group by id, d.value LIMIT 1000

Also, is it possible to write a generic query when the key values are not known in advance?

Mik*_*ant 8

The following is for BigQuery Standard SQL:

#standardSQL
SELECT 
  id,
  (SELECT value FROM UNNEST(dimensions) WHERE key = "device") AS device,
  (SELECT value FROM UNNEST(dimensions) WHERE key = "state") AS state,
  SUM(visits) AS visits
FROM `dataset.tabe_name`  
GROUP BY id, device, state
LIMIT 1000   
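One caveat worth noting: a scalar subquery such as the two above fails at run time if it yields more than one row, which would happen if a row's dimensions array contained the same key twice. The sample data has no such rows, so this is only an assumption about messier data. A minimal sketch that tolerates duplicates by aggregating inside the subquery (the inline `data` rows are hypothetical and exist only to illustrate the case):

```sql
#standardSQL
WITH data AS (
  -- Hypothetical rows; the second one repeats the "device" key on purpose.
  SELECT 1 AS id, 100 AS visits, [STRUCT<key STRING, value STRING>("device", "mobile"), ("state", "CA")] AS dimensions UNION ALL
  SELECT 1, 50, [STRUCT<key STRING, value STRING>("device", "mobile"), ("device", "tablet"), ("state", "CA")]
)
SELECT
  id,
  -- MAX() makes the subquery return exactly one row:
  -- the largest matching value, or NULL when the key is absent.
  (SELECT MAX(value) FROM UNNEST(dimensions) WHERE key = "device") AS device,
  (SELECT MAX(value) FROM UNNEST(dimensions) WHERE key = "state") AS state,
  SUM(visits) AS visits
FROM data
GROUP BY id, device, state
```

Picking the largest value is an arbitrary but deterministic tie-break. If duplicates should be treated as bad data instead, keeping the plain subquery and letting it fail may be the better choice.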

You can test this with the dummy data from your example, as shown below:

#standardSQL
WITH data AS (
  SELECT 1 AS id, 100 AS visits, ARRAY<STRUCT<key STRING, value STRING>>[("device", "mobile"), ("state", "CA")] AS dimensions UNION ALL
  SELECT 1, 500, [STRUCT<key STRING, value STRING>("device", "desktop"), ("state", "CA")] UNION ALL
  SELECT 1, 200, [STRUCT<key STRING, value STRING>("device", "mobile"), ("state", "NY")] UNION ALL
  SELECT 2, 100, [STRUCT<key STRING, value STRING>("device", "mobile"), ("state", "CA")] UNION ALL
  SELECT 2, 500, [STRUCT<key STRING, value STRING>("device", "desktop"), ("state", "CA")] UNION ALL
  SELECT 2, 200, [STRUCT<key STRING, value STRING>("device", "mobile"), ("state", "NY")] UNION ALL
  SELECT 2, 780, [STRUCT<key STRING, value STRING>("device", "desktop"), ("state", "NY")] 
)
SELECT 
  id,
  (SELECT value FROM UNNEST(dimensions) WHERE key = "device") AS device,
  (SELECT value FROM UNNEST(dimensions) WHERE key = "state") AS state,
  SUM(visits) AS visits
FROM data  
GROUP BY id, device, state
-- ORDER BY id, device, state
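On the last part of the question (key values not known in advance): short of generating the SQL dynamically, a query cannot produce one column per key without naming the keys, because the output columns are fixed when the query is written. One possible workaround, offered here as a sketch rather than an established recipe, is to fold all key/value pairs of a row into a single canonical string and group by that. The alias `dims` is made up for this example, and the rows are the id = 1 rows from the sample data:

```sql
#standardSQL
WITH data AS (
  SELECT 1 AS id, 100 AS visits, [STRUCT<key STRING, value STRING>("device", "mobile"), ("state", "CA")] AS dimensions UNION ALL
  SELECT 1, 500, [STRUCT<key STRING, value STRING>("device", "desktop"), ("state", "CA")] UNION ALL
  SELECT 1, 200, [STRUCT<key STRING, value STRING>("device", "mobile"), ("state", "NY")]
)
SELECT
  id,
  -- Build one "key=value, key=value" string per row.
  -- Sorting by key puts rows with the same pairs in a different order into the same group.
  (SELECT STRING_AGG(CONCAT(key, "=", value), ", " ORDER BY key) FROM UNNEST(dimensions)) AS dims,
  SUM(visits) AS visits
FROM data
GROUP BY id, dims
ORDER BY id, dims
```

This gives one output row per distinct combination of dimensions, whatever the keys happen to be. The trade-off is that the dimensions come back packed into a single string column instead of separate device and state columns, and pairs with a NULL value are dropped from the string.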