Query key-values in different columns in Google BigQuery

Tim*_*mon 2 google-bigquery firebase-analytics

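I have collected analytics data from Firebase Analytics, which is linked to Google BigQuery.

I have the following data in BigQuery (unnecessary columns/rows are left out; the dataset looks similar to https://bigquery.cloud.google.com/table/firebase-analytics-sample-data:ios_dataset.app_events_20160607?tab=preview):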

| event_dim.name | event_dim.params.key | event_dim.params.value.string_value |
|----------------|----------------------|-------------------------------------|
| read_post      | post_id              | p_100                               |
|                | group_id             | g_1                                 |
|                | user_id              | u_1                                 |
| open_group     | post_id              | p_200                               |
|                | group_id             | g_2                                 |
|                | user_id              | u_1                                 |
| open_group     | post_id              | p_300                               |
|                | group_id             | g_1                                 |
|                | user_id              | u_3                                 |

I want to query the following data:

  • Event name
  • User ID
  • Group ID

I tried the following query:

SELECT
  event_dim.name,
  FIRST(IF(event_dim.params.key = "user_id", event_dim.params.value.string_value, NULL)) WITHIN RECORD user_id,
  FIRST(IF(event_dim.params.key = "group_id", event_dim.params.value.string_value, NULL)) WITHIN RECORD group_id
FROM
  [xxx:xxx_IOS.app_events_20161102]
LIMIT
  1000

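The problem with the above query is that the aggregate function FIRST gives the wrong result, because a SELECT statement with the WITHIN modifier returns a list of results. FIRST only gives the correct result when the matching key is in the first row.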

Ell*_*ard 6

Using standard SQL (uncheck "Use Legacy SQL" under "Show Options"), you can do the following:

SELECT
  event_dim.name,
  (SELECT value.string_value FROM UNNEST(event_dim.params)
   WHERE key = 'user_id') AS user_id,
  (SELECT value.string_value FROM UNNEST(event_dim.params)
   WHERE key = 'group_id') AS group_id
FROM `firebase-analytics-sample-data.ios_dataset.app_events_20160607`,
  UNNEST(event_dim) AS event_dim
LIMIT 1000;

If you only want rows that have both 'user_id' and 'group_id', you can filter out the NULL values:

SELECT * FROM (
  SELECT
    event_dim.name,
    (SELECT value.string_value FROM UNNEST(event_dim.params)
     WHERE key = 'user_id') AS user_id,
    (SELECT value.string_value FROM UNNEST(event_dim.params)
     WHERE key = 'group_id') AS group_id
  FROM `firebase-analytics-sample-data.ios_dataset.app_events_20160607`,
    UNNEST(event_dim) AS event_dim
)
WHERE user_id IS NOT NULL AND group_id IS NOT NULL
LIMIT 1000;
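To try the per-event lookup without touching a real export table, here is a minimal self-contained sketch in standard SQL. It builds a tiny inline stand-in for the Firebase schema. The CTE name `sample_events`, the alias `e`, and the third event (`app_open`, which deliberately has no `group_id`) are assumptions made for illustration; the first two events are copied from the question's table.

```sql
-- Inline stand-in for the export table: one row whose event_dim is an
-- array of events, each with a name and a repeated params record.
WITH sample_events AS (
  SELECT [
    STRUCT('read_post' AS name, [
      STRUCT('post_id' AS key, STRUCT('p_100' AS string_value) AS value),
      STRUCT('group_id' AS key, STRUCT('g_1' AS string_value) AS value),
      STRUCT('user_id' AS key, STRUCT('u_1' AS string_value) AS value)
    ] AS params),
    STRUCT('open_group' AS name, [
      STRUCT('post_id' AS key, STRUCT('p_200' AS string_value) AS value),
      STRUCT('group_id' AS key, STRUCT('g_2' AS string_value) AS value),
      STRUCT('user_id' AS key, STRUCT('u_1' AS string_value) AS value)
    ] AS params),
    -- Hypothetical event without a group_id parameter.
    STRUCT('app_open' AS name, [
      STRUCT('user_id' AS key, STRUCT('u_3' AS string_value) AS value)
    ] AS params)
  ] AS event_dim
)
SELECT * FROM (
  SELECT
    e.name,
    -- Each scalar subquery scans only the params of the current event.
    (SELECT value.string_value FROM UNNEST(e.params)
     WHERE key = 'user_id') AS user_id,
    (SELECT value.string_value FROM UNNEST(e.params)
     WHERE key = 'group_id') AS group_id
  FROM sample_events, UNNEST(event_dim) AS e
)
-- Drop events where either key was missing (its subquery returned NULL).
WHERE user_id IS NOT NULL AND group_id IS NOT NULL;
```

With this data the query should keep read_post and open_group and drop the app_open event, since its group_id subquery matches no row and yields NULL. Note that a scalar subquery raises an error if it produces more than one row, so this pattern assumes each key appears at most once per event.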