Ben*_*eid 1 sql google-bigquery
I have a Google BigQuery table that looks like this:
| id | col_1      | col_2   | updated |
| 1  | first_data | null    | 4/22    |
| 1  | null       | old     | 4/23    |
| 1  | null       | correct | 4/24    |
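I want to construct a query that combines these rows and "overwrites" a null column whenever another row with the same id has a non-null value in that column. Essentially, the result should look like this: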
| 1  | first_data | correct | 4/24    |
If possible, I would also like the result to represent the history:
| 1  | first_data | old     | 4/23    |
| 1  | first_data | correct | 4/24    |
But that is secondary and not required.
Below is for BigQuery Standard SQL
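#standardSQL
SELECT id,
  -- keep the row's own value; if it is NULL, take the most recent earlier non-NULL value
  IFNULL(col_1, FIRST_VALUE(col_1 IGNORE NULLS) OVER(win)) col_1,
  IFNULL(col_2, FIRST_VALUE(col_2 IGNORE NULLS) OVER(win)) col_2,
  updated
FROM `project.dataset.your_table`
-- rows are ordered newest first, so the rows FOLLOWING the current one are the older rows of the same id
WINDOW win AS (PARTITION BY id ORDER BY updated DESC
  ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
-- ORDER BY id, updated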
You can test and play with the above using dummy data, as below
#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 1 id, 'first_data' col_1, NULL col_2, '4/22' updated UNION ALL
  SELECT 1, NULL, 'old', '4/23' UNION ALL
  SELECT 1, NULL, 'correct', '4/24' UNION ALL
  SELECT 1, 'next_data', NULL, '4/25' UNION ALL
  SELECT 1, NULL, NULL, '4/26'
)
SELECT id,
  IFNULL(col_1, FIRST_VALUE(col_1 IGNORE NULLS) OVER(win)) col_1,
  IFNULL(col_2, FIRST_VALUE(col_2 IGNORE NULLS) OVER(win)) col_2,
  updated
FROM `project.dataset.your_table`
WINDOW win AS (PARTITION BY id ORDER BY updated DESC
  ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
ORDER BY id, updated
Result
Row id col_1 col_2 updated
1 1 first_data null 4/22
2 1 first_data old 4/23
3 1 first_data correct 4/24
4 1 next_data correct 4/25
5 1 next_data correct 4/26
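
The result above is the history form. If only the single collapsed row per id from the question is needed, one possible sketch is to aggregate instead of using a window. This is not part of the original answer: it assumes the same placeholder table name `project.dataset.your_table`, uses only the three rows from the question as dummy data, and picks the latest non-NULL value of each column with `ARRAY_AGG`.

```sql
#standardSQL
-- Sketch under assumptions: collapse each id to one row, taking the latest non-NULL value per column.
WITH `project.dataset.your_table` AS (
  SELECT 1 id, 'first_data' col_1, CAST(NULL AS STRING) col_2, '4/22' updated UNION ALL
  SELECT 1, NULL, 'old', '4/23' UNION ALL
  SELECT 1, NULL, 'correct', '4/24'
)
SELECT id,
  -- newest non-NULL value first; SAFE_OFFSET yields NULL if the column is NULL in every row
  ARRAY_AGG(col_1 IGNORE NULLS ORDER BY updated DESC LIMIT 1)[SAFE_OFFSET(0)] col_1,
  ARRAY_AGG(col_2 IGNORE NULLS ORDER BY updated DESC LIMIT 1)[SAFE_OFFSET(0)] col_2,
  MAX(updated) updated
FROM `project.dataset.your_table`
GROUP BY id
```

For the three sample rows this should return the single row 1, first_data, correct, 4/24. Note that both this sketch and the window query order by `updated` as written; with string values such as '4/22' the ordering is lexicographic, so a real table would presumably need a DATE or TIMESTAMP column for the ordering to be reliable.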