I am migrating from Teradata to BigQuery, and I have come across a MERGE statement whose USING clause contains VALUES:
MERGE INTO department DL
USING VALUES
(
2,'ABC'
) AS V
(Run_Id, Country)
ON DL.department_id = V.Run_Id
WHEN MATCHED THEN
UPDATE SET
department_description = V.country
WHEN NOT MATCHED THEN
INSERT
(
V.Run_Id
, V.Country
curr
);
Can anyone help me find the BigQuery equivalent?
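BigQuery's MERGE takes a table or a subquery as its source, so one way to port a Teradata USING VALUES (...) AS V (cols) clause is to turn the literal rows into a SELECT subquery. The Python below is a minimal sketch of a migration helper that does that rewrite. The helper names sql_literal and values_to_using_source are hypothetical, only None/bool/int/str literals are handled, and the INSERT column list (department_id, department_description) is an assumption inferred from the ON and UPDATE clauses. The original's third INSERT value ("curr") is left out because its meaning is not recoverable, and in practice the table name would also need a dataset qualifier.

```python
# Hypothetical migration helper: rewrites Teradata's
#   USING VALUES (...) AS V (col, ...)
# as a BigQuery derived table (a SELECT subquery), which MERGE accepts as its source.


def sql_literal(value):
    """Render a Python value as a BigQuery Standard SQL literal (None, bool, int, str only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Escape backslashes first, then single quotes, inside a single-quoted literal.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


def values_to_using_source(rows, columns, alias):
    """Build '(SELECT ... UNION ALL SELECT ...) AS alias' from literal rows."""
    if not rows:
        raise ValueError("need at least one row")
    selects = []
    for i, row in enumerate(rows):
        if len(row) != len(columns):
            raise ValueError("each row needs exactly one value per column")
        if i == 0:
            # A UNION ALL takes its column names from the first SELECT.
            items = [f"{sql_literal(v)} AS {c}" for v, c in zip(row, columns)]
        else:
            items = [sql_literal(v) for v in row]
        selects.append("SELECT " + ", ".join(items))
    return "(" + " UNION ALL ".join(selects) + f") AS {alias}"


source = values_to_using_source([(2, "ABC")], ["Run_Id", "Country"], "V")

# Assumed INSERT target columns; the original's "curr" value is omitted.
merge_sql = f"""MERGE INTO department DL
USING {source}
ON DL.department_id = V.Run_Id
WHEN MATCHED THEN
  UPDATE SET department_description = V.Country
WHEN NOT MATCHED THEN
  INSERT (department_id, department_description) VALUES (V.Run_Id, V.Country)"""

print(merge_sql)
```

Generating a UNION ALL of SELECTs keeps multi-row VALUES lists working too; for a single row the result is simply USING (SELECT 2 AS Run_Id, 'ABC' AS Country) AS V.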
BigQuery supports:
Question #1: "Does BigQuery support analytic user-defined functions?"
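The motivation is that I want to implement the split-apply-combine pattern that is common in Python pandas code. It is useful for within-group normalization and for other transformations that use group statistics.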
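As a reference point for what the pattern computes, here is a minimal pure-Python sketch of split-apply-combine on the same item/purchases/category rows as the Produce CTE in the test below. The function names split_apply_combine and share_within_category are hypothetical. In BigQuery SQL, attaching a group statistic to every row is normally written with an aggregate window function such as SUM(purchases) OVER (PARTITION BY category); this sketch does not answer whether a UDF can be used in that position.

```python
from collections import defaultdict

# Same rows as the Produce CTE: (item, purchases, category).
PRODUCE = [
    ("kale", 23, "vegetable"),
    ("orange", 2, "fruit"),
    ("cabbage", 9, "vegetable"),
    ("apple", 8, "fruit"),
    ("leek", 2, "vegetable"),
    ("lettuce", 10, "vegetable"),
]


def split_apply_combine(rows, key, stat):
    """Attach a per-group statistic to every row (same row count in and out)."""
    # Split: collect the rows of each group.
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    # Apply: compute one value per group.
    stats = {k: stat(members) for k, members in groups.items()}
    # Combine: append the group's value to each original row, keeping row order.
    return [row + (stats[key(row)],) for row in rows]


def share_within_category(rows):
    """Within-group normalization: purchases / total purchases of the row's category."""
    with_totals = split_apply_combine(
        rows, key=lambda r: r[2], stat=lambda members: sum(m[1] for m in members)
    )
    return [(item, purchases / total) for item, purchases, _category, total in with_totals]


# Group size per row, analogous to applying an array-length function to each category's items.
group_sizes = split_apply_combine(PRODUCE, key=lambda r: r[2], stat=len)
for item, purchases, category, size in group_sizes:
    print(item, category, size)
print(share_within_category(PRODUCE))
```

Unlike a GROUP BY, the combine step keeps one output row per input row, which is the shape an analytic (window) function produces.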
I ran a small test in Standard SQL:
create or replace function `mydataset.mylen`(arr array<string>) returns int64 as (
array_length(arr)
);
WITH Produce AS
(SELECT 'kale' as item, 23 as purchases, 'vegetable' as category
UNION ALL SELECT 'orange', 2, 'fruit'
UNION ALL SELECT 'cabbage', 9, 'vegetable'
UNION ALL SELECT 'apple', 8, 'fruit'
UNION ALL SELECT 'leek', 2, 'vegetable'
UNION ALL SELECT 'lettuce', 10, 'vegetable')
SELECT
item,
purchases,
category,
`mydataset.mylen`(item) …
I want to extract nested JSON with dynamic keys. I can currently extract the keys with the help of [1] and [2], but parsing the values of the JSON objects gives an unexpected result: "[object Object]".
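What is the right way to parse nested JSON with dynamic keys and values? (I would rather not use a custom JS UDF, but I'm not sure whether the existing JSON functions can handle this.)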
{
"key1":{"ItemID":1,"UseCount":4,"ItemCount":7},
"key2":{"ItemID":2,"UseCount":5,"ItemCount":8},
"key3":{"ItemID":3,"UseCount":6,"ItemCount":9}
...
}
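To pin down the expected output, the Python below is a minimal sketch that turns the object above into one row per top-level key, whatever the key is called. Each nested value is re-serialized as JSON text rather than coerced to a string; in JavaScript, converting an object to a string yields "[object Object]", which matches the symptom described. The function names explode_dynamic_keys and to_columns are hypothetical, and to_columns assumes every nested object has the ItemID, UseCount and ItemCount fields shown in the sample.

```python
import json

SAMPLE = (
    '{"key1":{"ItemID":1,"UseCount":4,"ItemCount":7},'
    '"key2":{"ItemID":2,"UseCount":5,"ItemCount":8},'
    '"key3":{"ItemID":3,"UseCount":6,"ItemCount":9}}'
)


def explode_dynamic_keys(json_string):
    """Return one (key, value_as_json_text) pair per top-level key."""
    obj = json.loads(json_string)
    # Serialize each nested object back to compact JSON text instead of
    # converting it to a plain string (the source of "[object Object]" in JS).
    return [(key, json.dumps(value, separators=(",", ":"))) for key, value in obj.items()]


def to_columns(json_string):
    """Flatten the nested objects into (key, ItemID, UseCount, ItemCount) rows."""
    obj = json.loads(json_string)
    return [(key, v["ItemID"], v["UseCount"], v["ItemCount"]) for key, v in obj.items()]


for row in explode_dynamic_keys(SAMPLE):
    print(row)
print(to_columns(SAMPLE))
```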
With json_extract_keys() and json_extract_values() from bigquery-utils:
WITH
sample_logs AS (
SELECT '{"key1":{"ItemID":1,"UseCount":4,"ItemCount":7},"key2":{"ItemID":2,"UseCount":5,"ItemCount":8},"key3":{"ItemID":3,"UseCount":6,"ItemCount":9}}' as json_string,
UNION ALL SELECT '{"key4":{"ItemID":1,"UseCount":4,"ItemCount":7},"key5":{"ItemID":2,"UseCount":5,"ItemCount":8}}'
)
SELECT
json_string,
key,
TO_JSON_STRING(value) as value,
FROM sample_logs
CROSS JOIN UNNEST(bqutil.fn.json_extract_keys(json_string)) as key WITH OFFSET
INNER JOIN UNNEST(bqutil.fn.json_extract_values(json_string)) as value WITH OFFSET USING (OFFSET)
;
JSON_STRING1 | "key1" | {"ItemID":1,"UseCount":4,"ItemCount":7} -- <- not [object Object]
JSON_STRING1 | …
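The query above pairs the two arrays by position: each UNNEST ... WITH OFFSET numbers its elements from 0, and the INNER JOIN ... USING (OFFSET) matches key i with value i. The Python below is a minimal sketch of that positional join. The extract_keys and extract_values functions are hypothetical local stand-ins for bqutil.fn.json_extract_keys and bqutil.fn.json_extract_values, not their actual implementations. The pairing appears to rely on both functions returning keys and values in the same order.

```python
import json

SAMPLE_LOGS = [
    '{"key1":{"ItemID":1,"UseCount":4,"ItemCount":7},"key2":{"ItemID":2,"UseCount":5,"ItemCount":8},"key3":{"ItemID":3,"UseCount":6,"ItemCount":9}}',
    '{"key4":{"ItemID":1,"UseCount":4,"ItemCount":7},"key5":{"ItemID":2,"UseCount":5,"ItemCount":8}}',
]


def zip_by_offset(keys, values):
    """Inner-join two arrays on their 0-based positions (offsets missing from either side are dropped)."""
    values_by_offset = dict(enumerate(values))
    return [(key, values_by_offset[i]) for i, key in enumerate(keys) if i in values_by_offset]


# Hypothetical stand-ins for the bigquery-utils functions.
def extract_keys(json_string):
    return list(json.loads(json_string).keys())


def extract_values(json_string):
    return [json.dumps(v, separators=(",", ":")) for v in json.loads(json_string).values()]


def explode(json_strings):
    """One (json_string, key, value) row per key, like the CROSS JOIN over sample_logs."""
    return [
        (s, key, value)
        for s in json_strings
        for key, value in zip_by_offset(extract_keys(s), extract_values(s))
    ]


for _s, key, value in explode(SAMPLE_LOGS):
    print(key, value)
```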
Suppose my data in BigQuery is structured as follows:
WITH session_log AS (
SELECT 'ABC' as site_id, 1234 user_id, 12 session_id, '2020-02-10 00:29:59.376000 UTC' start_time, '2020-02-10 01:13:02.817000 UTC' end_time UNION ALL
SELECT 'ABC' as site_id, 1234 user_id, 13 session_id, '2020-02-10 02:41:56.330000 UTC' start_time, '2020-02-10 02:41:56.389999 UTC' end_time UNION ALL
SELECT 'ABC' as site_id, 1234 user_id, 14 session_id, '2020-02-10 04:24:46.649999 UTC' start_time, '2020-02-10 05:14:08.243000 UTC' end_time UNION ALL
SELECT 'ABC' as site_id, 1234 user_id, 15 session_id, '2020-02-10 04:59:21.356999 UTC' start_time, '2020-02-10 15:57:11.501000 UTC' end_time UNION ALL
SELECT 'ABC' as …