Tag: bigquery-udf

BigQuery equivalent of a MERGE statement

I am migrating from Teradata to BigQuery. I have come across a MERGE statement that has VALUES in the USING clause.

MERGE INTO department DL
USING VALUES
(
  2, 'ABC'
) AS V
(Run_Id, Country)
ON DL.department_id = V.Run_Id
WHEN MATCHED THEN
  UPDATE SET
    department_description = V.country
WHEN NOT MATCHED THEN
  INSERT
  (
    V.Run_Id
    , V.Country
    curr
  );

Can anyone help me find the BigQuery equivalent?
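As far as I know, BigQuery's MERGE does not accept a bare VALUES list as its source, so one possible translation replaces it with a subquery that selects the literals. This is only a sketch: it assumes `department` has the columns `department_id` and `department_description` (the only two named in the statement above), and it leaves out the third, truncated `curr` item because its full form is not shown.

```sql
-- Sketch only: assumes department(department_id, department_description) exists.
MERGE INTO department AS DL
USING (
  -- The Teradata "VALUES (2, 'ABC') AS V (Run_Id, Country)" becomes a one-row subquery.
  SELECT 2 AS Run_Id, 'ABC' AS Country
) AS V
ON DL.department_id = V.Run_Id
WHEN MATCHED THEN
  UPDATE SET department_description = V.Country
WHEN NOT MATCHED THEN
  -- Target column list is an assumption; the truncated third value is omitted.
  INSERT (department_id, department_description)
  VALUES (V.Run_Id, V.Country);
```

For several source rows, the subquery can be extended with `UNION ALL SELECT ...` lines.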

merge-statement google-bigquery bigquery-udf

4 votes, 1 answer, 9806 views

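Does BigQuery support analytic user-defined functions?

BigQuery supports:

  1. User-defined functions (UDFs) in SQL and JavaScript.
  2. Analytic functions, which compute values over a group of rows and return one result per row. These functions can be used with an OVER clause. There is a predefined set of analytic functions.

Question #1: "Does BigQuery support analytic user-defined functions?"

The motivation is that I want to implement the split-apply-combine pattern that is common in Python pandas code. This is useful for within-group normalization and other transformations that use group statistics.

I did a small test in Standard SQL: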

create or replace function `mydataset.mylen`(arr array<string>) returns int64 as (
  array_length(arr)
);

WITH Produce AS
 (SELECT 'kale' as item, 23 as purchases, 'vegetable' as category
  UNION ALL SELECT 'orange', 2, 'fruit'
  UNION ALL SELECT 'cabbage', 9, 'vegetable'
  UNION ALL SELECT 'apple', 8, 'fruit'
  UNION ALL SELECT 'leek', 2, 'vegetable'
  UNION ALL SELECT 'lettuce', 10, 'vegetable')
SELECT 
  item, 
  purchases, 
  category, 
  `mydataset.mylen`(item) …
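
One way to approximate split-apply-combine without a dedicated analytic UDF is to let a built-in analytic aggregate do the "split" and hand its per-group result to an ordinary scalar UDF. The sketch below assumes that is acceptable for the use case. It uses a temporary function named `mylen` (a stand-in for `mydataset.mylen` above) and adds a within-group z-score built from predefined analytic functions only; the output column names are made up for illustration.

```sql
-- Temporary stand-in for the question's mydataset.mylen.
CREATE TEMP FUNCTION mylen(arr ARRAY<STRING>) RETURNS INT64 AS (
  ARRAY_LENGTH(arr)
);

WITH Produce AS (
  SELECT 'kale' AS item, 23 AS purchases, 'vegetable' AS category
  UNION ALL SELECT 'orange', 2, 'fruit'
  UNION ALL SELECT 'cabbage', 9, 'vegetable'
  UNION ALL SELECT 'apple', 8, 'fruit'
  UNION ALL SELECT 'leek', 2, 'vegetable'
  UNION ALL SELECT 'lettuce', 10, 'vegetable'
)
SELECT
  item,
  purchases,
  category,
  -- ARRAY_AGG ... OVER collects each category's items; the scalar UDF then
  -- runs on that array, so every row sees a statistic of its own group.
  mylen(ARRAY_AGG(item) OVER (PARTITION BY category)) AS items_in_category,
  -- Within-group normalization using built-in analytic functions.
  -- SAFE_DIVIDE returns NULL instead of failing when the deviation is 0.
  SAFE_DIVIDE(
    purchases - AVG(purchases) OVER (PARTITION BY category),
    STDDEV_SAMP(purchases) OVER (PARTITION BY category)
  ) AS purchases_zscore
FROM Produce;
```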

analytic-functions google-bigquery bigquery-udf

3 votes, 1 answer, 1060 views

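Extracting nested JSON with dynamic keys and values in BigQuery

I want to extract nested JSON that has dynamic keys. I can currently extract the keys with the help of the bigquery-utils functions json_extract_keys() and json_extract_values(), but the values that are JSON objects come back as an unexpected "[object Object]".

What is the correct way to parse nested JSON with dynamic keys and values? (I would prefer not to use a custom JS UDF, but I am not sure whether the existing JSON functions can handle this.)

Input log column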
{
    "key1":{"ItemID":1,"UseCount":4,"ItemCount":7},
    "key2":{"ItemID":2,"UseCount":5,"ItemCount":8},
    "key3":{"ItemID":3,"UseCount":6,"ItemCount":9}
    ...
}
Current query

WITH
sample_logs AS (
    SELECT '{"key1":{"ItemID":1,"UseCount":4,"ItemCount":7},"key2":{"ItemID":2,"UseCount":5,"ItemCount":8},"key3":{"ItemID":3,"UseCount":6,"ItemCount":9}}' as json_string,
    UNION ALL SELECT '{"key4":{"ItemID":1,"UseCount":4,"ItemCount":7},"key5":{"ItemID":2,"UseCount":5,"ItemCount":8}}'
)
SELECT
    json_string,
    key,
    TO_JSON_STRING(value) as value,
FROM sample_logs
CROSS JOIN UNNEST(bqutil.fn.json_extract_keys(json_string)) as key WITH OFFSET
INNER JOIN UNNEST(bqutil.fn.json_extract_values(json_string)) as value WITH OFFSET USING (OFFSET)
;
Result

[image: query result, with "[object Object]" in the value column]

Expected result
JSON_STRING1  |  "key1"   |  {"ItemID":1,"UseCount":4,"ItemCount":7}  -- <- not [object Object]
JSON_STRING1  | …
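
The "[object Object]" values presumably appear because the helper returns an array of strings, so each nested object is coerced to a string instead of being serialized. If a custom JS UDF turns out to be acceptable after all, one possible workaround is a temporary function that serializes every value with `JSON.stringify`. The name `json_entries` and its return type are hypothetical, chosen only for this sketch.

```sql
-- Hypothetical helper: returns one (key, value) pair per top-level key,
-- with each value re-serialized as a JSON string.
CREATE TEMP FUNCTION json_entries(json_str STRING)
RETURNS ARRAY<STRUCT<key STRING, value STRING>>
LANGUAGE js AS """
  var obj = JSON.parse(json_str);
  return Object.keys(obj).map(function (k) {
    // JSON.stringify keeps nested objects as JSON text instead of "[object Object]".
    return { key: k, value: JSON.stringify(obj[k]) };
  });
""";

WITH sample_logs AS (
  SELECT '{"key1":{"ItemID":1,"UseCount":4,"ItemCount":7},"key2":{"ItemID":2,"UseCount":5,"ItemCount":8},"key3":{"ItemID":3,"UseCount":6,"ItemCount":9}}' AS json_string
  UNION ALL SELECT '{"key4":{"ItemID":1,"UseCount":4,"ItemCount":7},"key5":{"ItemID":2,"UseCount":5,"ItemCount":8}}'
)
SELECT
  json_string,
  entry.key,
  entry.value
FROM sample_logs
CROSS JOIN UNNEST(json_entries(json_string)) AS entry;
```

Returning key and value together in one struct also avoids the two OFFSET-aligned UNNEST joins in the current query.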

google-bigquery bigquery-udf

3 votes, 1 answer, 2246 views

Finding overlapping time periods in BigQuery

Suppose my data in BigQuery is structured as follows:

WITH session_log AS (
  SELECT 'ABC' as site_id, 1234 user_id, 12 session_id, '2020-02-10 00:29:59.376000 UTC' start_time, '2020-02-10 01:13:02.817000 UTC' end_time UNION ALL
  SELECT 'ABC' as site_id, 1234 user_id, 13 session_id, '2020-02-10 02:41:56.330000 UTC' start_time, '2020-02-10 02:41:56.389999 UTC' end_time UNION ALL
  SELECT 'ABC' as site_id, 1234 user_id, 14 session_id, '2020-02-10 04:24:46.649999 UTC' start_time, '2020-02-10 05:14:08.243000 UTC' end_time UNION ALL
  SELECT 'ABC' as site_id, 1234 user_id, 15 session_id, '2020-02-10 04:59:21.356999 UTC' start_time, '2020-02-10 15:57:11.501000 UTC' end_time UNION ALL
  SELECT 'ABC' as …
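
The rest of the question is cut off, so the exact goal is unknown. Assuming it is to list pairs of sessions from the same user and site whose time ranges intersect, a self-join on the usual interval-overlap test (A starts before B ends, and B starts before A ends) is one way to sketch it. Only the four complete rows above are used, the timestamps are written as TIMESTAMP literals rather than strings, and the output column names are illustrative.

```sql
WITH session_log AS (
  SELECT 'ABC' AS site_id, 1234 AS user_id, 12 AS session_id,
         TIMESTAMP '2020-02-10 00:29:59.376000 UTC' AS start_time,
         TIMESTAMP '2020-02-10 01:13:02.817000 UTC' AS end_time
  UNION ALL SELECT 'ABC', 1234, 13,
         TIMESTAMP '2020-02-10 02:41:56.330000 UTC',
         TIMESTAMP '2020-02-10 02:41:56.389999 UTC'
  UNION ALL SELECT 'ABC', 1234, 14,
         TIMESTAMP '2020-02-10 04:24:46.649999 UTC',
         TIMESTAMP '2020-02-10 05:14:08.243000 UTC'
  UNION ALL SELECT 'ABC', 1234, 15,
         TIMESTAMP '2020-02-10 04:59:21.356999 UTC',
         TIMESTAMP '2020-02-10 15:57:11.501000 UTC'
)
SELECT
  a.site_id,
  a.user_id,
  a.session_id AS session_a,
  b.session_id AS session_b,
  -- The shared interval runs from the later start to the earlier end.
  GREATEST(a.start_time, b.start_time) AS overlap_start,
  LEAST(a.end_time, b.end_time) AS overlap_end
FROM session_log AS a
JOIN session_log AS b
  ON a.site_id = b.site_id
  AND a.user_id = b.user_id
  AND a.session_id < b.session_id   -- report each pair once, skip self-pairs
  AND a.start_time < b.end_time     -- a starts before b ends
  AND b.start_time < a.end_time;    -- b starts before a ends
```

With these four rows, the only pair that satisfies the condition is sessions 14 and 15.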

sql google-bigquery bigquery-udf

2 votes, 1 answer, 1289 views