xia*_*ong 6 sql json user-defined-functions google-bigquery
所以我有一个包含两列的原始表:
id (INT64) | content (STRING)
------------|--------------------
1 | {"photos": [{"location": {"lat": 111, "lon": 222}, "ts": "2019-12-16", "uri": "aaa"}, {"location": {"lat": 333, "lon": 444}, "ts": "2019-12-17", "uri": "bbb"}]}
------------|--------------------
2 | ....
Run Code Online (Sandbox Code Playgroud)
第一列是整数类型的 id,第二列是 json 格式的字符串。示例 json 如下所示:
{
"photos": [
{
"location": {
"lat": 111,
"lon": 222
},
"ts": "2019-12-16",
"uri": "aaa"
},
{
"location": {
"lat": 333,
"lon": 444
},
"ts": "2019-12-17",
"uri": "bbb"
}
]
}
Run Code Online (Sandbox Code Playgroud)
如何将原始表中的照片格式化为结构/记录数组,即产生类似的结果?
id | photos.ts | photos.uri | photos.location.lat | photos.location.lon
-------|---------------|-------------|-----------------------|--------------------
1 | 2019-12-16 | aaa | 111 | 222
| 2019-12-17 | bbb | 333 | 444
-------|---------------|-------------|-----------------------|--------------------
2 | ... | ... | ... | ...
Run Code Online (Sandbox Code Playgroud)
JSON_EXTRACT(content, "$.photos")
似乎是一个好的开始,因为它会给我一个 JSON 对象数组,然后我需要一些 JS UDF 将结果格式化为 BQ STRUCT
/RECORD
类型。但不确定具体如何做到这一点——感谢任何帮助!STRUCT
/的这种“清理”RECORD
是否真的有必要或值得。看来我可以将照片格式化为数组STRING
:id (INT64) | photos (STRING)
------------|--------------------
1 | {"location": {"lat": 111, "lon": 222}, "ts": "2019-12-16", "uri": "aaa"}
| {"location": {"lat": 333, "lon": 444}, "ts": "2019-12-17", "uri": "bbb"}
------------|--------------------
2 | ....
Run Code Online (Sandbox Code Playgroud)
JSON_EXTRACT/JSON_EXTRACT_SCALAR
,然后在我的分析查询中使用。我预计会有多大的性能牺牲?
谢谢!
Mik*_*ant 12
以下示例适用于 BigQuery 标准 SQL
#standardSQL
CREATE TEMP FUNCTION json2array(json STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return JSON.parse(json).map(x=>JSON.stringify(x));
""";
WITH `project.dataset.table` AS (
SELECT 1 id, '{"photos": [{"location": {"lat": 111, "lon": 222}, "ts": "2019-12-16", "uri": "aaa"}, {"location": {"lat": 333, "lon": 444}, "ts": "2019-12-17", "uri": "bbb"}]}' content
)
SELECT id, json2array(JSON_EXTRACT(content, "$.photos")) AS photos
FROM `project.dataset.table`
Run Code Online (Sandbox Code Playgroud)
带输出
Row id photos
1 1 {"location":{"lat":111,"lon":222},"ts":"2019-12-16","uri":"aaa"}
{"location":{"lat":333,"lon":444},"ts":"2019-12-17","uri":"bbb"}
Run Code Online (Sandbox Code Playgroud)
或者...您可以进一步了解以下内容
#standardSQL
CREATE TEMP FUNCTION json2array(json STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return JSON.parse(json).map(x=>JSON.stringify(x));
""";
WITH `project.dataset.table` AS (
SELECT 1 id, '{"photos": [{"location": {"lat": 111, "lon": 222}, "ts": "2019-12-16", "uri": "aaa"}, {"location": {"lat": 333, "lon": 444}, "ts": "2019-12-17", "uri": "bbb"}]}' content
)
SELECT id,
array(
SELECT AS struct
JSON_EXTRACT_SCALAR(photo, "$.ts") ts,
JSON_EXTRACT_SCALAR(photo, "$.uri") uri,
STRUCT(JSON_EXTRACT(photo, "$.location.lat") AS lat, JSON_EXTRACT(photo, "$.location.lon") AS lon) AS location
FROM unnest(json2array(JSON_EXTRACT(content, "$.photos"))) photo
) AS photos
FROM `project.dataset.table`
Run Code Online (Sandbox Code Playgroud)
返回
Row id photos.ts photos.uri photos.location.lat photos.location.lon
1 1 2019-12-16 aaa 111 222
2019-12-17 bbb 333 444
Run Code Online (Sandbox Code Playgroud)
归档时间: |
|
查看次数: |
10511 次 |
最近记录: |