How do I apply a function to every element of an array column?

ece*_*ulm 5 snowflake-cloud-data-platform

I have a dataset in which one column contains an array of objects, like this:

ID   TAGS
1    {"tags": [{"tag": "a"}, {"tag": "b"}]}
2    {"tags": [{"tag": "c"}, {"tag": "d"}]}

I want to extract the tag field from each element of the tags array, so that the final result is:

ID   TAGS
1    ["a","b"]
2    ["c","d"]

Assume the following table t1:

CREATE OR REPLACE TEMPORARY TABLE t1 AS (
    SELECT 1 AS ID, PARSE_JSON('{"tags": [{"tag":"a"}, {"tag":"b"}]}') AS PAYLOAD
    UNION ALL
    SELECT 2, PARSE_JSON('{"tags": [{"tag":"c"}, {"tag":"d"}]}')
);

ece*_*ulm 6

A pure SQL approach is to combine LATERAL FLATTEN with ARRAY_AGG, as follows:

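-- Step 1: explode each element of payload:tags into its own row.
with t2 as (
    select t1.ID, f.value:tag as tag
    from t1, LATERAL FLATTEN(input => t1.payload:tags) f
)
-- Step 2: collect the extracted values back into one array per ID.
select t2.ID, ARRAY_AGG(t2.tag) as tags
from t2
group by t2.ID
order by t2.ID asc;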

The CTE t2 on its own produces:

ID  TAG
1   "a"
1   "b"
2   "c"
2   "d"

After the GROUP BY ID, it becomes:

ID  TAGS
1   [ "a", "b" ]
2   [ "c", "d" ]
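One thing to watch: ARRAY_AGG does not promise any particular element order unless you ask for one. If the original order of the tags matters, something along these lines may help. This is only a sketch, assuming the same t1 table (ID, PAYLOAD) as in the question; it relies on the INDEX column that FLATTEN emits for array inputs, and the alias f is just an illustrative name.

```sql
-- Sketch: the same flatten-and-aggregate idea as a single query.
-- Assumes the table t1 (ID, PAYLOAD) defined in the question.
SELECT
    t1.id,
    -- f.index is the element's position in the source array;
    -- ordering by it keeps the tags in their original order.
    ARRAY_AGG(f.value:tag) WITHIN GROUP (ORDER BY f.index) AS tags
FROM t1,
     LATERAL FLATTEN(input => t1.payload:tags) f
GROUP BY t1.id
ORDER BY t1.id;
```

Note that rows whose tags array is empty or missing drop out of the result, because FLATTEN skips them by default. If those rows need to be kept, passing OUTER => TRUE to FLATTEN should be worth trying.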