CHI*_*HID 2 presto unnest amazon-athena
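My question is somewhat similar to this one (Athena/Presto - UNNEST MAP to columns), but in my case I know in advance which columns I need.
My use case is as follows.
I have a JSON blob with the following structure: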
{
  "reqId" : "1234",
  "clientId" : "client",
  "response" : [
                 {
                   "name" : "Susan",
                   "projects" : [
                       {
                          "name" : "project1",
                          "completed" : true
                       },
                       {
                          "name" : "project2",
                          "completed" : false
                       }
                   ]
                 },
                 {
                   "name" : "Adams",
                   "projects" : [
                       {
                          "name" : "project1",
                          "completed" : true
                       },
                       {
                          "name" : "project2",
                          "completed" : false
                       }
                   ]
                 }
               ]
}
I need to create a view that returns output like this:
    name  |  project    |  Completed |
----------+-------------+------------+
    Susan |  project1   |   true     |
    Susan |  project2   |   false    |
    Adams |  project1   |   true     |
    Adams |  project2   |   false    |
I tried the following, among other approaches. This is the closest I could get:
WITH dataset AS (
  SELECT 'Susan' as name, transform(filter(CAST(json_extract('{
           "projects": [{"name":"project1", "completed":false}, {"name":"project3", "completed":false},
           {"name":"project2", "completed":true}]}', '$.projects') AS ARRAY<MAP<VARCHAR, VARCHAR>>), p -> (p['name'] != 'project1')), p -> ROW(map_values(p))) AS projects
)
SELECT * from dataset
CROSS JOIN UNNEST(projects)
This is the output I get:
    name    projects                                                        _col2
1   Susan   [{field0=[project3, false]}, {field0=[project2, true]}] {field0=[project3, false]}
2   Susan   [{field0=[project3, false]}, {field0=[project2, true]}] {field0=[project2, true]}
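I basically want to unnest the map's key-value pairs into separate columns. How can I do this in Presto / Athena?
Your JSON example appears to be invalid: it is missing a comma (,) after "name" : "Susan" and "name" : "Adams". Apart from that, you can get the expected output with the query below. You need to UNNEST twice, once over the response array and once over each element's projects array, with a cast to array<JSON> each time: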
with dataset as
(
    select json_parse('{"reqId" : "1234","clientId" : "client","response" : [{"name" : "Susan","projects" : [{"name" : "project1","completed" : true},{"name" : "project2","completed" : false}]},{"name" : "Adams","projects" : [{"name" : "project1","completed" : true},{"name" : "project2","completed" : false}]}]}') as json_col
)
,unnest_response as
(
    select * 
    from dataset
    cross join UNNEST(cast(json_extract(json_col, '$.response') as array<JSON>)) as t (response)
)
select 
json_extract_scalar(response, '$.name') name,
json_extract_scalar(project, '$.name') project_name,
json_extract_scalar(project, '$.completed') project_completed
from unnest_response
cross join UNNEST(cast(json_extract(response, '$.projects') as array<JSON>)) as t (project);
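Since the question asks for a view, the query above could be wrapped in one roughly as follows. This is only a sketch under assumptions: the table name requests, its VARCHAR column json_col holding the JSON blob, and the view name project_status are all hypothetical and not from the original post. It also casts completed to BOOLEAN, because json_extract_scalar returns VARCHAR.

```sql
-- Hypothetical names: requests (table), json_col (VARCHAR column with the JSON blob),
-- project_status (view). Adjust these to your own schema.
CREATE OR REPLACE VIEW project_status AS
WITH unnest_response AS (
    -- First UNNEST: one row per element of the top-level "response" array
    SELECT response
    FROM requests
    CROSS JOIN UNNEST(
        CAST(json_extract(json_col, '$.response') AS ARRAY<JSON>)
    ) AS t (response)
)
SELECT
    json_extract_scalar(response, '$.name') AS name,
    json_extract_scalar(project, '$.name') AS project,
    -- json_extract_scalar yields VARCHAR; cast it to get a real BOOLEAN column
    CAST(json_extract_scalar(project, '$.completed') AS BOOLEAN) AS completed
FROM unnest_response
-- Second UNNEST: one row per element of each person's "projects" array
CROSS JOIN UNNEST(
    CAST(json_extract(response, '$.projects') AS ARRAY<JSON>)
) AS p (project);
```

With the view in place, a query such as SELECT * FROM project_status WHERE completed should return only the finished projects, one row per person and project.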