In BigQuery, how do I use standard SQL to filter an array of STRUCTs by matching multiple fields within the same STRUCT?

FZF*_*FZF 1 google-bigquery

Here is the record layout of the table (load_history). I am trying to apply a filter using standard SQL (since legacy SQL may be deprecated at some point):

[
{
    "mode": "NULLABLE",
    "name": "Job",
    "type": "RECORD",
    "fields": [
        {
          "mode": "NULLABLE",
          "name": "name",
          "type": "STRING"
        },
        {
          "mode": "NULLABLE",
          "name": "start_time",
          "type": "TIMESTAMP"
        },
        {
          "mode": "NULLABLE",
          "name": "end_time",
          "type": "TIMESTAMP"
        }
    ]
},      
{
    "mode": "REPEATED",
    "name": "source",
    "type": "RECORD",
    "description": "source tables touched by this job",
    "fields": [     
        {
          "mode": "NULLABLE",
          "name": "database",
          "type": "STRING"
        },
        {
          "mode": "NULLABLE",
          "name": "schema",
          "type": "STRING"
        },
        {
          "mode": "NULLABLE",
          "name": "table",
          "type": "STRING"
        },
        {
          "mode": "NULLABLE",
          "name": "partition_time",
          "type": "TIMESTAMP"
        }    
    ]
}
]      

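I need to filter and select only the records that have an entry in the "source" array whose "schema" and "table" fields match certain values (for example, schema = 'log' AND table = 'customer' within the same array entry).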

The following works when filtering on only one field of the struct (the schema name):

select name, array(select x from unnest(schema) as x where x ='log' ), table
from (select job.name , array(select schema from unnest(source)) as schema, 
      array(select table from unnest(source)) as table
      from  config.load_history)

However, I cannot get it to filter on a combination of fields within the same array entry.

Thanks for your help.

Mik*_*ant 5

For BigQuery standard SQL:

#standardSQL
SELECT data
FROM data, UNNEST(source) AS s
WHERE (s.schema, s.table) = ('log', 'customer')  

or

#standardSQL
SELECT *
FROM data
WHERE EXISTS (
  SELECT 1 FROM UNNEST(source) AS s 
  WHERE (s.schema, s.table) = ('log', 'customer')
)

You can test and play with the queries above using the following dummy data:
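The two queries above do not return the same number of rows: the comma join with UNNEST produces one output row per matching array entry, whereas the EXISTS form returns each record at most once. The sketch below illustrates this under assumed sample data (the `data` CTE, its values, and the column names `rows_from_join` and `rows_from_exists` are made up for the example). It also spells the predicate as two AND-ed comparisons, which can be used in place of the struct comparison.

```sql
#standardSQL
-- Hypothetical sample: one record whose source array holds two matching entries.
WITH data AS (
  SELECT
    'jobA' AS name,
    [STRUCT<schema STRING, table STRING>('log', 'customer'),
     ('log', 'customer')] AS source
)
SELECT
  -- Comma join with UNNEST: one row per matching array entry (2 here).
  (SELECT COUNT(*)
   FROM data, UNNEST(source) AS s
   WHERE s.schema = 'log' AND s.table = 'customer') AS rows_from_join,
  -- EXISTS: the record is returned once, however many entries match (1 here).
  (SELECT COUNT(*)
   FROM data
   WHERE EXISTS (
     SELECT 1 FROM UNNEST(source) AS s
     WHERE s.schema = 'log' AND s.table = 'customer'
   )) AS rows_from_exists
```

If the goal is simply "records that contain at least one matching entry", the EXISTS form is usually the one to pick, since it avoids duplicate rows.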

#standardSQL
WITH data AS (
  SELECT 
    STRUCT<name STRING, start_time INT64, end_time INT64>('jobA', 1, 2) AS job,
    [STRUCT<database STRING, schema STRING, table STRING, partition_time INT64>
      ('d1', 's1', 't1', 1), 
      ('d1', 's2', 't2', 2), 
      ('d1', 's3', 't3', 3) 
    ] AS source UNION ALL
  SELECT 
    STRUCT<name STRING, start_time INT64, end_time INT64>('jobB', 1, 2) AS job,
    [STRUCT<database STRING, schema STRING, table STRING, partition_time INT64>
      ('d1', 's1', 't1', 1), 
      ('d2', 's4', 't2', 2), 
      ('d2', 's3', 't3', 3) 
    ] AS source 
)
SELECT *
FROM data
WHERE EXISTS (
  SELECT 1 FROM UNNEST(source) AS s 
  WHERE (s.schema, s.table) = ('s2', 't2')
)
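If only the matching array entries are wanted in the output, rather than the whole `source` array, one possible variant is to rebuild the array with an ARRAY subquery. This is a sketch against the `config.load_history` table from the question, not something tested on the real data; the output column name `matching_source` is an assumption chosen for illustration.

```sql
#standardSQL
SELECT
  job.name,
  -- Keep only the entries where both fields match within the same struct.
  ARRAY(
    SELECT s
    FROM UNNEST(source) AS s
    WHERE s.schema = 'log' AND s.table = 'customer'
  ) AS matching_source
FROM config.load_history
-- Drop records that have no matching entry at all.
WHERE EXISTS (
  SELECT 1 FROM UNNEST(source) AS s
  WHERE s.schema = 'log' AND s.table = 'customer'
)
```

Because both conditions are evaluated against the same unnested element `s`, a record with schema 'log' in one entry and table 'customer' in a different entry does not match.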