Updating a BigQuery view with indirect schema changes

de1*_*de1 6 python google-bigquery

When updating a view, indirect schema changes don't seem to be picked up.

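Steps to reproduce

  • Create view1 with the field field1 (e.g. SELECT 1 AS field1)
  • Create view2 selecting all fields from view1
  • Update view1 to also include field2 (e.g. SELECT 1 AS field1, 2 AS field2)
  • Update view2 with the same query as before (because of the documented limitation)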

Desired outcome

  • The schemas of view1 and view2 include both field1 and field2
  • The view update should be atomic

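Actual outcome

  • The schema of view1 is updated correctly (it includes field1 and field2)
  • The schema of view2 only includes field1
  • Selecting from view2 actually returns both field1 and field2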

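I could delete view2 and re-create it, but that would not be atomic, and there would be a period during which the view is unavailable, which is not desired.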

I also tried to update the schema property of view2, but that is rejected with Cannot add fields (field: field2):

google.api_core.exceptions.BadRequest: 400 PATCH https://www.googleapis.com/bigquery/v2/projects/<project-id>/datasets/dataset1/tables/view2: Provided Schema does not match Table <project-id>:dataset1.view2. Cannot add fields (field: field2)

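Question

Is there a way to atomically update a view while also updating its schema for indirect changes (changes to the table or view that the view selects from)?

Note: of course, my real view2 would add further fields, and I can currently determine its schema by creating a new temporary view.

Note: the schema matters because tools such as Data Studio's BigQuery connector inspect it.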

Code to reproduce the steps

# Python 3.6+
import google.api_core.exceptions
from google.cloud import bigquery


def delete_table_if_exists(client: bigquery.Client, table: bigquery.Table):
    try:
        client.delete_table(table)
    except google.api_core.exceptions.NotFound:
        pass


def full_table_id(table: bigquery.Table) -> str:
    # Note: the documentation says it should be separated by a dot but uses a colon
    return table.full_table_id.replace(':', '.')


def view_test():
    client = bigquery.Client()

    dataset_ref = client.dataset('dataset1')
    try:
        client.create_dataset(dataset_ref)
    except google.api_core.exceptions.Conflict:
        pass

    view1 = bigquery.Table(dataset_ref.table('view1'))
    view2 = bigquery.Table(dataset_ref.table('view2'))
    delete_table_if_exists(client, view1)
    delete_table_if_exists(client, view2)

    view1.view_query = 'SELECT 1 AS field1'
    view1 = client.create_table(view1)

    view2.view_query = f'SELECT * FROM `{full_table_id(view1)}`'
    client.create_table(view2)

    view1.view_query = 'SELECT 1 AS field1, 2 AS field2'
    client.update_table(view1, ['view_query'])

    client.update_table(view2, ['view_query'])
    print('view2 schema:', client.get_table(view2).schema)

    # trying to update the schema fails with 'Cannot add fields (field: field2)'
    view2.schema = client.get_table(view1).schema
    client.update_table(view2, ['schema'])


if __name__ == '__main__':
    view_test()

Bash example doing the same

#!/bin/bash

set -e

project_id=$(gcloud config list --format 'value(core.project)' 2>/dev/null)

bq mk -f dataset1

bq rm -f dataset1.view1
bq rm -f dataset1.view2

bq mk --use_legacy_sql=false --view 'SELECT 1 AS field1' dataset1.view1
bq mk --use_legacy_sql=false --view 'SELECT * FROM `'$project_id'.dataset1.view1`' dataset1.view2

bq update --use_legacy_sql=false --view 'SELECT 1 AS field1, 2 AS field2' dataset1.view1
bq update --use_legacy_sql=false --view 'SELECT * FROM `'$project_id'.dataset1.view1`' dataset1.view2

bq show dataset1.view2

Update: code using the accepted answer

Python code
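
A minimal sketch of how the CREATE OR REPLACE VIEW approach might look from Python, assuming the same dataset1, view1 and view2 names as in the reproduction code. The helper names create_or_replace_view_sql, replace_view and demo are hypothetical and introduced only for illustration; the sketch relies on Client.query running the DDL statement and on QueryJob.result waiting for it to finish.

```python
def create_or_replace_view_sql(view_id: str, source_id: str) -> str:
    """Build a DDL statement that redefines view_id as SELECT * FROM source_id."""
    return f'CREATE OR REPLACE VIEW `{view_id}` AS SELECT * FROM `{source_id}`'


def replace_view(client, view_id: str, source_id: str) -> None:
    """Run the DDL through a google.cloud.bigquery.Client (assumed type of `client`)."""
    job = client.query(create_or_replace_view_sql(view_id, source_id))
    job.result()  # block until the statement has finished; raises on failure


def demo() -> None:
    """Replace view2 and print its schema (needs credentials and a real project)."""
    from google.cloud import bigquery  # imported lazily so the helpers stay importable

    client = bigquery.Client()
    view1_id = f'{client.project}.dataset1.view1'
    view2_id = f'{client.project}.dataset1.view2'
    replace_view(client, view2_id, view1_id)
    print('view2 schema:', client.get_table(view2_id).schema)
```

Calling demo() after view1 has been updated to include field2 should, per the accepted answer, leave view2 with a schema that lists both fields, without the delete-and-recreate gap.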

Bash magic

bq query --use_legacy_sql=false 'CREATE OR REPLACE VIEW dataset1.view2 AS SELECT * FROM `'$project_id'.dataset1.view1`'

Ell*_*ard 7

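You should use a CREATE OR REPLACE VIEW statement; see the related documentation. BigQuery provides ACID semantics for all queries that perform table modifications, and CREATE OR REPLACE VIEW is no exception, so this replaces the definition and schema of the view atomically.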