Create a table from query results in Google BigQuery

Luk*_*kas 17 python google-app-engine google-bigquery

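We're using Google BigQuery through the Python API. How can we create a table (a new one, or overwrite an existing one) from the results of a query? I looked through the query documentation, but it didn't help.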

We want to emulate:

"SELECT ... INTO ..." from ANSI SQL.

Jor*_*ani 16

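You can do this by specifying a destination table for the query. You need to use the Jobs.insert API rather than the Jobs.query call, set writeDisposition=WRITE_APPEND, and fill in the destination table. To overwrite an existing table instead of appending to it, use writeDisposition=WRITE_TRUNCATE.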

If you are using the raw API, this is what the configuration looks like. If you are using Python, the Python client should provide accessors for these same fields:

"configuration": {
  "query": {
    "query": "select count(*) from foo.bar",
    "destinationTable": {
      "projectId": "my_project",
      "datasetId": "my_dataset",
      "tableId": "my_table"
    },
    "createDisposition": "CREATE_IF_NEEDED",
    "writeDisposition": "WRITE_APPEND"
  }
}
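As a rough illustration of the configuration above, here is a hypothetical stdlib-only helper that builds the same jobs.insert request body as a Python dict. The function name and its `overwrite` flag are assumptions made for this sketch, not part of any client library; only the field names come from the answer.

```python
def build_query_job_body(query, project_id, dataset_id, table_id, overwrite=False):
    """Hypothetical helper: return a jobs.insert body that writes query results to a table.

    overwrite=False appends to the table (creating it if needed);
    overwrite=True replaces any existing rows (WRITE_TRUNCATE).
    """
    return {
        "configuration": {
            "query": {
                "query": query,
                # Where the query results are written.
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                # Create the destination table if it does not exist yet.
                "createDisposition": "CREATE_IF_NEEDED",
                # Append to the table, or truncate it and write fresh results.
                "writeDisposition": "WRITE_TRUNCATE" if overwrite else "WRITE_APPEND",
            }
        }
    }
```

The resulting dict is the kind of body you would send to jobs.insert; with the Python client library you would set the equivalent attributes on a QueryJobConfig instead.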

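  • However, if you're using the Python API, you'd probably set the configuration through client methods rather than raw JSON.
  • You can get the full query results by specifying a destination table for the query (in the web UI, choose "Enable Options" in the query pane) and setting allowLargeResults.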
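The constraint in the second comment can be made concrete with a small checker for a configuration.query dict. The function name and messages are hypothetical; it only encodes two documented jobs.insert rules: allowLargeResults requires destinationTable, and the flag only affects legacy SQL (which is the REST API default when useLegacySql is omitted).

```python
def check_large_results_config(query_config):
    """Hypothetical checker: list problems with a configuration.query dict
    that sets allowLargeResults. Returns an empty list if none are found."""
    problems = []
    if query_config.get("allowLargeResults"):
        # allowLargeResults only works when results go to a destination table.
        if "destinationTable" not in query_config:
            problems.append("allowLargeResults requires destinationTable")
        # useLegacySql defaults to true in the REST API; standard SQL ignores
        # the flag because large results are always allowed there.
        if query_config.get("useLegacySql", True) is False:
            problems.append("allowLargeResults is ignored for standard SQL")
    return problems
```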

log*_*ogc 15

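The accepted answer is correct, but it doesn't provide Python code for the task. Here's an example, refactored from a small custom client class I just wrote. It doesn't handle exceptions, and the hardcoded query should be customized to do something more interesting than just SELECT *...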

from google.cloud import bigquery


class Client(object):

    def __init__(self, origin_project, origin_dataset, origin_table,
                 destination_dataset, destination_table):
        """
        A Client that performs a hardcoded SELECT and INSERTS the results in a
        user-specified location.

        All init args are strings. Note that the destination project is the
        default project from your Google Cloud configuration.
        """
        self.project = origin_project
        self.dataset = origin_dataset
        self.table = origin_table
        self.dest_dataset = destination_dataset
        self.dest_table_name = destination_table
        self.client = bigquery.Client()

    def run(self):
        query = ("SELECT * FROM `{project}.{dataset}.{table}`;".format(
            project=self.project, dataset=self.dataset, table=self.table))

        job_config = bigquery.QueryJobConfig()

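        # Set configuration.query.destinationTable. Client.dataset() is
        # deprecated in newer client versions, so build the reference
        # explicitly in the client's default project.
        destination_dataset = bigquery.DatasetReference(
            self.client.project, self.dest_dataset)
        destination_table = destination_dataset.table(self.dest_table_name)
        job_config.destination = destination_table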

        # Set configuration.query.createDisposition
        job_config.create_disposition = 'CREATE_IF_NEEDED'

        # Set configuration.query.writeDisposition ('WRITE_TRUNCATE' would
        # overwrite the table instead of appending to it)
        job_config.write_disposition = 'WRITE_APPEND'

        # Start the query
        job = self.client.query(query, job_config=job_config)

        # Wait for the query to finish
        job.result()
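Not part of either answer above, but closely related to the "SELECT ... INTO" the question asks about: BigQuery standard SQL also supports DDL of the form CREATE TABLE ... AS SELECT, and CREATE OR REPLACE TABLE ... AS SELECT to overwrite. Below is a minimal sketch that only builds the statement string. `build_ctas` and its `overwrite` flag are hypothetical, and the sketch does no identifier validation or escaping.

```python
def build_ctas(select_sql, project, dataset, table, overwrite=False):
    """Hypothetical helper: wrap a SELECT in a BigQuery CREATE TABLE ... AS statement.

    Plain CREATE TABLE fails if the table already exists;
    CREATE OR REPLACE TABLE overwrites it.
    """
    verb = "CREATE OR REPLACE TABLE" if overwrite else "CREATE TABLE"
    # Backtick-quoted, fully qualified table name (no escaping is performed).
    return "{verb} `{p}.{d}.{t}` AS\n{select}".format(
        verb=verb, p=project, d=dataset, t=table, select=select_sql)
```

The resulting string could be run with `client.query(sql).result()` from the google-cloud-bigquery library; no destination table needs to be set in the job config, because the statement names the table itself.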