Copying tables from one dataset to another in Google BigQuery

use*_*653 5 copy google-bigquery

I want to copy a set of tables from one dataset to another within the same project. I am running the code in an IPython notebook.

I use the following code to get the names of the tables to copy into the variable "value":

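import re

# bq is the notebook's BigQuery module, imported earlier in the session.
dataset = bq.DataSet('test:TestDataset')

for x in dataset.tables():
    if re.match('table1(.*)', x.name.table_id):
        # Note: value is overwritten on every match, so it ends up
        # holding only the last matching table name.
        value = 'test:TestDataset.' + x.name.table_id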

I then tried to copy the tables from one dataset to the other with the "bq cp" command, but I could not run the bq command from the notebook.

!bq cp $value proj1:test1.table1_20162020

Note:

I looked through the bigquery commands for one that handles copying, but could not find any.

MJK*_*MJK 7

I created the following script to copy all tables from one dataset to another, with some validation.

from google.cloud import bigquery

client = bigquery.Client()

projectFrom = 'source_project_id'
datasetFrom = 'source_dataset'

projectTo = 'destination_project_id'
datasetTo = 'destination_dataset'

# Create dataset references from the Google BigQuery client
dataset_from = client.dataset(dataset_id=datasetFrom, project=projectFrom)
dataset_to = client.dataset(dataset_id=datasetTo, project=projectTo)

# Older client releases named this method list_dataset_tables; current ones use list_tables.
for source_table_ref in client.list_tables(dataset_from):
    # Destination table reference
    destination_table_ref = dataset_to.table(source_table_ref.table_id)

    job = client.copy_table(
      source_table_ref,
      destination_table_ref)

    job.result()
    assert job.state == 'DONE'

    dest_table = client.get_table(destination_table_ref)
    source_table = client.get_table(source_table_ref)

    assert dest_table.num_rows > 0 # validation 1  
    assert dest_table.num_rows == source_table.num_rows # validation 2

    print ("Source - table: {} row count {}".format(source_table.table_id,source_table.num_rows ))
    print ("Destination - table: {} row count {}".format(dest_table.table_id, dest_table.num_rows))
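Since the question only wants the tables whose names start with table1, here is a minimal sketch of the same idea with a name filter added. It is not part of the script above: select_table_ids and copy_matching_tables are hypothetical helper names, the dataset arguments are placeholder 'project.dataset' strings, and it assumes a recent google-cloud-bigquery release where list_tables and copy_table accept such strings.

```python
import re


def select_table_ids(table_ids, pattern):
    """Return the table IDs that match the regex from the start of the name."""
    compiled = re.compile(pattern)
    return [table_id for table_id in table_ids if compiled.match(table_id)]


def copy_matching_tables(client, source_dataset, dest_dataset, pattern):
    """Copy every table in source_dataset whose ID matches pattern.

    client is expected to be a google.cloud.bigquery.Client; both dataset
    arguments are 'project.dataset' strings (placeholders, not real names).
    """
    table_ids = [item.table_id for item in client.list_tables(source_dataset)]
    copied = []
    for table_id in select_table_ids(table_ids, pattern):
        job = client.copy_table(
            "{}.{}".format(source_dataset, table_id),
            "{}.{}".format(dest_dataset, table_id),
        )
        job.result()  # block until the copy job finishes
        copied.append(table_id)
    return copied
```

With the names from the question, a call would look roughly like copy_matching_tables(bigquery.Client(), "test.TestDataset", "proj1.test1", r"table1"). Passing the client in keeps the filtering logic easy to check without touching BigQuery.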


Fel*_*ffa 5

If you are using the BigQuery API with Python, you can run a copy job:

https://cloud.google.com/bigquery/docs/tables#copyingtable

The Python example, copied from the docs:

import pprint
import time

from googleapiclient.errors import HttpError


def copyTable(service):
    try:
        sourceProjectId = input("What is your source project? ")
        sourceDatasetId = input("What is your source dataset? ")
        sourceTableId = input("What is your source table? ")

        targetProjectId = input("What is your target project? ")
        targetDatasetId = input("What is your target dataset? ")
        targetTableId = input("What is your target table? ")

        jobCollection = service.jobs()
        jobData = {
            "projectId": sourceProjectId,
            "configuration": {
                "copy": {
                    "sourceTable": {
                        "projectId": sourceProjectId,
                        "datasetId": sourceDatasetId,
                        "tableId": sourceTableId,
                    },
                    "destinationTable": {
                        "projectId": targetProjectId,
                        "datasetId": targetDatasetId,
                        "tableId": targetTableId,
                    },
                    "createDisposition": "CREATE_IF_NEEDED",
                    "writeDisposition": "WRITE_TRUNCATE",
                }
            },
        }

        insertResponse = jobCollection.insert(
            projectId=targetProjectId, body=jobData).execute()

        # Poll for status until the job is done, with a short pause between calls.
        while True:
            status = jobCollection.get(
                projectId=targetProjectId,
                jobId=insertResponse['jobReference']['jobId']).execute()
            if status['status']['state'] == 'DONE':
                break
            print('Waiting for the copy to complete...')
            time.sleep(10)

        if 'errors' in status['status']:
            print('Error copying table:')
            pprint.pprint(status)
            return

        print('Copied the table:')
        pprint.pprint(status)

        # Now query and print out the generated results table.
        # queryTableData is a helper from the same docs sample; it is not defined here.
        queryTableData(service, targetProjectId, targetDatasetId, targetTableId)

    except HttpError as err:
        print('Error in copyTable:')
        pprint.pprint(err.resp)

The bq cp command does basically the same thing internally (you could call that function too, depending on which bq you are importing).
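If you would rather drive bq cp itself from Python instead of the notebook's ! syntax, something along these lines might work. This is only a sketch: it assumes the bq CLI is installed and authenticated on the machine running the notebook, and bq_cp_command and copy_with_bq are made-up helper names, not part of any library.

```python
import subprocess


def bq_cp_command(source, destination):
    """Build the argument list for `bq cp SOURCE DESTINATION`."""
    return ["bq", "cp", source, destination]


def copy_with_bq(pairs, run=subprocess.run):
    """Run one `bq cp` per (source, destination) pair and return the commands.

    Assumes the bq CLI is on PATH and already authenticated.
    The run parameter exists only so the call can be stubbed out.
    """
    commands = []
    for source, destination in pairs:
        command = bq_cp_command(source, destination)
        run(command, check=True)  # raises CalledProcessError on a non-zero exit
        commands.append(command)
    return commands
```

Each pair uses the same project:dataset.table form as the command in the question, for example ("test:TestDataset.table1_a", "proj1:test1.table1_a"), where table1_a is a placeholder table name.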


Jia*_* He 5

Assuming you want to copy most of the tables, you can first copy the entire BigQuery dataset and then delete the tables you do not want.

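The copy dataset UI is similar to the copy table one. Click the "Copy Dataset" button on the source dataset, then specify the destination dataset in the form that pops up. You can copy a dataset to another project or another region. See the screenshots below for how to copy a dataset.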

[Screenshot: Copy dataset button]

[Screenshot: Copy dataset form]
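For the "delete the ones you do not want" step, a rough sketch with the Python client could look like this. It is an illustration under assumptions, not a tested recipe: tables_to_drop and prune_dataset are hypothetical names, the dataset is a placeholder 'project.dataset' string, and it assumes a google-cloud-bigquery release where list_tables and delete_table accept strings. It defaults to a dry run because deletion cannot be undone from this code.

```python
import re


def tables_to_drop(table_ids, keep_pattern):
    """Return the table IDs that do NOT match keep_pattern."""
    compiled = re.compile(keep_pattern)
    return [table_id for table_id in table_ids if not compiled.match(table_id)]


def prune_dataset(client, dataset, keep_pattern, dry_run=True):
    """List, and optionally delete, tables in dataset that do not match keep_pattern.

    client is expected to be a google.cloud.bigquery.Client and dataset a
    'project.dataset' string. With dry_run=True nothing is deleted.
    """
    table_ids = [item.table_id for item in client.list_tables(dataset)]
    doomed = tables_to_drop(table_ids, keep_pattern)
    if not dry_run:
        for table_id in doomed:
            client.delete_table("{}.{}".format(dataset, table_id))
    return doomed
```

Running it once with dry_run=True and reading the returned list before switching to dry_run=False seems like the safer order.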