use*_*653 5 copy google-bigquery
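I want to copy a set of tables from one dataset to another within the same project. I am running the code in an IPython notebook.
I use the following code to get the names of the tables to copy into the variable "value":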
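import re

# Renamed from `list` to avoid shadowing the built-in.
dataset = bq.DataSet('test:TestDataset')
for x in dataset.tables():
    if re.match('table1(.*)', x.name.table_id):
        # Note: `value` ends up holding only the last matching table.
        value = 'test:TestDataset.' + x.name.table_id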
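Because the loop assigns to `value` on every iteration, only the last match is kept. A minimal sketch of collecting every match instead, using plain strings in place of the notebook's table objects (the helper name `matching_table_ids` and the sample table ids are hypothetical):

```python
import re


def matching_table_ids(table_ids, pattern, prefix):
    """Return prefix + table id for every table id that matches pattern."""
    regex = re.compile(pattern)
    return [prefix + table_id for table_id in table_ids if regex.match(table_id)]


# Hypothetical table ids standing in for x.name.table_id values.
names = ["table1_2016", "table1_2017", "other_table"]
values = matching_table_ids(names, r"table1(.*)", "test:TestDataset.")
```

In the notebook, the list of ids could be built with something like `[x.name.table_id for x in dataset.tables()]` before calling the helper.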
Then I tried to use the "bq cp" command to copy the tables from one dataset to the other, but I cannot execute the bq command in the notebook.
!bq cp $value proj1:test1.table1_20162020
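If the shell escape is the obstacle, one possible workaround is to call `bq cp` through `subprocess` once per table. This is only a sketch: it assumes the `bq` tool is installed, on the PATH, and already authenticated in the notebook environment, and the helper names `build_bq_cp_command` and `copy_with_bq` are hypothetical.

```python
import subprocess


def build_bq_cp_command(source, destination):
    """Return the argument list for `bq cp SOURCE DESTINATION`."""
    return ["bq", "cp", source, destination]


def copy_with_bq(sources, dest_dataset, run=subprocess.run):
    """Run one `bq cp` per source table, keeping each table id unchanged.

    sources are 'project:dataset.table' strings; dest_dataset is 'project:dataset'.
    The `run` parameter exists only so the call can be replaced in tests.
    """
    commands = []
    for source in sources:
        table_id = source.rsplit(".", 1)[1]
        command = build_bq_cp_command(source, dest_dataset + "." + table_id)
        run(command, check=True)  # raises CalledProcessError on a non-zero exit
        commands.append(command)
    return commands
```

For example, `copy_with_bq(["test:TestDataset.table1_2016"], "proj1:test1")` would run `bq cp test:TestDataset.table1_2016 proj1:test1.table1_2016`.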
Note:
I tried the bigquery command to check whether it has an associated copy command, but I could not find one.
I created the following script to copy all tables from one dataset to another, with some validation.
from google.cloud import bigquery

client = bigquery.Client()

projectFrom = 'source_project_id'
datasetFrom = 'source_dataset'
projectTo = 'destination_project_id'
datasetTo = 'destination_dataset'

# Create dataset references for the source and destination datasets
dataset_from = bigquery.DatasetReference(projectFrom, datasetFrom)
dataset_to = bigquery.DatasetReference(projectTo, datasetTo)

# list_tables() replaced list_dataset_tables() in newer client releases
for source_table_ref in client.list_tables(dataset_from):
    # Destination table reference
    destination_table_ref = dataset_to.table(source_table_ref.table_id)

    job = client.copy_table(source_table_ref, destination_table_ref)
    job.result()  # wait for the copy job to finish
    assert job.state == 'DONE'

    dest_table = client.get_table(destination_table_ref)
    source_table = client.get_table(source_table_ref)

    assert dest_table.num_rows > 0  # validation 1
    assert dest_table.num_rows == source_table.num_rows  # validation 2

    print("Source - table: {} row count {}".format(source_table.table_id, source_table.num_rows))
    print("Destination - table: {} row count {}".format(dest_table.table_id, dest_table.num_rows))
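The script above copies every table. Since the question only needs the tables whose names start with `table1`, here is a hedged variation that filters by a regular expression. It assumes a `google.cloud.bigquery.Client` is passed in and that datasets are given as `"project.dataset"` strings, which `list_tables` and `copy_table` accept; the function name `copy_matching_tables` is hypothetical.

```python
import re


def copy_matching_tables(client, source_dataset, dest_dataset, pattern):
    """Copy every table in source_dataset whose id matches pattern.

    source_dataset and dest_dataset are "project.dataset" strings.
    Returns the ids of the tables that were copied.
    """
    regex = re.compile(pattern)
    copied = []
    for item in client.list_tables(source_dataset):
        if not regex.match(item.table_id):
            continue  # skip tables outside the requested set
        source_id = "{}.{}".format(source_dataset, item.table_id)
        dest_id = "{}.{}".format(dest_dataset, item.table_id)
        job = client.copy_table(source_id, dest_id)
        job.result()  # block until the copy job finishes; raises on failure
        copied.append(item.table_id)
    return copied
```

With a real client this might be called as `copy_matching_tables(bigquery.Client(), "test.TestDataset", "proj1.test1", r"table1")`.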
If you are using the BigQuery API with Python, you can run a copy job:
https://cloud.google.com/bigquery/docs/tables#copyingtable
The Python example, copied from the documentation:
import pprint
import time

from googleapiclient.errors import HttpError


def copyTable(service):
    try:
        sourceProjectId = input("What is your source project? ")
        sourceDatasetId = input("What is your source dataset? ")
        sourceTableId = input("What is your source table? ")

        targetProjectId = input("What is your target project? ")
        targetDatasetId = input("What is your target dataset? ")
        targetTableId = input("What is your target table? ")

        jobCollection = service.jobs()
        jobData = {
            "projectId": sourceProjectId,
            "configuration": {
                "copy": {
                    "sourceTable": {
                        "projectId": sourceProjectId,
                        "datasetId": sourceDatasetId,
                        "tableId": sourceTableId,
                    },
                    "destinationTable": {
                        "projectId": targetProjectId,
                        "datasetId": targetDatasetId,
                        "tableId": targetTableId,
                    },
                    "createDisposition": "CREATE_IF_NEEDED",
                    "writeDisposition": "WRITE_TRUNCATE"
                }
            }
        }

        insertResponse = jobCollection.insert(projectId=targetProjectId, body=jobData).execute()

        # Ping for status until it is done, with a short pause between calls.
        while True:
            status = jobCollection.get(projectId=targetProjectId,
                                       jobId=insertResponse['jobReference']['jobId']).execute()
            if 'DONE' == status['status']['state']:
                break
            print('Waiting for the copy to complete...')
            time.sleep(10)

        if 'errors' in status['status']:
            print('Error copying table:')
            pprint.pprint(status)
            return

        print('Copied the table:')
        pprint.pprint(status)

        # Now query and print out the generated results table.
        # queryTableData is not defined in this snippet; it is assumed to
        # come from the same documentation sample.
        queryTableData(service, targetProjectId, targetDatasetId, targetTableId)

    except HttpError as err:
        print('Error in copyTable:')
        pprint.pprint(err.resp)
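The bq cp command does essentially the same thing internally (you could also call that function directly, depending on which version of bq you are importing).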
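To copy several tables with this API without typing each name at a prompt, the copy configuration can be generated per table. The following is a rough sketch, not the documentation's code: the helper name `build_copy_job_body` and the sample table ids are hypothetical, and each body would still need to be submitted with `jobCollection.insert(projectId=..., body=body).execute()` as in the sample.

```python
def build_copy_job_body(source_project, source_dataset,
                        target_project, target_dataset, table_id):
    """Build the request body for a copy job that keeps the table id unchanged."""
    return {
        "configuration": {
            "copy": {
                "sourceTable": {
                    "projectId": source_project,
                    "datasetId": source_dataset,
                    "tableId": table_id,
                },
                "destinationTable": {
                    "projectId": target_project,
                    "datasetId": target_dataset,
                    "tableId": table_id,
                },
                "createDisposition": "CREATE_IF_NEEDED",
                # WRITE_TRUNCATE overwrites the destination table if it exists.
                "writeDisposition": "WRITE_TRUNCATE",
            }
        }
    }


# Hypothetical table ids; one request body per table to copy.
bodies = [
    build_copy_job_body("test", "TestDataset", "proj1", "test1", table_id)
    for table_id in ["table1_2016", "table1_2017"]
]
```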
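If you want to copy most of the tables, you can copy the whole BigQuery dataset first and then delete the tables you do not want.
Copying a dataset in the UI works much like copying a table. Click the "Copy Dataset" button on the source dataset, then specify the destination dataset in the form that pops up. You can copy a dataset to another project or to another region. See the screenshots below for how to copy a dataset.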
[Screenshot: Copy Dataset button]
[Screenshot: Copy Dataset form]
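The cleanup step after copying the whole dataset could look roughly like this. It is a sketch under assumptions: a `google.cloud.bigquery.Client` is passed in, the dataset is given as a `"project.dataset"` string, and the function name `delete_unwanted_tables` is hypothetical. Deletion is permanent, so it is worth checking the pattern against the table list first.

```python
import re


def delete_unwanted_tables(client, dataset, keep_pattern):
    """Delete every table in dataset whose id does NOT match keep_pattern.

    dataset is a "project.dataset" string. Returns the deleted table ids.
    """
    regex = re.compile(keep_pattern)
    deleted = []
    # Materialize the listing first so deletions do not affect the iteration.
    for item in list(client.list_tables(dataset)):
        if regex.match(item.table_id):
            continue  # keep tables that match the pattern
        table_id = "{}.{}".format(dataset, item.table_id)
        client.delete_table(table_id, not_found_ok=True)
        deleted.append(item.table_id)
    return deleted
```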