Adi*_*xit · 6 · tags: python, google-cloud-dataflow, gcp
Can someone share the syntax for reading from and writing to a BigQuery table in a GCP Dataflow pipeline written in Python?

Running on Dataflow
First, construct a Pipeline with the following options so that it runs on GCP Dataflow:
```python
import apache_beam as beam

# Options that make the pipeline run on the Dataflow service
options = {'project': <project>,
           'runner': 'DataflowRunner',
           'region': <region>,
           'setup_file': <setup.py file>}

pipeline_options = beam.pipeline.PipelineOptions(flags=[], **options)
pipeline = beam.Pipeline(options=pipeline_options)
```
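The same options can equivalently be passed to `PipelineOptions` as command-line-style flags, which is what the `flags` argument parses. A minimal sketch of that mapping (the `options_to_flags` helper and the example values are hypothetical, for illustration only):

```python
# Hypothetical helper: turn an options dict into the '--key=value'
# flag list that PipelineOptions(flags=...) would parse.
def options_to_flags(options):
    return ['--{}={}'.format(key, value) for key, value in options.items()]

flags = options_to_flags({'project': 'my-project',
                          'runner': 'DataflowRunner',
                          'region': 'us-central1'})
print(flags)  # ['--project=my-project', '--runner=DataflowRunner', '--region=us-central1']
```

Note that nothing executes until `pipeline.run()` is called, typically followed by `.wait_until_finish()` to block until the Dataflow job completes.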
Reading from BigQuery

Define a BigQuerySource with your query and use beam.io.Read to read the data from BQ:
```python
BQ_source = beam.io.BigQuerySource(query=<query>)
BQ_data = pipeline | beam.io.Read(BQ_source)
```
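Instead of a query, `BigQuerySource` can also read an entire table via its `table` parameter; table references take the form `project:dataset.table`, or just `dataset.table` when the project is the pipeline's own. A small sketch of assembling such a reference (the `make_table_spec` helper is hypothetical, for illustration):

```python
# Hypothetical helper: build a BigQuery table reference string of the
# form 'project:dataset.table' (project optional).
def make_table_spec(table, dataset, project=None):
    if project:
        return '{}:{}.{}'.format(project, dataset, table)
    return '{}.{}'.format(dataset, table)

print(make_table_spec('events', 'analytics', 'my-project'))  # my-project:analytics.events
```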
Writing to BigQuery

There are two options for writing to BigQuery:

Using BigQuerySink and beam.io.Write:
```python
BQ_sink = beam.io.BigQuerySink(<table>, dataset=<dataset>, project=<project>)
BQ_data | beam.io.Write(BQ_sink)
```
Using beam.io.WriteToBigQuery:
```python
BQ_data | beam.io.WriteToBigQuery(<table>, dataset=<dataset>, project=<project>)
```
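When the destination table may not exist yet, `WriteToBigQuery` also accepts a `schema` argument, which can be a comma-separated string of `name:TYPE` pairs. A sketch of building one (the `fields_to_schema` helper and the field names are hypothetical, for illustration):

```python
# Hypothetical helper: build the 'name:TYPE,name:TYPE' schema string
# accepted by beam.io.WriteToBigQuery's schema argument.
def fields_to_schema(fields):
    return ','.join('{}:{}'.format(name, bq_type) for name, bq_type in fields)

schema = fields_to_schema([('user_id', 'STRING'), ('visits', 'INTEGER')])
print(schema)  # user_id:STRING,visits:INTEGER
```

The resulting string can be passed alongside dispositions such as `create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED` and `write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND` to control how the table is created and appended to.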
Viewed 3977 times