Post by Har*_*ari

Writing BigQuery results to GCS in CSV format using Apache Beam

I am new to Apache Beam, and I am trying to write a pipeline that extracts data from Google BigQuery and writes it to GCS in CSV format using Python.

Using beam.io.Read(beam.io.BigQuerySource()) I am able to read the data from BigQuery, but I am not sure how to write it to GCS in CSV format.

Is there a custom function that achieves this? Could you please help me?

import logging

import apache_beam as beam


PROJECT='project_id'
BUCKET='project_bucket'


def run():
    argv = [
        '--project={0}'.format(PROJECT),
        '--job_name=readwritebq',
        '--save_main_session',
        '--staging_location=gs://{0}/staging/'.format(BUCKET),
        '--temp_location=gs://{0}/staging/'.format(BUCKET),
        '--runner=DataflowRunner'
    ]

    with beam.Pipeline(argv=argv) as p:

        # Run the SQL query in BigQuery and read the result rows into a PCollection.
        BQ_SQL_TO_TABLE = p | 'read_bq_view' >> beam.io.Read(
            beam.io.BigQuerySource(query='Select * from `dataset.table`', use_standard_sql=True))
        # Extract data from BigQuery to GCS in CSV format.
        # This is where I need …
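One possible direction for the missing step, offered as a minimal sketch and not a confirmed solution: rows read through beam.io.BigQuerySource arrive as Python dictionaries, so each row could be turned into one CSV line before being handed to a text sink. The helper name row_to_csv_line, the COLUMNS list, and the sample row are hypothetical and only serve as illustration. The standard csv module is used so that values containing commas or quotes are escaped correctly.

```python
import csv
import io


def row_to_csv_line(row, columns):
    """Format one row (a dict) as a single CSV line without a trailing newline.

    `columns` fixes the column order; missing or None values become empty fields.
    """
    buffer = io.StringIO()
    # lineterminator='' so the text sink can add its own newline per element.
    writer = csv.writer(buffer, lineterminator='')
    writer.writerow([row.get(col) for col in columns])
    return buffer.getvalue()


# Hypothetical column names and row, standing in for the real query result.
COLUMNS = ['id', 'name']
sample_row = {'id': 1, 'name': 'Smith, J'}

print(row_to_csv_line(sample_row, COLUMNS))  # prints: 1,"Smith, J"
```

Inside the pipeline, a helper like this could presumably be applied with beam.Map(row_to_csv_line, COLUMNS), and the resulting strings written with beam.io.WriteToText using a gs:// path prefix, file_name_suffix='.csv', and header=','.join(COLUMNS). Note that WriteToText may produce several sharded files unless the number of shards is constrained.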

python google-bigquery google-cloud-dataflow apache-beam
