小编Mar*_*ino的帖子

使用 Python 在 Apache Beam 管道中进行异常处理

我正在使用 python 中的 Apache Beam(在 GCP Dataflow 上)做一个简单的管道,从 PubSub 读取并在 Big Query 上写入,但无法处理管道上的异常以创建替代流。

在一个简单的 WriteToBigQuery 示例中:

output = json_output | 'Write to BigQuery' >> beam.io.WriteToBigQuery('some-project:dataset.table_name')
Run Code Online (Sandbox Code Playgroud)

我试图把它放在一个try/except代码中,但它不起作用,因为当它失败时,异常似乎被抛出到我的 python 执行之外的 Java 层上:

INFO:root:2019-01-29T15:49:46.516Z: JOB_MESSAGE_ERROR: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -87: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 135, in _execute
    response = task()
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 170, in <lambda>
    self._execute(lambda: worker.do_instruction(work), work)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 221, in do_instruction
    request.instruction_id)
...
...
...
    self.signature.finish_bundle_method.method_value())
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", …
Run Code Online (Sandbox Code Playgroud)

python dataflow google-cloud-dataflow apache-beam

5
推荐指数
1
解决办法
3777
查看次数