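We have a default VPC and are trying to run a Dataflow job. The initial step (reading the file) manages to process 1/2 steps, and then the job fails with the error message JOB_MESSAGE_ERROR: SDK harness sdk-0-0 disconnected. There is nothing else in the logs. We have already tried setting up roles and VPC firewall rules.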
I want to run the Dataflow job using the Geobeam image (Apache Beam Python 3.9 SDK 2.41.0). My job definition is as follows:
def run(pipeline_args, known_args):
    import apache_beam as beam
    from apache_beam.io.gcp.internal.clients import storage
    from apache_beam.options.pipeline_options import PipelineOptions
    from geobeam.io import GeoJSONSource, filebasedsource
    from geobeam.fn import format_record, make_valid, filter_invalid

    pipeline_options = PipelineOptions([
    ] + pipeline_args)

    with beam.Pipeline(options=pipeline_options) as p:
        (p
         | beam.io.Read(GeoJSONSource(known_args.gcs_url, encoding='utf-8'))
         | 'FilterCords' >> beam.Filter(lambda x: len(x[-1]["coordinates"]) > 1)
         | 'MakeValid' >> beam.Map(make_valid)
         | 'FilterInvalid' >> beam.Filter(filter_invalid)
         …
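
For context, run(pipeline_args, known_args) expects the command-line flags to be split into job-specific arguments and everything else, which is forwarded to PipelineOptions. The entry point is not shown in the post, so the following is only a minimal sketch of how that split is commonly done with argparse. The flag name --gcs_url is inferred from known_args.gcs_url, and parse_args, the bucket path, and the example region are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Split flags into job-specific arguments and Beam pipeline arguments."""
    parser = argparse.ArgumentParser()
    # Assumption: the flag name mirrors known_args.gcs_url used in run().
    parser.add_argument('--gcs_url', required=True,
                        help='gs:// path of the GeoJSON input file')
    # Flags argparse does not recognise (runner, region, ...) come back
    # untouched in the second list, ready to be passed to PipelineOptions.
    known_args, pipeline_args = parser.parse_known_args(argv)
    return known_args, pipeline_args


# Hypothetical invocation; the values below are placeholders.
example_argv = [
    '--gcs_url=gs://example-bucket/input.geojson',
    '--runner=DataflowRunner',
    '--region=us-central1',
]
known_args, pipeline_args = parse_args(example_argv)
print(known_args.gcs_url)
print(pipeline_args)
```

With this split, run(pipeline_args, known_args) receives the input path through known_args and all remaining Dataflow flags through pipeline_args.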