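I ran into a strange error in my Dataflow pipeline when I tried to use a specific library from PyPI.
I need jsonschema in a ParDo, so I added jsonschema==3.2.0 to my requirements.txt file. I launch the pipeline with the following command: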
python -m gcs_to_all \
--runner DataflowRunner \
--project <my-project-id> \
--region europe-west1 \
--temp_location gs://<my-bucket-name>/temp/ \
--input_topic "projects/<my-project-id>/topics/<my-topic>" \
--network=<my-network> \
--subnetwork=<my-subnet> \
--requirements_file=requirements.txt \
--experiments=allow_non_updatable_job \
--streaming
In the terminal, everything looks fine:
INFO:root:2020-01-03T09:18:35.569Z: JOB_MESSAGE_BASIC: Worker configuration: n1-standard-4 in europe-west1-b.
INFO:root:2020-01-03T09:18:35.806Z: JOB_MESSAGE_WARNING: The network default doesn't have rules that open TCP ports 12345-12346 for internal connection with other VMs. Only rules with a target tag 'dataflow' or empty target tags set apply. If …