我的 Apache 光束管道实现了自定义转换和 ParDo 的 python 模块,它们进一步导入了我编写的其他模块。在本地运行器上,这工作正常,因为所有可用文件都在同一路径中。在 Dataflow 运行器的情况下,管道因模块导入错误而失败。
如何使所有数据流工作人员都可以使用自定义模块?请指教。
下面是一个例子:
ImportError: No module named DataAggregation
at find_class (/usr/lib/python2.7/pickle.py:1130)
at find_class (/usr/local/lib/python2.7/dist-packages/dill/dill.py:423)
at load_global (/usr/lib/python2.7/pickle.py:1096)
at load (/usr/lib/python2.7/pickle.py:864)
at load (/usr/local/lib/python2.7/dist-packages/dill/dill.py:266)
at loads (/usr/local/lib/python2.7/dist-packages/dill/dill.py:277)
at loads (/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py:232)
at apache_beam.runners.worker.operations.PGBKCVOperation.__init__ (operations.py:508)
at apache_beam.runners.worker.operations.create_pgbk_op (operations.py:452)
at apache_beam.runners.worker.operations.create_operation (operations.py:613)
at create_operation (/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py:104)
at execute (/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py:130)
at do_work (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:642)
Run Code Online (Sandbox Code Playgroud)