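Reading a file downloaded from Google Cloud Storage fails in a Python + Flask + gunicorn + nginx + Compute Engine application. Link to the code: https://github.com/samuq/CE-test. Line 64 of the file 'ETL_SHP_READ_SQL_WRITE' returns nothing, even though the file is valid and contains data: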
prj_blob.download_to_file(self.prj_file)
logger.log_text(self.prj_file)
line 64 --> euref_fin.ImportFromWkt(self.prj_file.read())).
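
One possible cause, offered as a guess from the three lines shown rather than a diagnosis of the linked repository: download_to_file writes into the file object and leaves the file position at the end of what was written, so a read() issued right afterwards returns nothing. The sketch below reproduces that behaviour with an in-memory buffer. fake_download_to_file and the WKT string are hypothetical stand-ins, not part of the Google Cloud Storage client.

```python
import io


def fake_download_to_file(file_obj, payload=b'PROJCS["demo"]'):
    """Hypothetical stand-in for blob.download_to_file: writes bytes into file_obj."""
    file_obj.write(payload)


buf = io.BytesIO()
fake_download_to_file(buf)

# The position is now at the end of the written data, so read() finds nothing.
empty = buf.read()

# Rewinding first makes the content readable again.
buf.seek(0)
wkt = buf.read().decode("utf-8")  # decode only if the file was opened in binary mode
```

If this is what happens on line 64, calling self.prj_file.seek(0) before self.prj_file.read() would be the corresponding change. Note also that logger.log_text is being given the file object itself rather than its text.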
python -m main \
    --setup_file setup.py \
    --runner DataflowRunner \
    --project my-test \
    --staging_location gs://my-test/staging \
    --temp_location gs://my-test/temp \
    --template_location gs://my-test/templates/test --output gs://my-test/output
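The command above only runs locally (with the required dependencies installed locally) and does not create a template. These are the pipeline options in main.py: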
pipeline_options = {
    'project': 'my-test',
    'staging_location': 'gs://my-test/staging',
    'runner': 'DataflowRunner',
    'job_name': 'test',
    'temp_location': 'gs://my-test/temp',
    'save_main_session': True,
    'setup_file': 'setup.py',
    'output': 'gs://my-test/output',
    'template_location': 'gs://my-test/templates/test'
}
options = PipelineOptions.from_dictionary(pipeline_options)
with beam.Pipeline(options=options) as p:
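
Because the dictionary hard-codes every option, the flags typed on the command line do not appear to be what configures the pipeline in this snippet. A common pattern in Beam scripts is to parse script-specific flags with argparse and pass the remaining arguments on to the pipeline options. The sketch below shows only the argument-splitting step using the standard library. split_args is a hypothetical helper name, and the sketch is not claimed to explain why the template is not created.

```python
import argparse


def split_args(argv):
    """Separate script-specific flags from the flags meant for the pipeline."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--output', required=True)
    # Anything argparse does not recognise is returned untouched in the second list.
    known_args, pipeline_args = parser.parse_known_args(argv)
    return known_args, pipeline_args


known_args, pipeline_args = split_args([
    '--runner', 'DataflowRunner',
    '--project', 'my-test',
    '--template_location', 'gs://my-test/templates/test',
    '--output', 'gs://my-test/output',
])
# pipeline_args could then be handed to PipelineOptions(pipeline_args) in main.py.
```

With this split, the values given on the command line, including --template_location, would be the ones the pipeline receives.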
This is setup.py:
import subprocess
import setuptools
from setuptools.command.bdist_egg import bdist_egg as _bdist_egg

class bdist_egg(_bdist_egg):
    def run(self):
        self.run_command('CustomCommands')
        _bdist_egg.run(self)

CUSTOM_COMMANDS = [
    ['apt-get', 'update'],
    ['apt-get', '--assume-yes', 'install', …

TypeError: 'PCollection' object does not support indexing
The error above is caused by trying to convert a PCollection to a list:
filesList = (files | beam.combiners.ToList())
lines = (p | 'read' >> beam.Create(ReadSHP().ReadSHP(filesList))
           | 'map' >> beam.Map(_to_dictionary))
and:
def ReadSHP(self, filesList):
    """
    """
    sf = shp.Reader(shp=filesList[1], dbf=filesList[2])
How can I solve this? Any help is appreciated.