Oli*_*ich 5 java google-cloud-dataflow apache-beam apache-beam-io
我正在尝试创建一个谷歌数据流模板,但我似乎找不到一种方法来做到这一点而不产生以下异常:
WARNING: Size estimation of the source failed: RuntimeValueProvider{propertyName=inputFile, default=null}
java.lang.IllegalStateException: Value only available at runtime, but accessed from a non-runtime context: RuntimeValueProvider{propertyName=inputFile, default=null}
at org.apache.beam.sdk.options.ValueProvider$RuntimeValueProvider.get(ValueProvider.java:234)
at org.apache.beam.sdk.io.FileBasedSource.getEstimatedSizeBytes(FileBasedSource.java:218)
at org.apache.beam.runners.dataflow.internal.CustomSources.serializeToCloudSource(CustomSources.java:78)
at org.apache.beam.runners.dataflow.ReadTranslator.translateReadHelper(ReadTranslator.java:53)
at org.apache.beam.runners.dataflow.ReadTranslator.translate(ReadTranslator.java:40)
at org.apache.beam.runners.dataflow.ReadTranslator.translate(ReadTranslator.java:37)
at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:453)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:668)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:660)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:660)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:311)
at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:245)
at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:458)
at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:392)
at org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:170)
at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:680)
at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:174)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
at org.apache.beam.examples.MyMinimalWordCount.main(MyMinimalWordCount.java:69)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
at java.lang.Thread.run(Thread.java:748)
Run Code Online (Sandbox Code Playgroud)
我可以使用 Beam 的 MinimalWordCount 示例的简单修改版本来重现它。
WARNING: Size estimation of the source failed: RuntimeValueProvider{propertyName=inputFile, default=null}
java.lang.IllegalStateException: Value only available at runtime, but accessed from a non-runtime context: RuntimeValueProvider{propertyName=inputFile, default=null}
at org.apache.beam.sdk.options.ValueProvider$RuntimeValueProvider.get(ValueProvider.java:234)
at org.apache.beam.sdk.io.FileBasedSource.getEstimatedSizeBytes(FileBasedSource.java:218)
at org.apache.beam.runners.dataflow.internal.CustomSources.serializeToCloudSource(CustomSources.java:78)
at org.apache.beam.runners.dataflow.ReadTranslator.translateReadHelper(ReadTranslator.java:53)
at org.apache.beam.runners.dataflow.ReadTranslator.translate(ReadTranslator.java:40)
at org.apache.beam.runners.dataflow.ReadTranslator.translate(ReadTranslator.java:37)
at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:453)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:668)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:660)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:660)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:311)
at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:245)
at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:458)
at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:392)
at org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:170)
at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:680)
at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:174)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
at org.apache.beam.examples.MyMinimalWordCount.main(MyMinimalWordCount.java:69)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
at java.lang.Thread.run(Thread.java:748)
Run Code Online (Sandbox Code Playgroud)
我可以使用以下命令在本地运行该示例:
mvn compile exec:java \
-Pdirect-runner \
-Dexec.mainClass=org.apache.beam.examples.MyMinimalWordCount \
-Dexec.args="--inputFile=pom.xml "
Run Code Online (Sandbox Code Playgroud)
它还可以在 Google Dataflow 上运行:
mvn compile exec:java \
-Pdataflow-runner \
-Dexec.mainClass=org.apache.beam.examples.MyMinimalWordCount \
-Dexec.args="--runner=DataflowRunner \
--project=[project] \
--inputFile=gs://[bucket]/input.csv "
Run Code Online (Sandbox Code Playgroud)
但是当我尝试使用以下内容创建 Google Dataflow 模板时,出现错误:
mvn compile exec:java \
-Pdataflow-runner \
-Dexec.mainClass=org.apache.beam.examples.MyMinimalWordCount \
-Dexec.args="--runner=DataflowRunner \
--project=[project] \
--stagingLocation=gs://[bucket]/staging \
--templateLocation=gs://[bucket]/templates/MyMinimalWordCountTemplate "
Run Code Online (Sandbox Code Playgroud)
另一个令人困惑的事情是 Maven 构建继续并以 BUILD SUCCESS 结束
所以我的问题是:
Q1)我是否应该能够创建这样的 Google Dataflow 模板(使用 ValueProviders 在运行时提供 TextIO 输入)?
Q2)构建过程中的异常是一个真正的错误还是只是一个警告(如日志记录所示)?
Q3)如果 Q1 和 Q2 的答案是肯定的并且“只是警告”,并且我尝试从上传的模板创建作业,为什么它没有任何元数据或不知道我的输入选项?
我用过的参考资料:
| 归档时间: |
|
| 查看次数: |
5018 次 |
| 最近记录: |