I have a problem with a Spark workflow on Dataproc.
Submitting the job directly works:
gcloud dataproc jobs submit spark \
--project myproject \
--cluster=mycluster \
--region=europe-west3 \
--jars=gs:path\file.jar,gs://path//depende.jar \
--class=it.flow \
--properties spark.num.executors=2,spark.executor.cores=3,spark.executor.memory=5g,spark.driver.cores=2,spark.driver.memory=10g,spark.dynamicAllocation.enabled=false,spark.executor.userClassPathFirst=true,spark.driver.userClassPathFirst=true,spark.jars.packages=com.google.cloud:google-cloud-logging:2.2.0 \
-- 20210820 010000 000 0 000 TRY
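I created a Dataproc workflow template and Python code that launches it through Composer, and that works.
Now I need to make the trailing job arguments dynamic (-- 20210820 010000 000 0 000 TRY).
However, I cannot find a way to pass arguments to the workflow: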
gcloud dataproc workflow-templates create try1 --region=europe-west3
gcloud dataproc workflow-templates add-job spark \
--workflow-template=try1 \
--step-id=create_try1 \
--class=it.flow \
--region=europe-west3 \
--jars=gs:path\file.jar,gs://path//depende.jar \
--properties spark.num.executors=2,spark.executor.cores=3,spark.executor.memory=5g,spark.driver.cores=2,spark.driver.memory=10g,spark.dynamicAllocation.enabled=false,spark.executor.userClassPathFirst=true,spark.driver.userClassPathFirst=true,spark.jars.packages=com.google.cloud:google-cloud-logging:2.2.0 \
-- $arg1 $arg2
gcloud dataproc workflow-templates set-cluster-selector …
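
One possible direction, offered as a sketch rather than a confirmed fix: the shell expands `$arg1 $arg2` at the moment `add-job` runs, so the template most likely stores empty or fixed values instead of placeholders. Dataproc workflow templates have a `parameters` section in which each entry has a `name` and a list of `fields` pointing at a job argument, for example `jobs['create_try1'].sparkJob.args[0]`. The Python below only builds that section and the matching runtime values as plain dictionaries. The helper names `arg_parameters` and `parameter_values` and the parameter names `ARG1`..`ARG6` are hypothetical, and it assumes the template job already holds six placeholder arguments for the fields to replace.

```python
# Hypothetical helpers: they only build plain dicts and call no Google API.
STEP_ID = "create_try1"  # step id used in the add-job command above


def arg_parameters(step_id, count, job_type="sparkJob"):
    """Return one template parameter definition per positional job argument."""
    return [
        {
            "name": f"ARG{i + 1}",
            # Field path addressing the i-th argument of the given step.
            "fields": [f"jobs['{step_id}'].{job_type}.args[{i}]"],
        }
        for i in range(count)
    ]


def parameter_values(raw_args):
    """Map a string such as '20210820 010000 000 0 000 TRY' to ARGn values."""
    return {f"ARG{i + 1}": value for i, value in enumerate(raw_args.split())}


# Goes under `parameters:` in the exported template definition.
template_parameters = arg_parameters(STEP_ID, 6)

# Supplied at instantiation time, computed per run.
values = parameter_values("20210820 010000 000 0 000 TRY")

print(template_parameters[0])
print(values)
```

If this matches your setup, the `parameters` section can be added by exporting the template to YAML with `gcloud dataproc workflow-templates export`, editing it, and loading it back with `import`. The `values` mapping would then be passed as `--parameters=ARG1=20210820,...` to `gcloud dataproc workflow-templates instantiate`, or as the `parameters` argument of `DataprocInstantiateWorkflowTemplateOperator` in Composer.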