Spark on K8s - getting error: kube mode does not support referencing app dependencies in the local file system

Asked by gar*_*iny (tags: apache-spark, kubernetes)

I want to set up a Spark cluster on Kubernetes. Following this article I managed to create and set up a cluster with three nodes: https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/

After that, when I tried to deploy Spark on the cluster, it failed at the spark-submit step. This is the command I used:

~/opt/spark/spark-2.3.0-bin-hadoop2.7/bin/spark-submit \
--master k8s://https://206.189.126.172:6443 \
--deploy-mode cluster \
--name word-count \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=docker.io/garfiny/spark:v2.3.0 \
--conf spark.kubernetes.driver.pod.name=word-count \
local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar

It gave me this error:

Exception in thread "main" org.apache.spark.SparkException: The Kubernetes mode does not yet support referencing application dependencies in the local file system.
    at org.apache.spark.deploy.k8s.submit.DriverConfigOrchestrator.getAllConfigurationSteps(DriverConfigOrchestrator.scala:122)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:229)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:227)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2585)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:227)
    at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:192)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

2018-06-04 10:58:24 INFO ShutdownHookManager:54 - Shutdown hook called
2018-06-04 10:58:24 INFO ShutdownHookManager:54 - Deleting directory /private/var/folders/lz/0bb8xlyd247cwc3kvh6pmrz00000gn/T/spark-3967f4ae-e8b3-428d-ba22-580fc9c840cd

Note: I followed this article to install Spark on K8s: https://spark.apache.org/docs/latest/running-on-kubernetes.html

Answered by Von*_*onC

The error message comes from commit 5d7c4ba4d73a72f26d591108db3c20b4a6c84f3f, which includes the page you mention ("Running Spark on Kubernetes") and contains the following check:

// TODO(SPARK-23153): remove once submission client local dependencies are supported.
if (existSubmissionLocalFiles(sparkJars) || existSubmissionLocalFiles(sparkFiles)) {
  throw new SparkException("The Kubernetes mode does not yet support referencing application " +
    "dependencies in the local file system.")
}
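
For context, here is a minimal sketch of what such a check amounts to, assuming it simply inspects the URI scheme. The helper names mirror the snippet above, but this is an illustration, not the exact Spark source:

import java.net.URI

object LocalDependencyCheck {
  // A URI counts as "submission-local" when the file lives on the machine running
  // spark-submit: an explicit file:// URI or a bare path with no scheme at all.
  // local:// instead points at a file already baked into the container image.
  def isSubmissionLocal(uri: String): Boolean = {
    val scheme = Option(new URI(uri).getScheme).getOrElse("file")
    scheme == "file"
  }

  def existSubmissionLocalFiles(uris: Seq[String]): Boolean =
    uris.exists(isSubmissionLocal)

  def main(args: Array[String]): Unit = {
    // local:// passes the check ...
    println(existSubmissionLocalFiles(Seq("local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar"))) // false
    // ... while file:// (or a plain path) triggers the SparkException shown above
    println(existSubmissionLocalFiles(Seq("file:///tmp/spark-examples_2.11-2.3.0.jar")))                      // true
  }
}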

This is described in SPARK-18278:

it wouldn't accept running a local: jar file, e.g. local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar, existing on my spark docker image (the allowsMixedArguments and isAppResourceReq booleans in SparkSubmitCommandBuilder.java get in the way).

This is related to kubernetes issue 34377.

The issue SPARK-22962 ("Kubernetes app fails if local files are used") mentions:

This is the resource staging server use-case. We will upstream this in the 2.4.0 timeframe.

In the meantime, that error message was introduced in PR 20320.

It includes this comment (a sketch of one working submission pattern follows the quoted tests below):

The manual tests I did actually used a main app jar located on GCS and HTTP.
To be specific and for the record, I did the following tests:

  • Using a gs:// main application jar and an http:// dependency jar. Succeeded.
  • Using an https:// main application jar and an http:// dependency jar. Succeeded.
  • Using a local:// main application jar. Succeeded.
  • Using a file:// main application jar. Failed.
  • Using a file:// dependency jar. Failed.
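
For reference, a submission following one of the successful patterns above (an https:// main application jar instead of a submission-local file) could be sketched with Spark's programmatic SparkLauncher API. The jar URL and Spark home below are placeholders; the remaining values are taken from the question:

import org.apache.spark.launcher.SparkLauncher

object SubmitSparkPi {
  def main(args: Array[String]): Unit = {
    // Mirrors the spark-submit call from the question, but fetches the
    // application jar over https:// rather than referencing a local file.
    val process = new SparkLauncher()
      .setSparkHome("/opt/spark/spark-2.3.0-bin-hadoop2.7")                     // placeholder path
      .setMaster("k8s://https://206.189.126.172:6443")
      .setDeployMode("cluster")
      .setAppName("word-count")
      .setMainClass("org.apache.spark.examples.SparkPi")
      .setAppResource("https://example.com/jars/spark-examples_2.11-2.3.0.jar") // placeholder URL
      .setConf("spark.executor.instances", "5")
      .setConf("spark.kubernetes.container.image", "docker.io/garfiny/spark:v2.3.0")
      .setConf("spark.kubernetes.driver.pod.name", "word-count")
      .launch()
    process.waitFor()
  }
}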

The issue should be fixed by now, and the OP confirms as much in the comments:

I used the latest spark-kubernetes jar to replace the one in the spark-2.3.0-bin-hadoop2.7 package. The exception is gone.