I have been fighting with this all day. I can install and use the package (graphframes) from the Spark shell or from a connected Jupyter notebook, but I want to move this to a Kubernetes-based Spark environment with spark-submit. My Spark version: 3.0.1. I downloaded the latest available .jar file (graphframes-0.8.1-spark3.0-s_2.12.jar) from spark-packages and put it into the jars folder. I build my image with a variant of the standard Spark Dockerfile. My spark-submit command looks like this:
$SPARK_HOME/bin/spark-submit \
--master k8s://https://kubernetes.docker.internal:6443 \
--deploy-mode cluster \
--conf spark.executor.instances=$2 \
--conf spark.kubernetes.container.image=myimage.io/repositorypath \
--packages graphframes:graphframes:0.8.1-spark3.0-s_2.12 \
--jars "local:///opt/spark/jars/graphframes-0.8.1-spark3.0-s_2.12.jar" \
path/to/my/script/script.py
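For context, the kind of usage that works for me interactively is essentially the following (a minimal sketch; the toy vertex/edge data here is just illustrative, not my actual job):

from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.getOrCreate()

# Toy graph: the vertices DataFrame needs an "id" column,
# the edges DataFrame needs "src" and "dst" columns
v = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
e = spark.createDataFrame([("a", "b", "friend")], ["src", "dst", "relationship"])

g = GraphFrame(v, e)
g.inDegrees.show()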
But when I submit it to Kubernetes this way, it ends with an error:
Ivy Default Cache set to: /opt/spark/.ivy2/cache
The jars for the packages stored in: /opt/spark/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
graphframes#graphframes added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-e833e157-44f5-4055-81a4-3ab524176ef5;1.0
confs: [default]
Exception in …