I am trying to submit jobs to EMR with spark-submit in client mode from inside a Kubernetes pod (due to some other infrastructure issues, we are not allowed to use cluster mode). By default, spark-submit uses the pod's hostname as spark.driver.host, and since that hostname is only resolvable inside the pod, the Spark executors cannot resolve it. In addition, spark.driver.port is local to the pod (container).
I know a way to pass some confs to spark-submit so that the Spark executors can talk to the driver; the confs are:
--conf spark.driver.bindAddress=0.0.0.0 --conf spark.driver.host=$HOST_IP_OF_K8S_WORKER --conf spark.driver.port=32000 --conf spark.driver.blockManager.port=32001
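One way to populate $HOST_IP_OF_K8S_WORKER, assuming you control the submitting pod's manifest, is the Kubernetes Downward API, which can expose the node's IP to the container as an environment variable. A minimal sketch (the pod name, container name, and image are placeholders, not from the question):

apiVersion: v1
kind: Pod
metadata:
  name: spark-submitter          # placeholder name
  namespace: my-app
spec:
  containers:
  - name: submitter              # placeholder; any image with spark-submit installed
    image: my-spark-client:latest
    env:
    - name: HOST_IP_OF_K8S_WORKER
      valueFrom:
        fieldRef:
          fieldPath: status.hostIP   # IP of the node this pod is scheduled on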
and to create a Service in Kubernetes so that the Spark executors can reach the driver:
apiVersion: v1
kind: Service
metadata:
  name: spark-block-manager
  namespace: my-app
spec:
  selector:
    app: my-app
  type: NodePort
  ports:
  - name: port-0
    nodePort: 32000
    port: 32000
    protocol: TCP
    targetPort: 32000
  - name: port-1               # inferred from the confs above: the block manager port
    nodePort: 32001
    port: 32001
    protocol: TCP
    targetPort: 32001
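Putting the two pieces together, a client-mode submission from the pod might look like the sketch below. The master URL, application class, and jar are placeholders rather than values from the question; on EMR the master is typically YARN, with the EMR cluster's Hadoop configuration made available to the pod:

# placeholders: the master, application class, and jar are assumptions
spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf spark.driver.bindAddress=0.0.0.0 \
  --conf spark.driver.host=$HOST_IP_OF_K8S_WORKER \
  --conf spark.driver.port=32000 \
  --conf spark.driver.blockManager.port=32001 \
  --class com.example.MyApp \
  my-app.jar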
Trying to catch up with the Spark 2.3 documentation on how to deploy jobs on a Kubernetes 1.9.3 cluster: http://spark.apache.org/docs/latest/running-on-kubernetes.html
The Kubernetes 1.9.3 cluster is operating properly on offline bare-metal servers and was installed with kubeadm. The following command was used to submit the SparkPi example job:
/opt/spark/bin/spark-submit \
  --master k8s://https://k8s-master:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=spark:v2.3.0 \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
Here is the stacktrace that we all love:
++ id -u
+ myuid=0
++ id …
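The output above is cut off, and what is shown is the shell trace from the container entrypoint rather than an application stacktrace. One way to pull the rest of the driver output is with kubectl; the driver pod name below is a placeholder, since the actual name gets a generated suffix:

# find the driver pod created by spark-submit
kubectl get pods
# placeholder name; substitute the actual spark-pi-...-driver pod
kubectl logs spark-pi-driver
kubectl describe pod spark-pi-driver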