How do I set the executor IP inside a Docker container?

M15*_*156 2 java docker apache-spark

For the past three days I have been trying to set up a Docker machine with three components: a Spark master, a Spark worker, and a driver (Java) application.

Everything works fine when the driver is started outside of Docker. However, starting all three components inside Docker turns into a port/firewall/host nightmare.

To keep it simple (at first) I am using docker-compose - this is my docker-compose.yml:

driver:
  hostname: driver
  image: driverimage
  command: -Dexec.args="0 192.168.99.100" -Dspark.driver.port=7001 -Dspark.driver.host=driver -Dspark.executor.port=7006 -Dspark.broadcast.port=15001 -Dspark.fileserver.port=15002 -Dspark.blockManager.port=15003 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory
  ports:
    - 10200:10200 # Module REST Port
    - 4040:4040 # Web UI (Spark)
    - 7001:7001 # Driver Port (Spark)
    - 15001:15001 # Broadcast (Spark)
    - 15002:15002 # File Server (Spark)
    - 15003:15003 # Blockmanager (Spark)
    - 7337:7337 # Shuffle? (Spark)
  extra_hosts:
    - sparkmaster:192.168.99.100
    - sparkworker:192.168.99.100
  environment:
    SPARK_LOCAL_IP: 192.168.99.100
    #SPARK_MASTER_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    #SPARK_WORKER_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    SPARK_JAVA_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=15001 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"

sparkmaster:
  extra_hosts:
    - driver:192.168.99.100
  image: gettyimages/spark
  command: /usr/spark/bin/spark-class org.apache.spark.deploy.master.Master -h sparkmaster
  hostname: sparkmaster
  environment:
    SPARK_CONF_DIR: /conf
    MASTER: spark://sparkmaster:7077
    SPARK_LOCAL_IP: 192.168.99.100
    SPARK_JAVA_OPTS:  "-Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    SPARK_WORKER_OPTS: "-Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    SPARK_MASTER_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    #SPARK_WORKER_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    #SPARK_JAVA_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
  expose:
    - 7001
    - 7002
    - 7003
    - 7004
    - 7005
    - 7006
    - 7077
    - 6066
  ports:
    - 6066:6066
    - 7077:7077 # Master (Main Port)
    - 8080:8080 # Web UI
    #- 7006:7006 # Executor

sparkworker:
  extra_hosts:
    - driver:192.168.99.100
  image: gettyimages/spark
  command: /usr/spark/bin/spark-class org.apache.spark.deploy.worker.Worker -h sparkworker spark://sparkmaster:7077
#  volumes:
#    - ./spark/logs:/log/spark
  hostname: sparkworker
  environment:
    SPARK_CONF_DIR: /conf
    SPARK_WORKER_CORES: 4
    SPARK_WORKER_MEMORY: 4g
    SPARK_WORKER_PORT: 8881
    SPARK_WORKER_WEBUI_PORT: 8081
    SPARK_LOCAL_IP: 192.168.99.100
    #SPARK_MASTER_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    SPARK_JAVA_OPTS:  "-Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    SPARK_MASTER_OPTS: "-Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    SPARK_WORKER_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=15003 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    #SPARK_JAVA_OPTS: "-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
  links:
    - sparkmaster
  expose:
    - 7001
    - 7002
    - 7003
    - 7004
    - 7005
    - 7006
    - 7012
    - 7013
    - 7014
    - 7015
    - 7016
    - 8881
  ports:
    - 8081:8081 # WebUI
    #- 15003:15003 # Blockmanager+
    - 7005:7005 # Executor
    - 7006:7006 # Executor
    #- 7006:7006 # Executor

I don't even know which ports are actually used, and so on. What I do know is my current problem: the driver can talk to the master, the master can talk to the worker, and I think the driver can talk to the worker as well! But the driver cannot talk to the executors. I have also found the cause: when I open the application UI and go to the Executors tab, it shows "Executor 0 - Address 172.17.0.1:7005".

So the problem is that the driver addresses the executor via the Docker gateway address, and that does not work. I have tried several things (SPARK_LOCAL_IP, explicit hostnames, etc.), but the driver always tries to talk to the Docker gateway... Any idea how to make it possible for the driver to communicate with the executors/workers?

小智 5

This is caused by Spark not offering enough configuration options. Spark binds to and listens on SPARK_LOCAL_HOSTNAME and propagates that exact hostname to the cluster. Unfortunately, this does not work if the driver sits behind NAT, for example inside a Docker container.
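As a minimal illustration of that propagation, assuming the Compose v1 layout from the question (the 172.17.0.x address is hypothetical and just stands for a NATed bridge address):

driver:
  hostname: driver
  environment:
    # Whatever SPARK_LOCAL_HOSTNAME resolves to inside the container is what
    # the driver advertises to the master and the executors. Behind Docker's
    # NAT this is a bridge address (e.g. 172.17.0.x) that the rest of the
    # cluster cannot reach.
    SPARK_LOCAL_HOSTNAME: driver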

You can work around this with the following setup (I have used this hack successfully):

  • Forward all the necessary ports 1-to-1 (as you are already doing)
  • Use a custom hostname for the driver: set e.g. SPARK_LOCAL_HOSTNAME: mydriver
  • On the master and the worker, add 192.168.99.100 mydriver to /etc/hosts so that they can reach the Spark driver.
  • In the driver's Docker container, map mydriver to 0.0.0.0. This makes the Spark driver bind to 0.0.0.0, so the master and the workers can reach it:

To do that in docker-compose.yml, just add the following lines:

  extra_hosts:
    - "mydriver:0.0.0.0"
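Putting the pieces together, here is a minimal sketch of the relevant parts of a docker-compose.yml in the question's Compose v1 style. The hostname mydriver and the 192.168.99.100 docker-machine address are taken from the question and are assumptions that have to match your environment; only the lines that belong to the hack are shown:

driver:
  image: driverimage
  environment:
    # Advertise a custom hostname instead of the container's NATed address
    SPARK_LOCAL_HOSTNAME: mydriver
  extra_hosts:
    # Inside the driver container, mydriver resolves to 0.0.0.0, so the driver
    # binds to all interfaces and is reachable through the forwarded ports
    - "mydriver:0.0.0.0"
  ports:
    # 1-to-1 forwards for the fixed driver-side Spark ports from the question
    - 7001:7001   # spark.driver.port
    - 15001:15001 # spark.broadcast.port
    - 15002:15002 # spark.fileserver.port
    - 15003:15003 # spark.blockManager.port

sparkmaster:
  image: gettyimages/spark
  extra_hosts:
    # The master resolves mydriver to the docker-machine address of the driver host
    - "mydriver:192.168.99.100"

sparkworker:
  image: gettyimages/spark
  extra_hosts:
    # The worker (and therefore its executors) resolves mydriver the same way
    - "mydriver:192.168.99.100"

The fixed port settings from the question (spark.driver.port, spark.broadcast.port, spark.fileserver.port, spark.blockManager.port) are what make the 1-to-1 port forwards line up with the addresses the driver advertises.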