Docker container with Apache Spark in standalone cluster mode

Question

Tw *_*Nus 5 docker apache-spark dockerfile

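I am trying to build a Docker image that contains Apache Spark. It is built on the official openjdk-8-jre image.

The goal is to run Spark in cluster mode, so with at least one master (started via sbin/start-master.sh) and one or more slaves (sbin/start-slave.sh). See spark-standalone-docker for my Dockerfile and entrypoint script.

The build itself goes through fine. The problem is that when I run the container, it starts and then stops shortly afterwards. The cause is that the Spark master start script launches the master in daemon mode and then exits, so the container terminates because no process is left running in the foreground.

The obvious solution is to run the Spark master process in the foreground, but I could not figure out how (Google turned up nothing either). My "workaround" is to run tail -f on the Spark log directory.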

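So my questions are:

1. How can I run the Apache Spark master in the foreground?
2. If the first is not possible or not feasible, what is the preferred (i.e. best-practice) way to keep the container "alive"? I really don't want to use an infinite loop with a sleep command.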

Answer 1

zer*_*323 6

How can I run the Apache Spark master in the foreground?

You can use spark-class with the Master class:

bin/spark-class org.apache.spark.deploy.master.Master

The same goes for workers:

bin/spark-class org.apache.spark.deploy.worker.Worker $MASTER_URL
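
As a rough illustration, the two commands above could be wrapped in a single entrypoint script that picks the class from its first argument. This is only a sketch: the /opt/spark default for SPARK_HOME, the function names, and the "master" / "worker <master-url>" argument convention are assumptions, not something taken from the question's Dockerfile.

```shell
#!/bin/sh
# Sketch of a container entrypoint that keeps Spark in the foreground.
# Assumptions (not from the answer): the /opt/spark default for SPARK_HOME
# and the "master" / "worker <master-url>" argument convention.

SPARK_HOME="${SPARK_HOME:-/opt/spark}"

# Map a role name to the Spark class that implements it.
spark_class_for() {
    case "$1" in
        master) echo "org.apache.spark.deploy.master.Master" ;;
        worker) echo "org.apache.spark.deploy.worker.Worker" ;;
        *)      echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Replace this shell with the spark-class launcher so no daemon is forked.
run_spark() {
    class=$(spark_class_for "$1") || return 64
    shift
    exec "$SPARK_HOME/bin/spark-class" "$class" "$@"
}

# Only launch when a role is passed, e.g.
#   entrypoint.sh worker spark://master:7077
if [ "$#" -gt 0 ]; then
    run_spark "$@"
fi
```

Using exec means the shell does not linger as a parent process, so the container lives exactly as long as the Spark process does.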

If you are looking for a production-ready solution, you should consider using a proper supervisor such as dumb-init or tini.
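
To make the supervisor suggestion concrete, here is a minimal sketch that runs the master under whichever init happens to be installed. It assumes tini or dumb-init has been added to the image (neither ships with Spark); the /opt/spark default, the function names, and the "start" argument are made up for illustration.

```shell
#!/bin/sh
# Sketch: run the Spark master under a minimal init when one is installed.
# Assumption: tini or dumb-init was added to the image by the Dockerfile.

SPARK_HOME="${SPARK_HOME:-/opt/spark}"   # assumed install location

# Print the init prefix to use, or nothing when no init is installed.
pick_init() {
    if command -v tini >/dev/null 2>&1; then
        echo "tini --"
    elif command -v dumb-init >/dev/null 2>&1; then
        echo "dumb-init"
    fi
}

# Replace the shell with "<init> spark-class <Master class> [args]".
start_master_supervised() {
    # Unquoted on purpose: the prefix must split into separate words.
    exec $(pick_init) "$SPARK_HOME/bin/spark-class" \
        org.apache.spark.deploy.master.Master "$@"
}

# Only start when asked to, so the file can be sourced without side effects.
if [ "${1:-}" = "start" ]; then
    shift
    start_master_supervised "$@"
fi
```

If changing the image is not an option, docker run --init asks Docker to inject its bundled init as PID 1 instead.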

Answer 2

dsn*_*ode 6

Updated answer (for Spark 2.4.0):

To start the Spark master in the foreground, set the environment variable SPARK_NO_DAEMONIZE=true before running ./start-master.sh.

Then you are good to go.

For more information, see $SPARK_HOME/sbin/spark-daemon.sh:

# Runs a Spark command as a daemon.
#
# Environment Variables
#
#   SPARK_CONF_DIR  Alternate conf dir. Default is ${SPARK_HOME}/conf.
#   SPARK_LOG_DIR   Where log files are stored. ${SPARK_HOME}/logs by default.
#   SPARK_MASTER    host:path where spark code should be rsync'd from
#   SPARK_PID_DIR   The pid files are stored. /tmp by default.
#   SPARK_IDENT_STRING   A string representing this instance of spark. $USER by default
#   SPARK_NICENESS The scheduling priority for daemons. Defaults to 0.
#   SPARK_NO_DAEMONIZE   If set, will run the proposed command in the foreground. It will not output a PID file.
##
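
Put together, an entrypoint built on this variable might look like the sketch below. The /opt/spark default for SPARK_HOME, the function name, and the "start" argument are assumptions for illustration; only SPARK_NO_DAEMONIZE and sbin/start-master.sh come from the answer.

```shell
#!/bin/sh
# Sketch of an entrypoint that relies on SPARK_NO_DAEMONIZE.
# Assumption: Spark is installed under /opt/spark unless SPARK_HOME says otherwise.

SPARK_HOME="${SPARK_HOME:-/opt/spark}"

# Per the header comment quoted above, the variable only needs to be set;
# "true" is used here simply because the answer uses it.
export SPARK_NO_DAEMONIZE=true

# Replace this shell with the start script, which now stays in the foreground.
start_master_foreground() {
    exec "$SPARK_HOME/sbin/start-master.sh" "$@"
}

# Only start when asked to, so the file can be sourced without side effects.
if [ "${1:-}" = "start" ]; then
    shift
    start_master_foreground "$@"
fi
```

The same variable could equally be set with an ENV line in the Dockerfile, in which case the export here is redundant.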
