Why do my Spark executors keep launching and exiting?

joe*_*joe · 1 · python, apache-spark, pyspark

I'm trying to run a simple Python script on my standalone Spark cluster. The cluster has one node running bin/start-master.sh and two nodes running bin/start-slave.sh. Looking at the Spark UI on the master node, I can see that the master sees both workers. Here is my small Python test script:

from pyspark import SparkContext

def add_three(num: int):
    return num + 3

print("Initializing spark context....")
sc = SparkContext(appName="test.py")

arr = [x for x in range(1000)]
print(f'Initial array: {arr}')
res = (sc.parallelize(arr)
         .map(lambda x: add_three(x))
         .collect())
print(f'Transformed array: {res}')
sc.stop()

I run it from a separate node with the following command:

bin/spark-submit --master spark://spark-master:7077 test.py

This starts up, and I can see the application in my master's UI. In the output, the initial array is printed, but then workers continuously exit and restart. Here are the master's logs:

2018-08-31 21:23:12 INFO  Master:54 - I have been elected leader! New state: ALIVE
2018-08-31 21:23:18 INFO  Master:54 - Registering worker 10.1.2.93:38905 with 1 cores, 1024.0 MB RAM
2018-08-31 21:23:20 INFO  Master:54 - Registering worker 10.1.1.107:36421 with 1 cores, 1024.0 MB RAM
2018-08-31 21:25:51 INFO  Master:54 - Registering app test.py
2018-08-31 21:25:51 INFO  Master:54 - Registered app test.py with ID app-20180831212551-0000
2018-08-31 21:25:52 INFO  Master:54 - Launching executor app-20180831212551-0000/0 on worker worker-20180831212319-10.1.1.107-36421
2018-08-31 21:25:52 INFO  Master:54 - Launching executor app-20180831212551-0000/1 on worker worker-20180831212318-10.1.2.93-38905
2018-08-31 21:25:53 INFO  Master:54 - Removing executor app-20180831212551-0000/0 because it is EXITED
2018-08-31 21:25:53 INFO  Master:54 - Launching executor app-20180831212551-0000/2 on worker worker-20180831212319-10.1.1.107-36421
2018-08-31 21:25:55 INFO  Master:54 - Removing executor app-20180831212551-0000/2 because it is EXITED
2018-08-31 21:25:55 INFO  Master:54 - Launching executor app-20180831212551-0000/3 on worker worker-20180831212319-10.1.1.107-36421
2018-08-31 21:25:55 INFO  Master:54 - Removing executor app-20180831212551-0000/1 because it is EXITED
2018-08-31 21:25:55 INFO  Master:54 - Launching executor app-20180831212551-0000/4 on worker worker-20180831212318-10.1.2.93-38905
2018-08-31 21:25:56 INFO  Master:54 - Removing executor app-20180831212551-0000/3 because it is EXITED
2018-08-31 21:25:56 INFO  Master:54 - Launching executor app-20180831212551-0000/5 on worker worker-20180831212319-10.1.1.107-36421

I know this works when I use SparkContext("local", "test.py") in the PySpark script. Neither the driver logs nor the executor logs seem to show any errors, so I have no clue what's going wrong; they just keep scrolling, launching and removing executors over and over.

Any insight would be appreciated! Thanks!

joe*_*joe · 6

It turned out to be a network issue. I was running my Spark workers, master, and driver in separate Docker containers, and the ports between them needed to be exposed. In particular, the ports set by spark.driver.port, spark.ui.port, and spark.blockManager.port. I was able to get things working by following the Dockerfile and scripts in this repo: https://github.com/tashoyan/docker-spark-submit
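As a sketch of what that fix looks like (the port numbers below are arbitrary assumptions, not values from the repo; any free ports work as long as the same ones are published on the containers), you can pin the driver-side ports so they are predictable enough to expose:

```
# spark-defaults.conf on the driver (or pass each as --conf to spark-submit)
# Port values are illustrative examples only
spark.driver.port        7001
spark.blockManager.port  7002
spark.ui.port            4040
```

The driver container then needs those ports published (e.g. docker run -p 7001:7001 -p 7002:7002) so the executors on the worker containers can connect back to the driver; otherwise each executor dies immediately on startup, which matches the launch/EXITED loop in the master log above.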

Thanks!