Tags: executor, apache-spark, spark-streaming, apache-spark-standalone
I am running a Spark Streaming application on a cluster of three nodes, each running one worker with three executors (so 9 executors in total). I am using Spark 2.3.2 with the Spark standalone cluster manager.
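For context, a layout like this normally falls out of the core sizing rather than from any explicit "executors per worker" setting; below is a minimal sketch of the relevant configuration, where the core counts are assumptions since they are not stated above.

import org.apache.spark.SparkConf

// Sketch only: on standalone, executors per worker = worker cores / spark.executor.cores.
// With 12-core workers (an assumption) and 4 cores per executor, each worker
// hosts 3 executors, 9 in total across 3 nodes.
val conf = new SparkConf()
  .setAppName("streaming-app")
  .setMaster("spark://master-host:7077")  // placeholder master URL
  .set("spark.executor.cores", "4")       // cores requested per executor
  .set("spark.cores.max", "36")           // cap: 3 nodes x 12 cores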
While investigating a recent issue in which one of the worker machines went completely down, I can see that the Spark Streaming job was stopped with the following error:
18/10/08 11:53:03 ERROR TaskSetManager: Task 122 in stage 413804.1 failed 8 times; aborting job
The job was aborted because one task in the same stage failed 8 times; this is the expected behaviour.
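(For reference, the number of attempts per task is governed by spark.task.maxFailures, whose default on a cluster manager is 4; the 8 attempts seen here suggest the value was raised for this job, for example along these lines.)

import org.apache.spark.SparkConf

// Assumption: the 8 attempts above come from an explicit override of
// spark.task.maxFailures, since the default is 4.
val conf = new SparkConf().set("spark.task.maxFailures", "8")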
The task in question failed for the following reason:
18/10/08 11:53:03 INFO DAGScheduler: ShuffleMapStage 413804 (flatMapToPair at MessageReducer.java:30) failed in 3.817 s due to Job aborted due to stage failure: Task 122 in stage 413804.1 failed 8 times, most recent failure: Lost task 122.7 in stage 413804.1 (TID 223071001, 10.12.101.60, executor 1): java.lang.Exception: Could not compute split, block input-39-1539013586600 of RDD 1793044 not found
org.apache.spark.SparkException: Job aborted due to stage failure: Task 122 in stage 413804.1 failed 8 times, most recent failure: Lost task 122.7 in stage 413804.1 (TID 223071001, 10.12.101.60, executor 1): java.lang.Exception: Could not compute split, block input-39-1539013586600 of RDD 1793044 not found
I then tried to trace the missing block input-39-1539013586600 and found the following:
18/10/08 11:46:26 INFO BlockManagerInfo: Added input-39-1539013586600 in memory on 10.10.101.66:32825 (size: 1398.0 B, free: 5.2 GB)
18/10/08 11:46:26 INFO BlockManagerInfo: Added input-39-1539013586600 in memory on 10.10.101.66:35258 (size: 1398.0 B, free: 5.2 GB)
18/10/08 11:47:35 WARN BlockManagerMasterEndpoint: No more replicas available for input-39-1539013586600 !
18/10/08 11:53:03 WARN TaskSetManager: Lost task 122.0 in stage 413804.1 (TID 223070944, 10.10.101.60, executor 5): java.lang.Exception: Could not compute split, block input-39-1539013586600 of RDD 1793044 not found
18/10/08 11:53:03 INFO TaskSetManager: Lost task 122.1 in stage 413804.1 (TID 223070956) on 10.12.101.66, executor 9: java.lang.Exception (Could not compute split, block input-39-1539013586600 of RDD 1793044 not found) [duplicate 1]
As you can see, the block was replicated onto two different executors on the same worker (10.10.101.66 in this case).
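(The two in-memory copies are expected: receiver-based input streams store their blocks with StorageLevel.MEMORY_AND_DISK_SER_2 by default, i.e. a replication factor of 2. A toy illustration with a socket source follows; the real job's receiver may of course differ.)

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Receiver input blocks such as input-39-... are written through the
// BlockManager with two replicas because of the _2 storage level below,
// which is the default for receiver streams.
val ssc = new StreamingContext(
  new SparkConf().setAppName("demo").setMaster("local[2]"), Seconds(10))
val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER_2)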
We then checked the Spark code to see whether this behaviour is normal, and it seems it is. The default policy used by the BlockManager is RandomBlockReplicationPolicy (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L240).
In this policy, despite the Scaladoc saying "...basic implementation, that just makes sure we put blocks on different hosts, if possible", the selection seems completely random: it does not use the host property of the BlockManagerId objects to try to place the replica on a different host (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala#L120).
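To make the difference concrete, here is a toy sketch (using our own Peer type, not Spark's BlockManagerId or the actual policy code) of what a purely random selection does versus what a host-aware selection would do:

import scala.util.Random

// Toy model of a block manager peer; BlockManagerId carries the same info.
case class Peer(executorId: String, host: String, port: Int)

// What the random policy effectively does: sample peers with no regard for
// their host, so both replicas can land on the same machine.
def randomChoice(peers: Seq[Peer], numReplicas: Int): Seq[Peer] =
  Random.shuffle(peers).take(numReplicas)

// What a host-aware choice would do: prefer peers on a different host than
// the one already holding the block, only falling back to same-host peers.
def hostAwareChoice(self: Peer, peers: Seq[Peer], numReplicas: Int): Seq[Peer] = {
  val (otherHosts, sameHost) = peers.partition(_.host != self.host)
  (Random.shuffle(otherHosts) ++ Random.shuffle(sameHost)).take(numReplicas)
}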
If our analysis is correct, it seems that with a configuration like ours (multiple executors per worker machine), Spark Streaming can easily go down when an entire host is lost.
Forcing the job to use BasicBlockReplicationPolicy does not seem to be a solution either, as that policy falls back to the random mechanism when no topology information is available (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala#L169), and I was not able to find where in the code the topology can be set (it does not seem to be used at the moment).
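For completeness, the knobs that appear to control this in the 2.3 code base are the ones below; we have not verified that the topology mapper is actually honoured in our standalone setup, so this is purely a sketch of what we would try, not a confirmed fix.

import org.apache.spark.SparkConf

// Sketch only: these keys exist in the BlockManager / TopologyMapper code,
// but whether they solve the multi-executor-per-host problem on standalone
// is exactly the open question of this post.
val conf = new SparkConf()
  // replication policy used by the BlockManager
  // (default: org.apache.spark.storage.RandomBlockReplicationPolicy)
  .set("spark.storage.replication.policy",
    "org.apache.spark.storage.BasicBlockReplicationPolicy")
  // mapper that supplies topology info; the default DefaultTopologyMapper
  // returns no topology, which is why the basic policy falls back to random
  .set("spark.storage.replication.topologyMapper",
    "org.apache.spark.storage.FileBasedTopologyMapper")
  // FileBasedTopologyMapper reads host -> topology entries from this file (assumed path)
  .set("spark.storage.replication.topologyFile", "/path/to/topology.properties")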