I know Apache Spark is designed around resilient data structures, but are failures expected in a running system, or do they usually indicate a problem?
As I started scaling the system up with different configurations, I saw ExecutorLostFailure and "No more replicas" messages (see below). The system recovered and the program finished.
Should I be concerned about this, and is there anything one can generally do to avoid it? Or is this simply a consequence of the increased number of executors?
18/05/18 23:59:00 WARN TaskSetManager: Lost task 87.0 in stage 4044.0 (TID 391338, ip-10-0-0-68.eu-west-1.compute.internal, executor 11): ExecutorLostFailure (executor 11 exited caused by one of the running tasks) Reason: Container marked as failed: container_1526667532988_0010_01_000012 on host: ip-10-0-0-68.eu-west-1.compute.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
18/05/18 23:59:00 WARN BlockManagerMasterEndpoint: No more replicas available for rdd_193_7 !
18/05/18 23:59:00 WARN BlockManagerMasterEndpoint: No more replicas available for rdd_582_50 !
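Exit status -100 with "Container released on a *lost* node" generally means YARN lost the whole node (on EMR this is commonly a reclaimed spot instance) rather than a task crashing. The follow-on "No more replicas available" WARNs say that the only cached copy of those RDD blocks lived on the lost executor, so Spark recomputes them from lineage; that is why the job recovers. If node losses become frequent, raising the failure tolerance can help. A minimal sketch, assuming Spark 2.x on YARN; the class name and jar are placeholders and the values are illustrative starting points, not tuned recommendations:

```shell
# Sketch only: standard Spark 2.x-on-YARN settings that make a job more
# tolerant of lost executors/nodes. com.example.MyDriver and
# my-application.jar are hypothetical placeholders.
$SPARK_HOME/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.task.maxFailures=8 \
  --conf spark.yarn.max.executor.failures=40 \
  --conf spark.blacklist.enabled=true \
  --class com.example.MyDriver \
  my-application.jar
```

`spark.task.maxFailures` controls how many times a single task may fail before the stage is aborted, `spark.yarn.max.executor.failures` how many executor losses the application tolerates overall, and blacklisting stops scheduling new tasks on nodes that keep failing.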
18/05/18 23:59:00 …

I am using Spark 2.4.1 with Java 8 to copy data into Cassandra 3.0.
My Spark job submit script is:
$SPARK_HOME/bin/spark-submit \
--master yarn \
--deploy-mode cluster \
--name MyDriver \
--jars "/local/jars/*.jar" \
--files hdfs://files/application-cloud-dev.properties,hdfs://files/column_family_condition.yml \
--class com.sp.MyDriver \
--executor-cores 3 \
--executor-memory 9g \
--num-executors 5 \
--driver-cores 2 \
--driver-memory 4g \
--driver-java-options -Dconfig.file=./application-cloud-dev.properties \
--conf spark.executor.extraJavaOptions=-Dconfig.file=./application-cloud-dev.properties \
--conf spark.driver.extraClassPath=. \
--driver-class-path . \
ca-datamigration-0.0.1.jar application-cloud-dev.properties
Though the job completes successfully, my log file is full of the following WARNs.
WARN org.apache.spark.storage.BlockManagerMasterEndpoint - No more replicas available for rdd_558_5026 !
2019-09-20 00:02:37,882 [dispatcher-event-loop-1] WARN org.apache.spark.storage.BlockManagerMasterEndpoint - No more replicas available for …

Tags: datastax-java-driver, apache-spark, apache-spark-sql, cassandra-3.0
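When the job itself succeeds, this WARN is usually benign: it is logged whenever an executor holding the only copy of a cached RDD block is removed (by node loss, executor failure, or dynamic allocation), after which Spark recomputes the block from lineage. If the log volume is the main concern, the logger for that class can be raised to ERROR. A sketch of a log4j.properties override, with the logger name taken from the WARN lines above:

```properties
# Sketch: silence the benign "No more replicas available" WARNs only.
log4j.logger.org.apache.spark.storage.BlockManagerMasterEndpoint=ERROR
```

Note this hides all WARNs from that endpoint, so it is best applied only once you have confirmed the executor losses themselves are understood (e.g. spot reclamation) rather than a memory or stability problem.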