ERROR yarn.ApplicationMaster: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]

BiC*_*hor 7 akka apache-spark apache-spark-sql

I have this problem in my Spark application (Spark 1.6, Scala 2.10):

17/10/23 14:32:15 ERROR yarn.ApplicationMaster: Uncaught exception:
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:342)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:197)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:680)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:678)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
17/10/23 14:32:15 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds])
17/10/23 14:32:15 INFO spark.SparkContext: Invoking stop() from shutdown hook
17/10/23 14:32:15 INFO ui.SparkUI: Stopped Spark web UI at http://180.21.232.30:43576
17/10/23 14:32:15 INFO scheduler.DAGScheduler: ShuffleMapStage 27 (show at Linkage.scala:282) failed in 24.519 s due to Stage cancelled because SparkContext was shut down
17/10/23 14:32:15 SparkListenerJobEnd(18,1508761935656,JobFailed(org.apache.spark.SparkException: Job 18 cancelled because SparkContext was shut down))
17/10/23 14:32:15 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/10/23 14:32:15 INFO storage.MemoryStore: MemoryStore cleared
17/10/23 14:32:15 INFO storage.BlockManager: BlockManager stopped
17/10/23 14:32:15 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/10/23 14:32:15 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/10/23 14:32:15 INFO util.ShutdownHookManager: Shutdown hook called

I read this question and tried tuning the following parameters, but it made no difference:

--conf spark.yarn.am.waitTime=6000s

--conf spark.sql.broadcastTimeout=6000

--conf spark.network.timeout=600
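Combined into a single submit command, those attempts look like this (the class and jar names below are placeholders):

spark-submit \
  --conf spark.yarn.am.waitTime=6000s \
  --conf spark.sql.broadcastTimeout=6000 \
  --conf spark.network.timeout=600 \
  --class com.example.MyApp \
  my-app.jar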

Best regards

小智 14

Please remove setMaster('local') from your code; by default, Spark uses the YARN cluster manager on EMR.
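A minimal sketch of what that looks like (the object and app names below are placeholders; the point is only that no setMaster call appears):

import org.apache.spark.{SparkConf, SparkContext}

object LinkageJob {
  def main(args: Array[String]): Unit = {
    // No .setMaster("local") here: the master is supplied at submit
    // time (e.g. --master yarn) instead of being hard-coded in the app.
    val conf = new SparkConf().setAppName("LinkageJob")
    val sc = new SparkContext(conf)

    // ... job logic ...

    sc.stop()
  }
}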


ket*_*nkk 7

If you are trying to run your Spark job in yarn client/cluster mode, don't forget to remove the master configuration .master("local[n]") from your code.

To submit a Spark job on YARN, you need to pass --master yarn --deploy-mode cluster/client.

Having master set to local is what causes the repeated timeout exception.
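For example, a submission along these lines (the class and jar names are placeholders) sets the master on the command line instead of in code:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.LinkageJob \
  linkage-job.jar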

  • I did not set master to "local[n]", but I still get the same exception. (4 upvotes)