When I run MLlib on the cluster against a file with more than one partition, I get the exception below (a minimal sketch of the kind of job that triggers it follows the stack trace):
16/08/14 12:43:23 WARN TaskSetManager: Lost task 2.0 in stage 2.1 (TID 49, da06.qcri.org): FetchFailed(BlockManagerId(3, da08.qcri.org, 33322), shuffleId=0, mapId=5, reduceId=2, message=
org.apache.spark.shuffle.FetchFailedException: Failed to connect to da08.qcri.org:33322
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:323)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:300)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:152)
    at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:58)
    at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:83)
    at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:98)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to connect to ***.org:33322
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:90)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
    …
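For context, this is a minimal sketch of the kind of setup that hits the error. The actual job code is not shown above, so the input path, the partition count of 4, and the choice of KMeans are assumptions; any MLlib job that shuffles data across more than one partition could surface the same FetchFailedException while reducers fetch shuffle blocks from a remote executor, which would also be consistent with the job only failing when the file has more than one partition.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object MLlibRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mllib-fetchfailed-repro"))

    // Assumed input: a whitespace-separated numeric text file, read into 4 partitions.
    val points = sc.textFile("hdfs:///data/points.txt", 4)
      .map(line => Vectors.dense(line.trim.split("\\s+").map(_.toDouble)))
      .cache()

    // K-means (assumed algorithm) shuffles aggregated data between executors;
    // the FetchFailedException above is thrown while fetching those shuffle blocks.
    val model = KMeans.train(points, 10, 20)
    model.clusterCenters.foreach(println)

    sc.stop()
  }
}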