相关疑难解决方法(0)

Apache Spark:执行程序之间的网络错误

我在Scala 2.11.2上运行Apache Spark 1.3.1,当在具有足够大数据的HPC集群上运行时,我收到很多错误,比如帖子底部的错误(每秒重复多次,直到工作随着时间的推移被杀死).根据错误,执行程序尝试从其他节点获取随机数据,但无法执行此操作.

同样的程序可以使用(a)较少量的数据,或者(b)在仅本地模式下执行,因此它与通过网络发送的数据有关(并且不会被非常小的触发)数据量).

在发生这种情况时执行的代码如下:

val partitioned_data = data  // data was read as sc.textFile(inputFile)
  .zipWithIndex.map(x => (x._2, x._1))
  .partitionBy(partitioner)  // A custom partitioner
  .map(_._2)

// Force previous lazy operations to be evaluated. Presumably adds some
// overhead, but hopefully the minimum possible...
// Suggested on Spark user list: http://apache-spark-user-list.1001560.n3.nabble.com/Forcing-RDD-computation-with-something-else-than-count-td707.html
sc.runJob(partitioned_data, (iter: Iterator[_]) => {})

Run Code Online (Sandbox Code Playgroud)

这是一个错误的指示,还是有什么我做错了？

这是一个执行程序的stderr日志的小片段(完整日志在这里):

15/04/21 14:59:28 ERROR TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=1601401593000, chunkIndex=0}, buffer=FileSegmentManagedBuffer{file=/tmp/spark-0f8d0598-b137-4d14-993a-568b2ab3709a/spark-12d5ff0a-2793-4b76-8a0b-d977a5924925/spark-7ad9382d-05cf-49d4-9a52-d42e6ca7117d/blockmgr-b72d4068-d065-47e6-8a10-867f723000db/15/shuffle_0_1_0.data, offset=26501223, length=6227612}} to /10.0.0.5:41160; closing connection
java.io.IOException: Resource temporarily unavailable …

Run Code Online (Sandbox Code Playgroud)

scala apache-spark

mjj*_*son

lucky-day

22
推荐指数

1
解决办法

2万
查看次数

标签统计

apache-spark ×1

scala ×1

Apache Spark:执行程序之间的网络错误

标签 统计

标签统计