我在Scala 2.11.2上运行Apache Spark 1.3.1,当在具有足够大数据的HPC集群上运行时,我收到很多错误,比如帖子底部的错误(每秒重复多次,直到工作随着时间的推移被杀死).根据错误,执行程序尝试从其他节点获取随机数据,但无法执行此操作.
同样的程序可以使用(a)较少量的数据,或者(b)在仅本地模式下执行,因此它与通过网络发送的数据有关(并且不会被非常小的触发)数据量).
在发生这种情况时执行的代码如下:
val partitioned_data = data // data was read as sc.textFile(inputFile)
.zipWithIndex.map(x => (x._2, x._1))
.partitionBy(partitioner) // A custom partitioner
.map(_._2)
// Force previous lazy operations to be evaluated. Presumably adds some
// overhead, but hopefully the minimum possible...
// Suggested on Spark user list: http://apache-spark-user-list.1001560.n3.nabble.com/Forcing-RDD-computation-with-something-else-than-count-td707.html
sc.runJob(partitioned_data, (iter: Iterator[_]) => {})
Run Code Online (Sandbox Code Playgroud)
这是一个错误的指示,还是有什么我做错了?
这是一个执行程序的stderr日志的小片段(完整日志在这里):
15/04/21 14:59:28 ERROR TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=1601401593000, chunkIndex=0}, buffer=FileSegmentManagedBuffer{file=/tmp/spark-0f8d0598-b137-4d14-993a-568b2ab3709a/spark-12d5ff0a-2793-4b76-8a0b-d977a5924925/spark-7ad9382d-05cf-49d4-9a52-d42e6ca7117d/blockmgr-b72d4068-d065-47e6-8a10-867f723000db/15/shuffle_0_1_0.data, offset=26501223, length=6227612}} to /10.0.0.5:41160; closing connection
java.io.IOException: Resource temporarily unavailable …Run Code Online (Sandbox Code Playgroud)