我有以下火花工作,试图将一切都记在内存中:
val myOutRDD = myInRDD.flatMap { fp =>
val tuple2List: ListBuffer[(String, myClass)] = ListBuffer()
:
tuple2List
}.persist(StorageLevel.MEMORY_ONLY).reduceByKey { (p1, p2) =>
myMergeFunction(p1,p2)
}.persist(StorageLevel.MEMORY_ONLY)
Run Code Online (Sandbox Code Playgroud)
但是,当我查看作业跟踪器时,我仍然有很多Shuffle Write和Shuffle溢出到磁盘......
Total task time across all tasks: 49.1 h
Input Size / Records: 21.6 GB / 102123058
Shuffle write: 532.9 GB / 182440290
Shuffle spill (memory): 370.7 GB
Shuffle spill (disk): 15.4 GB
Run Code Online (Sandbox Code Playgroud)
然后这个工作失败了因为"no space left on device"......我想知道532.9 GB Shuffle写在这里,是写入磁盘还是内存?
另外,为什么还有15.4 G数据溢出到磁盘,而我特意要求将它们保存在内存中?
谢谢!