我在amazom aws emr 4.0.0中运行spark 1.4.1
对于一些共振火花saveAsTextFile在emr 4.0.0上与emr 3.8相比非常慢(为5秒,现在为95秒)
实际上saveAsTextFile表示它已经在4.356秒完成,但之后我看到很多INFO消息,com.amazonaws.latency记录器在接下来的90秒内出现404错误
spark> sc.parallelize(List.range(0, 1600000),160).map(x => x + "\t" + "A"*100).saveAsTextFile("s3n://foo-bar/tmp/test40_20")
2015-09-01 21:16:17,637 INFO [dag-scheduler-event-loop] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - ResultStage 5 (saveAsTextFile at <console>:22) finished in 4.356 s
2015-09-01 21:16:17,637 INFO [task-result-getter-2] cluster.YarnScheduler (Logging.scala:logInfo(59)) - Removed TaskSet 5.0, whose tasks have all completed, from pool
2015-09-01 21:16:17,637 INFO [main] scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Job 5 finished: saveAsTextFile at <console>:22, took 4.547829 s
2015-09-01 21:16:17,638 INFO [main] s3n.S3NativeFileSystem (S3NativeFileSystem.java:listStatus(896)) - listStatus s3n://foo-bar/tmp/test40_20/_temporary/0 with recursive false
2015-09-01 …Run Code Online (Sandbox Code Playgroud)