When running a Spark job on the cluster on more than a certain amount of data (~2.5 GB), I get either "Job cancelled because SparkContext was shut down" or "executor lost". Looking at the YARN GUI, I see that the job that got killed is reported as successful. There are no problems when running on 500 MB of data. Searching for a solution, I found: "it seems YARN kills some of the executors as they request more memory than expected."
Any suggestions on how to debug this?
The command I use to submit my Spark job:
/opt/spark-1.5.0-bin-hadoop2.4/bin/spark-submit --driver-memory 22g --driver-cores 4 --num-executors 15 --executor-memory 6g --executor-cores 6 --class sparkTesting.Runner --master yarn-client myJar.jar jarArguments
And the SparkContext settings:
val sparkConf = (new SparkConf()
.set("spark.driver.maxResultSize", "21g")
.set("spark.akka.frameSize", "2011")
.set("spark.eventLog.enabled", "true")
.set("spark.eventLog.enabled", "true")
.set("spark.eventLog.dir", configVar.sparkLogDir)
)
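Given the finding quoted above, that YARN kills executors for requesting more memory than expected, one setting I am thinking of adding to this config is the executor memory overhead. This is only a rough sketch, not something I have verified fixes the problem, and the 1024 MB value is a placeholder:

import org.apache.spark.SparkConf

// Sketch: raise the off-heap headroom YARN reserves per executor on top of --executor-memory.
// In Spark 1.5 the default is max(384 MB, 10% of executor memory); the value is given in megabytes.
val tunedConf = new SparkConf()
  .set("spark.yarn.executor.memoryOverhead", "1024") // placeholder value, needs tuning

The same property could also be passed on the spark-submit command line via --conf instead of in code.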
And the simplified code that fails looks like this:
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
val broadcastParser = sc.broadcast(new Parser())
val featuresRdd = hc.sql("select "+ configVar.columnName + " from " + configVar.Table +" ORDER BY RAND() LIMIT " + configVar.Articles)
val myRdd : org.apache.spark.rdd.RDD[String] = featuresRdd.map(doSomething(_,broadcastParser))
val allWords = featuresRdd
.flatMap(line => line.split(" …
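Separately, I suspect the ORDER BY RAND() LIMIT in the query forces a full sort of the table just to draw a random sample, which may be part of the memory pressure. A rough alternative I might try, sketched here only under the assumption that configVar.Articles is an Int and that a fractional sample is acceptable, is Spark's own sampling:

// Sketch: draw a random sample without the global sort implied by ORDER BY RAND().
val sampledFeatures = hc
  .sql("select " + configVar.columnName + " from " + configVar.Table)
  .sample(withReplacement = false, fraction = 0.01) // placeholder fraction, needs tuning
  .limit(configVar.Articles)                        // assumes configVar.Articles is an Int
// sampledFeatures could then stand in for featuresRdd in the map/flatMap above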