sam*_*est 6 hadoop scala hdfs apache-spark
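What we are doing:
I have also included a code snippet and the sbt deps at the bottom.
When I google this, there seem to be two somewhat vague answers: a) a mismatch between the Spark version on the nodes and the one in the user code; b) more jars need to be added to the SparkConf.
Now I know that (b) is not the problem, since the same code runs successfully on other clusters while including only one jar (it is a fat jar).
But I have no idea how to check for (a). Spark does not appear to do any version checks; it would be nice if it checked versions and threw a "mismatching version exception: you have user code using version X and node Y has version Z".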
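One rough way to approach (a) by hand is to print which jar a class was loaded from and what version its manifest declares, then compare that with the jars installed on the nodes. The sketch below uses only standard JVM reflection; `ClasspathProbe` and its methods are hypothetical names, not part of Spark, and many jars do not declare an Implementation-Version at all, so "unknown" is a common result.

```scala
// Hypothetical helper, not part of Spark: reports where a class was loaded
// from and which version its jar manifest declares (if any).
object ClasspathProbe {

  /** Jar or directory the class was loaded from, if the JVM exposes it. */
  def locationOf(cls: Class[_]): Option[String] =
    for {
      domain <- Option(cls.getProtectionDomain)
      source <- Option(domain.getCodeSource)
      url    <- Option(source.getLocation)
    } yield url.toString

  /** Implementation-Version from the jar manifest, if the jar declares one. */
  def manifestVersionOf(cls: Class[_]): Option[String] =
    Option(cls.getPackage).flatMap(p => Option(p.getImplementationVersion))

  /** One-line summary for a fully qualified class name. */
  def describe(className: String): String =
    try {
      // initialize = false: look the class up without running its static initializers
      val cls = Class.forName(className, false, getClass.getClassLoader)
      val location = locationOf(cls).getOrElse("unknown")
      val version  = manifestVersionOf(cls).getOrElse("unknown")
      s"$className: location=$location, version=$version"
    } catch {
      case _: ClassNotFoundException => s"$className: not on classpath"
    }
}
```

For example, calling `println(ClasspathProbe.describe("org.apache.spark.SparkContext"))` and `println(ClasspathProbe.describe("org.apache.hadoop.fs.FileSystem"))` from the driver shows which Spark and Hadoop jars the fat jar actually resolved to. This only inspects the local JVM; it does not query the cluster.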
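I would be very grateful for advice on this. I have submitted a bug report, because there must be something wrong with the Spark documentation: I have seen two independent sysadmins hit exactly the same problem on different clusters with different versions of CDH. https://issues.apache.org/jira/browse/SPARK-1867
The exception: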
Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 0.0:1 failed 32 times (most recent failure: Exception failure: java.lang.IllegalStateException: unread block data)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/05/16 18:05:31 INFO scheduler.TaskSetManager: Loss was due to java.lang.IllegalStateException: unread block data [duplicate 59]
My code snippet:
val conf = new SparkConf()
  .setMaster(clusterMaster)
  .setAppName(appName)
  .setSparkHome(sparkHome)
  .setJars(SparkContext.jarOfClass(this.getClass))
println("count = " + new SparkContext(conf).textFile(someHdfsPath).count())
My SBT dependencies:
// relevant
"org.apache.spark" % "spark-core_2.10" % "0.9.1",
"org.apache.hadoop" % "hadoop-client" % "2.3.0-mr1-cdh5.0.0",
// standard, probably unrelated
"com.github.seratch" %% "awscala" % "[0.2,)",
"org.scalacheck" %% "scalacheck" % "1.10.1" % "test",
"org.specs2" %% "specs2" % "1.14" % "test",
"org.scala-lang" % "scala-reflect" % "2.10.3",
"org.scalaz" %% "scalaz-core" % "7.0.5",
"net.minidev" % "json-smart" % "1.2"
Changing
"org.apache.hadoop" % "hadoop-client" % "2.3.0-mr1-cdh5.0.0",
to
"org.apache.hadoop" % "hadoop-common" % "2.3.0-cdh5.0.0"
in my application code seems to have fixed this. I am not entirely sure why. We have hadoop-yarn on the cluster, so perhaps "mr1" was breaking things.
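For context, the swap described above might look like this inside a build.sbt. This is only a sketch: the resolver line is an assumption (CDH artifacts are usually served from Cloudera's repository rather than Maven Central), and the unrelated dependencies from the question are left out.

```scala
// build.sbt (sketch). The resolver is an assumption, not from the original post.
resolvers += "Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

libraryDependencies ++= Seq(
  "org.apache.spark"  % "spark-core_2.10" % "0.9.1",
  // was: "org.apache.hadoop" % "hadoop-client" % "2.3.0-mr1-cdh5.0.0"
  "org.apache.hadoop" % "hadoop-common"   % "2.3.0-cdh5.0.0"
)
```

After a change like this, the fat jar has to be rebuilt so that it no longer bundles the mr1 client classes.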