use*_*071 8 json scala jackson apache-spark
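I am trying to run spark-submit with Spark 1.1.0 and Jackson 2.4.4. I have Scala code that uses Jackson to deserialize JSON into case classes. This works fine on its own, but when I run it through Spark I get the following error: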
15/05/01 17:50:11 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 2)
java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.introspect.POJOPropertyBuilder.addField(Lcom/fasterxml/jackson/databind/introspect/AnnotatedField;Lcom/fasterxml/jackson/databind/PropertyName;ZZZ)V
at com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector.com$fasterxml$jackson$module$scala$introspect$ScalaPropertiesCollector$$_addField(ScalaPropertiesCollector.scala:109)
at com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector$$anonfun$_addFields$2$$anonfun$apply$11.apply(ScalaPropertiesCollector.scala:100)
at com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector$$anonfun$_addFields$2$$anonfun$apply$11.apply(ScalaPropertiesCollector.scala:99)
at scala.Option.foreach(Option.scala:236)
at com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector$$anonfun$_addFields$2.apply(ScalaPropertiesCollector.scala:99)
at com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector$$anonfun$_addFields$2.apply(ScalaPropertiesCollector.scala:93)
at scala.collection.GenTraversableViewLike$Filtered$$anonfun$foreach$4.apply(GenTraversableViewLike.scala:109)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.SeqLike$$anon$2.foreach(SeqLike.scala:635)
at scala.collection.GenTraversableViewLike$Filtered$class.foreach(GenTraversableViewLike.scala:108)
at scala.collection.SeqViewLike$$anon$5.foreach(SeqViewLike.scala:80)
at com.fasterxml.jackson.module.scala.introspect.ScalaPropertiesCollector._addFields(ScalaPropertiesCollector.scala:93)
Here is my build.sbt:
//scalaVersion in ThisBuild := "2.11.4"
scalaVersion in ThisBuild := "2.10.5"
retrieveManaged := true
libraryDependencies += "org.scala-lang" % "scala-reflect" % scalaVersion.value
libraryDependencies ++= Seq(
"junit" % "junit" % "4.12" % "test",
"org.scalatest" %% "scalatest" % "2.2.4" % "test",
"org.mockito" % "mockito-core" % "1.9.5",
"org.specs2" %% "specs2" % "2.1.1" % "test",
"org.scalatest" %% "scalatest" % "2.2.4" % "test"
)
libraryDependencies ++= Seq(
"org.apache.hadoop" % "hadoop-core" % "0.20.2",
"org.apache.hbase" % "hbase" % "0.94.6"
)
//libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.3.0"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.1.0"
libraryDependencies += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.4.4"
//libraryDependencies += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.3.1"
//libraryDependencies += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.5.0"
libraryDependencies += "com.typesafe" % "config" % "1.2.1"
resolvers += Resolver.mavenLocal
As you can see, I have tried many different versions of Jackson.
Here is the shell script I use to run spark-submit:
#!/bin/bash
sbt package
CLASS=com.org.test.spark.test.SparkTest
SPARKDIR=/Users/user/Desktop/
#SPARKVERSION=1.3.0
SPARKVERSION=1.1.0
SPARK="$SPARKDIR/spark-$SPARKVERSION/bin/spark-submit"
jar_jackson=/Users/user/scala_projects/lib_managed/bundles/com.fasterxml.jackson.module/jackson-module-scala_2.10/jackson-module-scala_2.10-2.4.4.jar
"$SPARK" \
--class "$CLASS" \
--jars $jar_jackson \
--master local[4] \
/Users/user/scala_projects/target/scala-2.10/spark_project_2.10-0.1-SNAPSHOT.jar \
print /Users/user/test.json
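I pass the path to the Jackson jar to the spark-submit command with --jars. I have even tried different versions of Spark. I have also specified the paths to the Jackson databind, annotations, etc. jars, but that did not solve the problem. Any help would be appreciated. Thanks.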
小智 6
I had the same problem: my play-json jar uses Jackson 2.3.2, while Spark was using Jackson 2.4.4.
When I ran the Spark application, it could not find the method in jackson-2.3.2, and I got the same exception.
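A quick way to confirm this kind of mismatch is to print which jar jackson-databind was actually loaded from and what version it reports. The following is a minimal sketch, not part of the original answer; the object name JacksonVersionCheck is hypothetical, and it only inspects the JVM it runs in (the driver, unless the same calls are made inside a task).

```scala
import com.fasterxml.jackson.databind.ObjectMapper

// Hypothetical helper: reports where jackson-databind came from at runtime.
object JacksonVersionCheck {
  def main(args: Array[String]): Unit = {
    val mapperClass = classOf[ObjectMapper]

    // The code source can be null (e.g. for bootstrap classes), so wrap it in Option.
    val location = Option(mapperClass.getProtectionDomain.getCodeSource)
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)
      .getOrElse("unknown")

    println(s"jackson-databind loaded from: $location")
    // ObjectMapper implements Versioned, so version() reports the library version.
    println(s"jackson-databind version: ${new ObjectMapper().version()}")
  }
}
```

If the reported version differs from the one jackson-module-scala was built against, a NoSuchMethodError like the one in the question is a plausible result.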
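I checked the Maven dependency hierarchy for Jackson. It shows which version is picked and from which jar (here Play uses 2.3.2). Because my play-json dependency came first in the dependency list, version 2.3.2 was picked.
So I moved the Play dependency to the end of the dependency list, after the Spark dependency, and it worked fine. This time 2.4.4 was picked and version 2.3.2 was omitted.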
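Source:
Note that if two dependency versions are at the same depth in the dependency tree, until Maven 2.0.8 it was not defined which one would win, but since Maven 2.0.9 it's the order in the declaration that counts: the first declaration wins.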
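The reordering above is Maven-specific. In an sbt build like the one in the question, a comparable effect can be sketched with dependencyOverrides, which pins a version regardless of declaration order. This is an illustration under assumptions: the version 2.4.4 comes from the question, but which Jackson modules need pinning depends on your own dependency tree.

```scala
// build.sbt (sketch): pin one Jackson version for all transitive dependencies.
// Assumption: these three core modules are the ones that conflict in your tree.
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-core" % "2.4.4"

dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.4.4"

dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-annotations" % "2.4.4"
```

This only affects what sbt resolves at build time. If the Spark distribution ships its own Jackson jars, those may still be loaded first at runtime, so the pinned version may need to match what Spark bundles.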
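I think the main reason is that you did not specify the right dependencies.
If you use third-party libraries and then submit directly to Spark, a better approach is to use sbt-assembly (https://github.com/sbt/sbt-assemble).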
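As a rough sketch of what that could look like for the build in the question: the plugin version below is an assumption (pick one that matches your sbt version), and the two fragments belong in separate files, as the comments indicate.

```scala
// project/plugins.sbt (sketch): enable sbt-assembly.
// Assumption: version 0.14.10; use whichever release fits your sbt version.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt (sketch): mark Spark as "provided" so it is not bundled into the fat jar,
// and let the Jackson Scala module be packaged together with the application.
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.1.0" % "provided"

libraryDependencies += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.4.4"
```

Running sbt assembly instead of sbt package should then produce a single jar that can be passed to spark-submit without --jars. A merge strategy may be needed if duplicate files are reported, and whether bundling alone resolves the conflict depends on which Jackson classes Spark's own classpath loads first.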