Spark parallelize? (Could not find creator property with name 'id')

Bre*_*ust 19 serialization apache-spark

What causes this serialization error in Apache Spark 1.4.0 when calling:

sc.parallelize(strList, 4)

It throws this exception:

com.fasterxml.jackson.databind.JsonMappingException: 
Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)

The exception is thrown from Jackson's addBeanProps: com.fasterxml.jackson.databind.deser.BeanDeserializerFactory#addBeanProps

The data being parallelized (strList) is a Seq[String], and the number of partitions doesn't seem to matter (I tried 1, 2 and 4).

There is no serialization stack trace of the kind you normally get when a worker closure fails to serialize.

What else can I do to track this problem down?

小智 41

@Interfector is right. I ran into this problem too; here is a snippet of my sbt file, including the 'dependencyOverrides' section that fixed it.

libraryDependencies ++= Seq(
  "com.amazonaws" % "amazon-kinesis-client" % "1.4.0",
  "org.apache.spark" %% "spark-core" % "1.4.0",
  "org.apache.spark" %% "spark-streaming" % "1.4.0",
  "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.4.0",
  "com.amazonaws" % "aws-java-sdk" % "1.10.2"
)

dependencyOverrides ++= Set(
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.4.4"
)
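To confirm at runtime which jackson-databind jar actually ended up on the classpath (for example, after adding an override like the one above), one option is to ask the JVM where a class was loaded from. This is a minimal sketch; the object name WhichJar is hypothetical, and it relies only on standard JDK reflection (Class.forName and ProtectionDomain.getCodeSource), so it compiles without a jackson dependency.

```scala
// Hypothetical helper: reports which jar (or directory) a class was loaded from.
object WhichJar {
  // Returns the code-source location of `className`, or None if the class is not
  // on the classpath or has no code source (e.g. JDK bootstrap classes).
  def locate(className: String): Option[String] =
    try {
      // initialize = false: load the class without running its static initializers
      val cls = Class.forName(className, false, getClass.getClassLoader)
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
    } catch {
      case _: ClassNotFoundException => None
    }

  def main(args: Array[String]): Unit = {
    val classes = Seq(
      "com.fasterxml.jackson.databind.ObjectMapper", // from jackson-databind
      "org.apache.spark.rdd.RDDOperationScope"       // the class named in the exception
    )
    for (name <- classes)
      println(s"$name -> ${locate(name).getOrElse("not found / no code source")}")
  }
}
```

Run it from the same driver program (with the same classpath) that calls sc.parallelize; if ObjectMapper resolves to a jar other than the jackson-databind version you expect, that points to the conflict.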


Int*_*tor 10

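I suspect this is caused by your classpath, which provides a different version of jackson than the one Spark expects (2.4.4, if I'm not mistaken). You need to adjust your classpath so that Spark picks up the correct jackson version first.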
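Building on this suggestion, you could fail fast with a readable message instead of the obscure JsonMappingException by checking the jackson-databind version before creating any RDDs. This is a sketch under assumptions: the object JacksonVersionCheck is hypothetical, it reads com.fasterxml.jackson.databind.cfg.PackageVersion.VERSION reflectively so it compiles without jackson on the compile classpath, and treating any 2.4.x release as acceptable is an assumption (tighten the prefix to "2.4.4" for an exact match).

```scala
// Hypothetical fail-fast check for the jackson-databind version seen at runtime.
object JacksonVersionCheck {
  // Assumption: Spark 1.4.0 expects jackson 2.4.x (2.4.4 per the answer above).
  val ExpectedPrefix = "2.4"

  // Reads PackageVersion.VERSION reflectively; None if jackson-databind is absent.
  def loadedDatabindVersion(): Option[String] =
    try {
      val cls = Class.forName("com.fasterxml.jackson.databind.cfg.PackageVersion")
      Some(cls.getField("VERSION").get(null).toString)
    } catch {
      case _: ReflectiveOperationException => None
    }

  // True for "2.4" itself or any "2.4.<something>" string; false for "2.40.1" or "2.5.0".
  def compatible(version: String, expectedPrefix: String = ExpectedPrefix): Boolean =
    version == expectedPrefix || version.startsWith(expectedPrefix + ".")

  // Call in the driver before sc.parallelize(...) to get a clear error message.
  def requireCompatible(): Unit = loadedDatabindVersion() match {
    case Some(v) =>
      require(compatible(v), s"jackson-databind $v is on the classpath; expected $ExpectedPrefix.x")
    case None =>
      sys.error("jackson-databind not found on the classpath")
  }
}
```

In an application, JacksonVersionCheck.requireCompatible() would be called once at startup, before sc.parallelize(strList, 4), so a mismatched jar surfaces as an explicit requirement failure rather than a Jackson deserialization error.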