Tags: scala, implicit-conversion, apache-spark, apache-spark-sql, apache-spark-encoders
I have already outlined the Spark problem in "spark custom kryo encoder not providing schema for UDF", but have now created a minimal example: https://gist.github.com/geoHeil/dc9cfb8eca5c06fca01fc9fc03431b2f
class SomeOtherClass(foo: Int)
case class FooWithSomeOtherClass(a: Int, b: String, bar: SomeOtherClass)
case class FooWithoutOtherClass(a: Int, b: String, bar: Int)
case class Foo(a: Int)
implicit val someOtherClassEncoder: Encoder[SomeOtherClass] = Encoders.kryo[SomeOtherClass]
val df2 = Seq(FooWithSomeOtherClass(1, "one", new SomeOtherClass(4))).toDS
val df3 = Seq(FooWithoutOtherClass(1, "one", 1), FooWithoutOtherClass(2, "two", 2)).toDS
val df4 = df3.map(d => FooWithSomeOtherClass(d.a, d.b, new SomeOtherClass(d.bar)))
Here, even the createDataSet statement fails, because:
java.lang.UnsupportedOperationException: No Encoder found for SomeOtherClass
- field (class: "SomeOtherClass", name: "bar")
- root class: "FooWithSomeOtherClass"
Why is the encoder not in scope, or at least not in the right scope?
Also, trying to specify an explicit encoder like:
df3.map(d => {FooWithSomeOtherClass(d.a, d.b, new SomeOtherClass(d.bar))}, (Int, String, Encoders.kryo[SomeOtherClass]))
does not work.
This happens because you should use Kryo serialization throughout the whole serialization stack, which means your top-level object should have a Kryo encoder. The following runs successfully on a local Spark shell (the change you are interested in is on the first line):
implicit val topLevelObjectEncoder: Encoder[FooWithSomeOtherClass] = Encoders.kryo[FooWithSomeOtherClass]
val df1 = Seq(Foo(1), Foo(2)).toDF
val df2 = Seq(FooWithSomeOtherClass(1, "one", new SomeOtherClass(4))).toDS
val df3 = Seq(FooWithoutOtherClass(1, "one", 1), FooWithoutOtherClass(2, "two", 2)).toDS
df3.printSchema
df3.show
val df4 = df3.map(d => FooWithSomeOtherClass(d.a, d.b, new SomeOtherClass(d.bar)))
df4.printSchema
df4.show
df4.collect
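As a side note on the explicit-encoder attempt from the question: Dataset.map takes the encoder as a separate implicit parameter list, so instead of declaring the implicit val you can also pass the Kryo encoder explicitly. A minimal sketch under that assumption (the name df4Explicit is made up for illustration):

import org.apache.spark.sql.Encoders

// Pass the encoder for the result type explicitly to map's second parameter list
// instead of relying on an implicit being in scope.
val df4Explicit = df3.map(d => FooWithSomeOtherClass(d.a, d.b, new SomeOtherClass(d.bar)))(Encoders.kryo[FooWithSomeOtherClass])

Either way, note that with a Kryo encoder on the top-level class the Dataset is stored as a serialized blob, so df4.printSchema will typically show a single value: binary column rather than the individual a, b and bar fields.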