Tags: scala, implicit-conversion, apache-spark, apache-spark-sql, apache-spark-encoders
I have already outlined the Spark problem in "spark custom kryo encoder not providing schema for UDF", but have now created a minimal example: https://gist.github.com/geoHeil/dc9cfb8eca5c06fca01fc9fc03431b2f
class SomeOtherClass(foo: Int)
case class FooWithSomeOtherClass(a: Int, b: String, bar: SomeOtherClass)
case class FooWithoutOtherClass(a: Int, b: String, bar: Int)
case class Foo(a: Int)
implicit val someOtherClassEncoder: Encoder[SomeOtherClass] = Encoders.kryo[SomeOtherClass]
val df2 = Seq(FooWithSomeOtherClass(1, "one", new SomeOtherClass(4))).toDS
val df3 = Seq(FooWithoutOtherClass(1, "one", 1), FooWithoutOtherClass(2, "two", 2)).toDS
val df4 = df3.map(d => FooWithSomeOtherClass(d.a, d.b, new SomeOtherClass(d.bar)))
Here, even the createDataSet statement fails, because:
java.lang.UnsupportedOperationException: No Encoder found for SomeOtherClass
- field (class: "SomeOtherClass", name: "bar")
- root class: "FooWithSomeOtherClass"
Why is the encoder not in scope, or at least not in the right scope?
Also, trying to specify an explicit encoder like:
df3.map(d => {FooWithSomeOtherClass(d.a, d.b, new SomeOtherClass(d.bar))}, (Int, String, Encoders.kryo[SomeOtherClass]))
does not work.
This happens because you should use Kryo serialization throughout the whole serialization stack, which means your top-level object should have a Kryo encoder. The following runs successfully on a local Spark shell (the change you are interested in is on the first line):
implicit val topLevelObjectEncoder: Encoder[FooWithSomeOtherClass] = Encoders.kryo[FooWithSomeOtherClass]
val df1 = Seq(Foo(1), Foo(2)).toDF
val df2 = Seq(FooWithSomeOtherClass(1, "one", new SomeOtherClass(4))).toDS
val df3 = Seq(FooWithoutOtherClass(1, "one", 1), FooWithoutOtherClass(2, "two", 2)).toDS
df3.printSchema
df3.show
val df4 = df3.map(d => FooWithSomeOtherClass(d.a, d.b, new SomeOtherClass(d.bar)))
df4.printSchema
df4.show
df4.collect
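As a side note on the explicit-encoder attempt from the question: Dataset.map takes the encoder as a separate implicit parameter list, so instead of declaring the implicit val you can also pass the Kryo encoder explicitly. A minimal sketch under that assumption (the name df4Explicit is made up for illustration):

import org.apache.spark.sql.Encoders

// Pass the encoder for the result type explicitly to map's second parameter list
// instead of relying on an implicit being in scope.
val df4Explicit = df3.map(d => FooWithSomeOtherClass(d.a, d.b, new SomeOtherClass(d.bar)))(Encoders.kryo[FooWithSomeOtherClass])

Either way, note that with a Kryo encoder on the top-level class the Dataset is stored as a serialized blob, so df4.printSchema will typically show a single value: binary column rather than the individual a, b and bar fields.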