How do I create a Row from a given case class?

Mar*_*ace 5 scala apache-spark apache-spark-sql

Suppose you have the following case classes:

case class B(key: String, value: Int)
case class A(name: String, data: B)

Given an instance of A, how do I create a Spark Row? For example:

val a = A("a", B("b", 0))
val row = ???

Note: given row, I need to be able to read the data back like this:

val name: String = row.getAs[String]("name")
val b: Row = row.getAs[Row]("data")

Jac*_*ski 7

The following seems to match what you are looking for.

scala> spark.version
res0: String = 2.3.0

scala> val a = A("a", B("b", 0))
a: A = A(a,B(b,0))

import org.apache.spark.sql.Encoders
val schema = Encoders.product[A].schema
scala> schema.printTreeString
root
 |-- name: string (nullable = true)
 |-- data: struct (nullable = true)
 |    |-- key: string (nullable = true)
 |    |-- value: integer (nullable = false)

val values = a.productIterator.toSeq.toArray

import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
val row: Row = new GenericRowWithSchema(values, schema)

scala> val name: String = row.getAs[String]("name")
name: String = a

// the following won't work: productIterator leaves the nested value as a B instance, and B is not a Row
scala> val b: Row = row.getAs[Row]("data")
java.lang.ClassCastException: B cannot be cast to org.apache.spark.sql.Row
  ... 55 elided
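The ClassCastException above comes from productIterator being shallow: the nested B is stored in the row as is. One possible workaround, sketched below, is to walk the case class and its schema together and convert every nested Product whose field type is a struct into its own Row. The object name CaseClassToRow and the helper toRow are made up for this illustration, and GenericRowWithSchema lives in Spark's catalyst package, which is not a stable public API, so treat this as a sketch rather than a recommended approach. It assumes directly nested case classes only; Option fields and collections of case classes are not converted.

```scala
import org.apache.spark.sql.{Encoders, Row}
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
import org.apache.spark.sql.types.{StructField, StructType}

case class B(key: String, value: Int)
case class A(name: String, data: B)

// Hypothetical helper object, not part of Spark.
object CaseClassToRow {

  // Pairs each field value with its schema field; a value that is itself a
  // Product and whose schema type is a struct is converted recursively.
  def toRow(p: Product, schema: StructType): Row = {
    val values = p.productIterator.zip(schema.fields.iterator).map {
      case (nested: Product, StructField(_, nestedSchema: StructType, _, _)) =>
        toRow(nested, nestedSchema)
      case (value, _) =>
        value // primitives, strings, etc. are stored unchanged
    }.toArray[Any]
    new GenericRowWithSchema(values, schema)
  }

  def main(args: Array[String]): Unit = {
    val a = A("a", B("b", 0))
    val row = toRow(a, Encoders.product[A].schema)

    val name: String = row.getAs[String]("name")
    val b: Row = row.getAs[Row]("data") // now a Row, not a B
    println(s"name=$name key=${b.getAs[String]("key")} value=${b.getAs[Int]("value")}")
  }
}
```

Because the nested row is also built with its own schema, its fields can be looked up by name in the same way as the top-level ones.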


Rap*_*oth 1

Very short, but probably not the fastest, since it first creates a DataFrame and then collects it again:

import session.implicits._
val row = Seq(a).toDF().first()
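For completeness, here is a minimal self-contained sketch of the DataFrame round trip. The local master setting, the app name, and the names RowViaDataFrame and toRow are assumptions made for this illustration; in a real job the session would normally already exist. The nested field is read by position here to keep the example conservative.

```scala
import org.apache.spark.sql.{Row, SparkSession}

case class B(key: String, value: Int)
case class A(name: String, data: B)

// Hypothetical wrapper object for the one-liner.
object RowViaDataFrame {

  // Encodes the instance into a one-row DataFrame and collects it back.
  def toRow(session: SparkSession, a: A): Row = {
    import session.implicits._
    Seq(a).toDF().first()
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session, for demonstration only.
    val session = SparkSession.builder()
      .master("local[1]")
      .appName("row-from-case-class")
      .getOrCreate()
    try {
      val row = toRow(session, A("a", B("b", 0)))
      val name: String = row.getAs[String]("name")
      val b: Row = row.getAs[Row]("data") // nested struct comes back as a Row
      println(s"name=$name key=${b.getString(0)} value=${b.getInt(1)}")
    } finally {
      session.stop()
    }
  }
}
```

This lets Spark's own encoder handle the nesting, at the cost of running a small job for every conversion.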