Related troubleshooting questions (0)

Why "Unable to find encoder for type stored in a Dataset" when creating a Dataset of a custom case class?

Spark 2.0 (final) with Scala 2.11.8. The following extremely simple code produces the compilation error Error:(17, 45) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.

import org.apache.spark.sql.SparkSession

case class SimpleTuple(id: Int, desc: String)

object DatasetTest {
  val dataList = List(
    SimpleTuple(5, "abc"),
    SimpleTuple(6, "bcd")
  )

  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder
      .master("local")
      .appName("example")
      .getOrCreate()

    val dataset = sparkSession.createDataset(dataList)
  }
}
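As the error message itself points out, Product types (case classes) are supported once spark.implicits._ is imported. A minimal sketch of the fix, assuming the snippet above is otherwise unchanged: import the implicits from the session instance before calling createDataset.

import org.apache.spark.sql.SparkSession

case class SimpleTuple(id: Int, desc: String)

object DatasetTest {
  val dataList = List(
    SimpleTuple(5, "abc"),
    SimpleTuple(6, "bcd")
  )

  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder
      .master("local")
      .appName("example")
      .getOrCreate()

    // Brings the derived implicit Encoder[SimpleTuple] into scope.
    import sparkSession.implicits._

    val dataset = sparkSession.createDataset(dataList)
    dataset.show()
  }
}

Note that SimpleTuple is declared at top level; encoder derivation also fails for case classes defined inside a method, which is the subject of the next question.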

scala apache-spark apache-spark-dataset apache-spark-encoders

53 votes · 2 answers · 50k views

Why the error "Unable to find encoder for type stored in a Dataset" when encoding JSON with case classes?

I wrote a Spark job:

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
    val sc = new SparkContext(conf)
    val ctx = new org.apache.spark.sql.SQLContext(sc)
    import ctx.implicits._

    case class Person(age: Long, city: String, id: String, lname: String, name: String, sex: String)
    case class Person2(name: String, age: Long, city: String)

    val persons = ctx.read.json("/tmp/persons.json").as[Person]
    persons.printSchema()
  }
}

Running the main function in the IDE produces 2 errors:

Error:(15, 67) Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported …
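The error points at the .as[Person] call: the case classes are declared inside main, and Scala cannot derive an encoder for a class that is local to a method (no TypeTag is available there), so import ctx.implicits._ has nothing to supply. A minimal sketch of the usual fix, assuming the same /tmp/persons.json input: move the case class definition to the top level.

import org.apache.spark.{SparkConf, SparkContext}

// Top-level definition, so the implicit Encoder[Person] can be derived.
case class Person(age: Long, city: String, id: String, lname: String, name: String, sex: String)

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
    val sc = new SparkContext(conf)
    val ctx = new org.apache.spark.sql.SQLContext(sc)
    import ctx.implicits._

    val persons = ctx.read.json("/tmp/persons.json").as[Person]
    persons.printSchema()
  }
}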

scala apache-spark apache-spark-dataset apache-spark-encoders

15 votes · 1 answer · 20k views