Spark 2 Dataset null value exception


Getting this null error in Spark Dataset.filter.

Input CSV:

name,age,stat
abc,22,m
xyz,,s

Working code:

case class Person(name: String, age: Long, stat: String)

val peopleDS = spark.read.option("inferSchema","true")
  .option("header", "true").option("delimiter", ",")
  .csv("./people.csv").as[Person]
peopleDS.show()
peopleDS.createOrReplaceTempView("people")
spark.sql("select * from people where age > 30").show()

Failing code (adding the following lines returns the error):

val filteredDS = peopleDS.filter(_.age > 30)
filteredDS.show()

Returns this null error:

java.lang.RuntimeException: Null value appeared in non-nullable field:
- field (class: "scala.Long", name: "age")
- root class: "com.gcp.model.Person"
If the schema is inferred from a Scala tuple/case class, or a Java bean, please try to use scala.Option[_] or other nullable types (e.g. java.lang.Integer instead of int/scala.Int).


The exception you get should explain pretty much everything, but let's go through it step by step:
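The message itself points at the fix: use scala.Option[_] (or another nullable type) for the field that can be null. A minimal sketch along those lines, reusing the question's file and names:

case class Person(name: String, age: Option[Long], stat: String)

// assumes an existing SparkSession named `spark`, as in the question
import spark.implicits._

val peopleDS = spark.read
  .option("inferSchema", "true")
  .option("header", "true")
  .option("delimiter", ",")
  .csv("./people.csv")
  .as[Person]

// A missing age deserializes to None instead of failing on a non-nullable Long,
// and rows with no age are simply dropped by the predicate.
val filteredDS = peopleDS.filter(_.age.exists(_ > 30))
filteredDS.show()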

Related: Spark 2.0 Dataset vs DataFrame
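Alternatively, a Column-based (DataFrame-style) filter avoids the problem, since the predicate is evaluated by Catalyst without deserializing each row into a Person; the null age is then handled by SQL comparison semantics, just like the SQL query in the working code. A sketch, assuming the original peopleDS from the question:

import org.apache.spark.sql.functions.col

// No lambda, so no object deserialization; NULL > 30 evaluates to NULL
// and that row is filtered out.
peopleDS.filter(col("age") > 30).show()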