Posted by xst*_*000

Spark 2 Dataset null value exception

I get the following null error when calling Dataset.filter in Spark.

Input CSV:

name,age,stat
abc,22,m
xyz,,s

Working code:

case class Person(name: String, age: Long, stat: String)

val peopleDS = spark.read.option("inferSchema","true")
  .option("header", "true").option("delimiter", ",")
  .csv("./people.csv").as[Person]
peopleDS.show()
peopleDS.createOrReplaceTempView("people")
spark.sql("select * from people where age > 30").show()

Failing code (adding the following lines produces the error):

val filteredDS = peopleDS.filter(_.age > 30)
filteredDS.show()

The null error returned:

java.lang.RuntimeException: Null value appeared in non-nullable field:
- field (class: "scala.Long", name: "age")
- root class: "com.gcp.model.Person"
If the schema is inferred from a Scala tuple/case class, or a Java bean, please try to use scala.Option[_] …
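The error message itself points at scala.Option[_], so one possible direction is to model age as Option[Long]. The SQL query probably succeeds because it compares the column directly (a null age simply fails the "age > 30" predicate), whereas the typed filter has to deserialize every row into a Person first, and a primitive Long cannot hold the empty age of the "xyz" row. The following is only a minimal sketch under those assumptions: the object name NullableAgeSketch, the helper olderThan, the app name, and the local master setting are hypothetical and not part of the original post.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Assumption: age becomes Option[Long] so an empty CSV cell maps to None.
// The case class stays at top level so Spark can derive an encoder for it.
case class Person(name: String, age: Option[Long], stat: String)

object NullableAgeSketch {

  // Hypothetical helper: keeps only people whose age is present and above minAge.
  // Option.exists returns false for None, so rows with a missing age are dropped.
  def olderThan(people: Dataset[Person], minAge: Long): Dataset[Person] =
    people.filter(_.age.exists(_ > minAge))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for illustration only.
    val spark = SparkSession.builder()
      .appName("nullable-age-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Same read as in the question; only the Person definition differs.
    val peopleDS = spark.read
      .option("inferSchema", "true")
      .option("header", "true")
      .option("delimiter", ",")
      .csv("./people.csv")
      .as[Person]

    val filteredDS = olderThan(peopleDS, 30L)
    filteredDS.show()

    spark.stop()
  }
}
```

If rows without an age can simply be discarded, another option that might fit is dropping them before the conversion, for example with na.drop(Seq("age")) on the DataFrame ahead of .as[Person], which would let age stay a plain Long.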

scala apache-spark apache-spark-sql apache-spark-dataset

Score: 9 | Answers: 1 | Views: 5929