Lea*_*oop 4 scala apache-spark apache-spark-sql
How do I convert a Dataset object back to a DataFrame? In my example, I read a JSON file into a DataFrame and convert it to a Dataset. In the Dataset I add an additional attribute (newColumn) and then convert it back to a DataFrame. Here is my example code:
val empData = sparkSession.read.option("header", "true").option("inferSchema", "true").option("multiline", "true").json(filePath)
.....
import sparkSession.implicits._
val res = empData.as[Emp]
//for (i <- res.take(4)) println(i.name + " ->" + i.newColumn)
val s = res.toDF()
s.printSchema()
}
case class Emp(name: String, gender: String, company: String, address: String) {
val newColumn = if (gender == "male") "Not-allowed" else "Allowed"
}
But I expected the new column newColumn to appear in the output of s.printSchema(). It does not. Why? How can I achieve this?
小智 5
The output schema of a Product Encoder is determined solely from its constructor signature, so anything that happens in the class body is simply discarded.

You can do:
empData.map(x => (x, x.newColumn)).toDF("value", "newColumn")
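A minimal, self-contained sketch of the idea, assuming a hypothetical local SparkSession (not part of the original code): either carry the derived value through `map` as above, or compute the column directly on the DataFrame with `withColumn`:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.when

// Hypothetical local session, for illustration only
val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
import spark.implicits._

// Note: newColumn lives in the body, so the Encoder ignores it
case class Emp(name: String, gender: String, company: String, address: String)

val res = Seq(Emp("a", "male", "x", "y"), Emp("b", "female", "x", "y")).toDS()

// Option 1: carry the derived value through map, as in the answer
val viaMap = res
  .map(x => (x, if (x.gender == "male") "Not-allowed" else "Allowed"))
  .toDF("value", "newColumn")

// Option 2: derive the column on the DataFrame itself
val viaColumn = res.toDF()
  .withColumn("newColumn",
    when($"gender" === "male", "Not-allowed").otherwise("Allowed"))

viaColumn.printSchema() // newColumn now appears in the schema
```

Option 2 keeps the schema flat (the derived column sits next to the original fields), whereas Option 1 nests the original row under a `value` struct column.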
Viewed: 4480 times