Spark SQL: convert a Dataset to a DataFrame


How do I convert a Dataset object to a DataFrame? In my example, I read a JSON file into a DataFrame and then convert it to a Dataset. In the Dataset I add an additional attribute (newColumn) and convert it back to a DataFrame. Here is my example code:

val empData = sparkSession.read.option("header", "true").option("inferSchema", "true").option("multiline", "true").json(filePath)

.....

import sparkSession.implicits._
val res = empData.as[Emp]

//for (i <- res.take(4)) println(i.name + " ->" + i.newColumn)

val s = res.toDF()

s.printSchema()

case class Emp(name: String, gender: String, company: String, address: String) {
  val newColumn = if (gender == "male") "Not-allowed" else "Allowed"
}

I expected the new column newColumn to appear in the s.printSchema() output, but it does not. Why not? How can I achieve this?


The output schema of a Product Encoder is determined solely by its constructor signature. Anything that happens in the class body is simply discarded.

You can do this instead:

res.map(x => (x, x.newColumn)).toDF("value", "newColumn")
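To make the constructor-signature rule concrete: since the encoder only looks at constructor parameters, you can also map into a case class that carries the extra field as a parameter. A minimal sketch (the name EmpWithFlag is hypothetical and not from the answer; assume it is defined at the same top level as Emp, since locally defined case classes can trip up encoder derivation):

    // Hypothetical case class: newColumn is now a constructor parameter,
    // so the Product encoder includes it in the derived schema.
    case class EmpWithFlag(name: String, gender: String, company: String,
                           address: String, newColumn: String)

    val flagged = res.map(e =>
      EmpWithFlag(e.name, e.gender, e.company, e.address, e.newColumn))
    flagged.toDF().printSchema() // schema now contains newColumn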

  • Thanks.. the final code part val r = res.map(s => (s.name, s.gender, s.company, s.address, s.newColumn)).toDF("name", "gender", "company", "address", "newColumn") works. Is there any shortcut for passing the parameters to toDF? If the class has more parameters, it is hard to supply all the values. Is there a shortcut? (3 upvotes)
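Regarding the shortcut asked for in the comment: one way to avoid listing every field by hand (a sketch, not part of the original answer) is to add the derived column on the DataFrame side with withColumn, which keeps all existing columns automatically:

    import org.apache.spark.sql.functions.{col, when}

    // withColumn preserves all existing columns and appends the derived one,
    // so no field names need to be enumerated.
    val s = res.toDF().withColumn(
      "newColumn",
      when(col("gender") === "male", "Not-allowed").otherwise("Allowed"))
    s.printSchema() // includes name, gender, company, address, newColumn

The trade-off is that the derivation logic then lives in a Spark SQL expression rather than in the case class body.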