Spark SQL: convert a Dataset to a DataFrame


How do I convert a Dataset object to a DataFrame? In my example, I read a JSON file into a DataFrame and then convert it to a Dataset. In the Dataset I add an additional attribute (newColumn) and convert it back to a DataFrame. Here is my example code:

val empData = sparkSession.read.option("header", "true").option("inferSchema", "true").option("multiline", "true").json(filePath)

.....

import sparkSession.implicits._
val res = empData.as[Emp]

//for (i <- res.take(4)) println(i.name + " ->" + i.newColumn)

val s = res.toDF()

s.printSchema()

case class Emp(name: String, gender: String, company: String, address: String) {
  val newColumn = if (gender == "male") "Not-allowed" else "Allowed"
}

I expected the new column newColumn to appear in the s.printSchema() output, but it does not. Why not? How can I achieve this?


The output schema of a Product Encoder is determined solely by its constructor signature. Anything that happens in the class body is simply discarded.

You can do this instead:

res.map(x => (x, x.newColumn)).toDF("value", "newColumn")
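To make the constructor-signature rule concrete: since the encoder only looks at constructor parameters, you can also map into a case class that carries the extra field as a parameter. A minimal sketch (the name EmpWithFlag is hypothetical and not from the answer; assume it is defined at the same top level as Emp, since locally defined case classes can trip up encoder derivation):

    // Hypothetical case class: newColumn is now a constructor parameter,
    // so the Product encoder includes it in the derived schema.
    case class EmpWithFlag(name: String, gender: String, company: String,
                           address: String, newColumn: String)

    val flagged = res.map(e =>
      EmpWithFlag(e.name, e.gender, e.company, e.address, e.newColumn))
    flagged.toDF().printSchema() // schema now contains newColumn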

  • Thanks.. the final code part val r = res.map(s => (s.name, s.gender, s.company, s.address, s.newColumn)).toDF("name", "gender", "company", "address", "newColumn") works. Is there any shortcut for passing the parameters to toDF? If the class has more parameters, it is hard to supply all the values. Is there a shortcut? (3 upvotes)
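Regarding the shortcut asked for in the comment: one way to avoid listing every field by hand (a sketch, not part of the original answer) is to add the derived column on the DataFrame side with withColumn, which keeps all existing columns automatically:

    import org.apache.spark.sql.functions.{col, when}

    // withColumn preserves all existing columns and appends the derived one,
    // so no field names need to be enumerated.
    val s = res.toDF().withColumn(
      "newColumn",
      when(col("gender") === "male", "Not-allowed").otherwise("Allowed"))
    s.printSchema() // includes name, gender, company, address, newColumn

The trade-off is that the derivation logic then lives in a Spark SQL expression rather than in the case class body.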