Tags: scala, apache-spark, apache-spark-sql, scala-spark

Converting a Spark Scala Dataset of one type to another

I have a Dataset with the following case class type:

case class AddressRawData(
  addressId: String,
  customerId: String,
  address: String
)

I want to convert it into:

case class AddressData(
  addressId: String,
  customerId: String,
  address: String,
  number: Option[Int], // i.e. it is optional
  road: Option[String],
  city: Option[String],
  country: Option[String]
)

using this parser function:

def addressParser(unparsedAddress: Seq[AddressData]): Seq[AddressData] =
  unparsedAddress.map { address =>
    // Expects the raw address string to have the form
    // "<number>, <road>, <city>, <country>".
    val split = address.address.split(", ")
    address.copy(
      number = Some(split(0).toInt),
      road = Some(split(1)),
      city = Some(split(2)),
      country = Some(split(3))
    )
  }

I am new to Scala and Spark. Could anyone tell me how to do this?
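
One way to wire this together (a minimal sketch only; the object name, app name, master URL, and sample row below are all made up for illustration) is to first map each AddressRawData to an AddressData whose four new fields are None, then run the existing addressParser inside mapPartitions so the parsing stays distributed rather than being collect()-ed to the driver:

import org.apache.spark.sql.{Dataset, SparkSession}

// Case classes as defined in the question.
case class AddressRawData(addressId: String, customerId: String, address: String)

case class AddressData(
  addressId: String,
  customerId: String,
  address: String,
  number: Option[Int],
  road: Option[String],
  city: Option[String],
  country: Option[String]
)

object AddressExample {

  def addressParser(unparsedAddress: Seq[AddressData]): Seq[AddressData] =
    unparsedAddress.map { address =>
      val split = address.address.split(", ")
      address.copy(
        number = Some(split(0).toInt),
        road = Some(split(1)),
        city = Some(split(2)),
        country = Some(split(3))
      )
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("addresses").getOrCreate()
    import spark.implicits._ // supplies the Encoders for the case classes

    // Made-up sample row, purely for illustration.
    val raw: Dataset[AddressRawData] = Seq(
      AddressRawData("a-1", "c-1", "10, Downing Street, London, UK")
    ).toDS()

    // Step 1: widen each AddressRawData into an AddressData whose new
    //         fields are still None.
    // Step 2: apply the existing Seq-based parser per partition, so the
    //         parsing runs on the executors instead of the driver.
    val parsed: Dataset[AddressData] = raw
      .map(r => AddressData(r.addressId, r.customerId, r.address, None, None, None, None))
      .mapPartitions(iter => addressParser(iter.toSeq).iterator)

    parsed.show(truncate = false)
    spark.stop()
  }
}

mapPartitions is used here only so the Seq-based signature of addressParser can be kept unchanged; the split/copy logic could just as well be inlined into the single .map over the raw Dataset.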

