Posts by juh*_*tio

Adding a nested column to a Spark DataFrame

How can I add or replace a field in a struct at any nesting level?

This input:

val rdd = sc.parallelize(Seq(
  """{"a": {"xX": 1,"XX": 2},"b": {"z": 0}}""",
  """{"a": {"xX": 3},"b": {"z": 0}}""",
  """{"a": {"XX": 3},"b": {"z": 0}}""",
  """{"a": {"xx": 4},"b": {"z": 0}}"""))
var df = sqlContext.read.json(rdd)

produces the following schema:

root
 |-- a: struct (nullable = true)
 |    |-- XX: long (nullable = true)
 |    |-- xX: long (nullable = true)
 |    |-- xx: long (nullable = true)
 |-- b: struct (nullable = true)
 |    |-- z: long (nullable = true)

I can then do this:

import org.apache.spark.sql.functions._
val overlappingNames = Seq(col("a.xx"), …
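For the general question of adding or replacing a field inside a struct, here is one possible sketch. It is not the approach from the post, which is cut off above. It assumes Spark 3.1 or later, where `Column.withField` is available, and a `SparkSession` named `spark`. Neither assumption comes from the original post, which uses the older `sqlContext` API. The field name `yy` and the value names `sample`, `withNew`, and `replaced` are hypothetical, and only the last input row is used to keep the example small.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a local SparkSession on Spark 3.1+ (not part of the original post).
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nested-field-sketch")
  .getOrCreate()
import spark.implicits._

// A single row taken from the question's input.
val sample = spark.read.json(
  Seq("""{"a": {"xx": 4},"b": {"z": 0}}""").toDS())

// Add a new field "yy" (hypothetical name) to struct "a".
val withNew = sample.withColumn("a", col("a").withField("yy", lit(1L)))

// Replace the existing field "a.xx" with a value derived from it.
val replaced = withNew.withColumn("a", col("a").withField("xx", col("a.xx") + 1))

replaced.printSchema()
```

On earlier Spark versions that lack `withField`, the struct would typically have to be rebuilt field by field with `struct(...)`, which is more verbose for deeply nested schemas.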

scala apache-spark apache-spark-sql spark-dataframe

Score: 9 | Answers: 1 | Views: 3683