How can I add or replace fields in a struct at any nesting level?
Given this input:
val rdd = sc.parallelize(Seq(
  """{"a": {"xX": 1,"XX": 2},"b": {"z": 0}}""",
  """{"a": {"xX": 3},"b": {"z": 0}}""",
  """{"a": {"XX": 3},"b": {"z": 0}}""",
  """{"a": {"xx": 4},"b": {"z": 0}}"""))
var df = sqlContext.read.json(rdd)
This yields the following schema:
root
 |-- a: struct (nullable = true)
 |    |-- XX: long (nullable = true)
 |    |-- xX: long (nullable = true)
 |    |-- xx: long (nullable = true)
 |-- b: struct (nullable = true)
 |    |-- z: long (nullable = true)
Then I can do this:
import org.apache.spark.sql.functions._
val overlappingNames = Seq(col("a.xx"), …
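
For reference, here is one possible direction, sketched under assumptions rather than taken from the code above. On Spark 3.1 or later, `Column.withField` can add or replace a field inside a struct column. The session setup, the single-row input (the last record from the input above), the names `sample` and `updated`, and the new field `b.y` are all illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: Spark 3.1+ is on the classpath (Column.withField was added in 3.1.0).
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("nested-field-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Only the last record of the question's input, to avoid the case-variant field names.
val sample = spark.read.json(Seq("""{"a": {"xx": 4},"b": {"z": 0}}""").toDS())

val updated = sample
  // Replace an existing nested field: a.xx becomes a.xx + 1.
  .withColumn("a", col("a").withField("xx", col("a.xx") + 1))
  // Add a new nested field: b.y (hypothetical name) next to the existing b.z.
  .withColumn("b", col("b").withField("y", lit(1)))

updated.printSchema()
updated.show(truncate = false)
```

`withField` also accepts a dotted name such as `"c.d"` to reach a field inside a more deeply nested struct, which may cover the "any nesting level" part. On older Spark versions the struct would presumably have to be rebuilt by hand with `struct(...)`.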