SPARK_VERSION = 2.2.0
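I ran into an interesting problem when trying to perform a filter on a DataFrame that has columns added using a UDF. I was able to reproduce the problem with a smaller dataset.
Given the dummy case classes: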
case class Info(number: Int, color: String)
case class Record(name: String, infos: Seq[Info])
And the following data:
val blue = Info(1, "blue")
val black = Info(2, "black")
val yellow = Info(3, "yellow")
val orange = Info(4, "orange")
val white = Info(5, "white")
val a = Record("a", Seq(blue, black, white))
val a2 = Record("a", Seq(yellow, white, orange))
val b = Record("b", Seq(blue, black))
val c = Record("c", Seq(white, orange))
val d = Record("d", Seq(orange, …