gay*_*hri 1 scala apache-spark apache-spark-1.6
I have a DataFrame with the following schema:
 |-- id: string (nullable = true)
 |-- ddd: struct (nullable = true)
 |    |-- aaa: string (nullable = true)
 |    |-- bbb: long (nullable = true)
 |    |-- ccc: string (nullable = true)
 |    |-- eee: long (nullable = true)
The data currently looks like this:
id | ddd
--------------------------
1 | [hi,1,this,2]
2 | [hello,6,good,3]
1 | [hru,2,where,7]
3 | [in,4,you,1]
2 | [how,4,to,3]
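To make the question reproducible, here is a minimal sketch that builds the same five rows with a struct column named ddd. It assumes Spark 2.x or later (SparkSession); the case classes Ddd and Record are hypothetical names chosen to mirror the schema. One caveat: Scala Long fields come out as nullable = false, so use Option[Long] if the nullability must match exactly.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical case classes mirroring the schema: a string id plus a struct ddd.
case class Ddd(aaa: String, bbb: Long, ccc: String, eee: Long)
case class Record(id: String, ddd: Ddd)

object SampleData {
  // Builds the five example rows; assumes Spark 2.x+ (SparkSession + implicits).
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      Record("1", Ddd("hi", 1L, "this", 2L)),
      Record("2", Ddd("hello", 6L, "good", 3L)),
      Record("1", Ddd("hru", 2L, "where", 7L)),
      Record("3", Ddd("in", 4L, "you", 1L)),
      Record("2", Ddd("how", 4L, "to", 3L))
    ).toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("sample").getOrCreate()
    val df = build(spark)
    df.printSchema() // id: string, ddd: struct<aaa, bbb, ccc, eee>
    df.show(false)
    spark.stop()
  }
}
```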
I want the output to have one row per id, with all of that id's ddd structs collected together:
id | ddd
--------------------
1 | [hi,1,this,2],[hru,2,where,7]
2 | [hello,6,good,3],[how,4,to,3]
3 | [in,4,you,1]
Please help.
You can use collect_list as follows:
import org.apache.spark.sql.functions._
df.groupBy("id").agg(collect_list("ddd").as("ddd"))
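A self-contained sketch of the collect_list approach, assuming Spark 2.x or later, where collect_list is a native aggregate (as far as I know, on Spark 1.6 it delegated to a Hive UDAF and needed a HiveContext). The Ddd case class and the groupDdd helper are hypothetical names for illustration. The result column ddd becomes array<struct<...>>, which is what the expected output shows.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

// Hypothetical struct type mirroring ddd (aaa, bbb, ccc, eee).
case class Ddd(aaa: String, bbb: Long, ccc: String, eee: Long)

object GroupStructs {
  // Groups rows by id and gathers every ddd struct for that id into an array<struct>.
  def groupDdd(df: DataFrame): DataFrame =
    df.groupBy("id").agg(collect_list("ddd").as("ddd"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("collect-list").getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("1", Ddd("hi", 1L, "this", 2L)),
      ("2", Ddd("hello", 6L, "good", 3L)),
      ("1", Ddd("hru", 2L, "where", 7L)),
      ("3", Ddd("in", 4L, "you", 1L)),
      ("2", Ddd("how", 4L, "to", 3L))
    ).toDF("id", "ddd")

    // orderBy only sorts the output rows; the order of structs inside each list
    // depends on how rows arrive after the shuffle and is not guaranteed.
    groupDdd(df).orderBy("id").show(false)
    spark.stop()
  }
}
```

If the order inside each list matters, sort the collected array afterwards (for example with sort_array) rather than relying on input order.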
collect_set also works, but it drops duplicate structs within each group:
df.groupBy("id").agg(collect_set("ddd").as("ddd"))
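To see the difference between the two, here is a small sketch (again assuming Spark 2.x+; Ddd and listAndSetSizes are hypothetical names) where id "1" contains the same struct twice. The duplicate row is invented purely for illustration; it is not part of the question's data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

// Hypothetical struct type mirroring ddd (aaa, bbb, ccc, eee).
case class Ddd(aaa: String, bbb: Long, ccc: String, eee: Long)

object ListVsSet {
  // For each id, counts how many structs collect_list keeps versus collect_set.
  def listAndSetSizes(df: DataFrame): DataFrame =
    df.groupBy("id").agg(
      size(collect_list("ddd")).as("list_size"), // keeps duplicates
      size(collect_set("ddd")).as("set_size")    // removes duplicate structs
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("list-vs-set").getOrCreate()
    import spark.implicits._

    // Hypothetical data: the first two rows carry identical structs.
    val df = Seq(
      ("1", Ddd("hi", 1L, "this", 2L)),
      ("1", Ddd("hi", 1L, "this", 2L)),
      ("1", Ddd("hru", 2L, "where", 7L))
    ).toDF("id", "ddd")

    listAndSetSizes(df).show() // list_size = 3, set_size = 2 for id "1"
    spark.stop()
  }
}
```

Use collect_list when every row must survive; use collect_set only when repeated structs for the same id are genuinely redundant.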