Apache Spark DataFrame groupBy agg() on multiple columns

Aka*_*thi 12 scala apache-spark spark-dataframe

I have a DataFrame with 3 columns: Id, First Name, and Last Name.

I want to apply groupBy on Id and collect the First Name and Last Name columns as lists.

Example: I have a DF like this:

+---+-------+--------+
|id |fName  |lName   |
+---+-------+--------+
|1  |Akash  |Sethi   |
|2  |Kunal  |Kapoor  |
|3  |Rishabh|Verma   |
|2  |Sonu   |Mehrotra|
+---+-------+--------+

I want my output to look like this:

+---+-------------+------------------+
|id |fName        |lName             |
+---+-------------+------------------+
|1  |[Akash]      |[Sethi]           |
|2  |[Kunal, Sonu]|[Kapoor, Mehrotra]|
|3  |[Rishabh]    |[Verma]           |
+---+-------------+------------------+

Thanks in advance.

him*_*ian 12

You can aggregate multiple columns in a single agg() call, like this:

import org.apache.spark.sql.functions.collect_list
df.groupBy("id").agg(collect_list("fName").as("fName"), collect_list("lName").as("lName"))

This will give you the expected result.
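For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample DataFrame from the question and applies the same aggregation. The object name `GroupByCollectExample`, the helper `collectAsLists`, and the local-mode session settings are my own assumptions for illustration and are not part of the original answer. The helper builds one `collect_list` per column, so the same pattern should extend to more than two columns.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list}

object GroupByCollectExample {

  // Hypothetical helper: groups by `key` and collects each column in `cols`
  // into a list, aliased back to the original column name.
  def collectAsLists(df: DataFrame, key: String, cols: Seq[String]): DataFrame = {
    require(cols.nonEmpty, "need at least one column to collect")
    val listCols = cols.map(c => collect_list(col(c)).as(c))
    // agg takes a first Column plus varargs, hence head / tail.
    df.groupBy(key).agg(listCols.head, listCols.tail: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("groupby-collect-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Same rows as the sample DataFrame in the question.
    val df = Seq(
      (1, "Akash", "Sethi"),
      (2, "Kunal", "Kapoor"),
      (3, "Rishabh", "Verma"),
      (2, "Sonu", "Mehrotra")
    ).toDF("id", "fName", "lName")

    // Print the grouped result without truncating the list columns.
    collectAsLists(df, "id", Seq("fName", "lName")).orderBy("id").show(false)

    spark.stop()
  }
}
```

One caveat: the Spark documentation describes `collect_list` as non-deterministic, because the order of the collected elements depends on row order, which may change after a shuffle. So for id 2 the list could come back as [Sonu, Kunal] rather than [Kunal, Sonu]. If the order matters, sorting the arrays afterwards (for example with `sort_array`) is one option.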