Tags: pivot, scala, apache-spark, apache-spark-sql
I am new to Spark SQL. I have data like this in a Spark DataFrame:
Company Type Status
A X done
A Y done
A Z done
C X done
C Y done
B Y done
I want to display it like this:
Company X-type Y-type Z-type
A done done done
B pending done pending
C done done pending
I was not able to achieve this with Spark SQL. Please help.
You can groupBy on Company and then use the pivot function on the Type column.
Here is a simple example:
import org.apache.spark.sql.functions._
import spark.implicits._  // needed for .toDF outside the spark-shell

val df = spark.sparkContext.parallelize(Seq(
  ("A", "X", "done"),
  ("A", "Y", "done"),
  ("A", "Z", "done"),
  ("C", "X", "done"),
  ("C", "Y", "done"),
  ("B", "Y", "done")
)).toDF("Company", "Type", "Status")

// Group by Company, pivot on Type, and fill missing cells with "pending".
// Passing the pivot values explicitly, e.g. pivot("Type", Seq("X", "Y", "Z")),
// avoids an extra pass over the data to collect the distinct types.
val result = df.groupBy("Company")
  .pivot("Type")
  .agg(expr("coalesce(first(Status), 'pending')"))

result.show()
Output:
+-------+-------+----+-------+
|Company| X| Y| Z|
+-------+-------+----+-------+
| B|pending|done|pending|
| C| done|done|pending|
| A| done|done| done|
+-------+-------+----+-------+
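If you would rather stay in plain SQL, Spark 2.4+ also supports a PIVOT clause. A minimal sketch, assuming the df from the example above; the view name company_status is arbitrary:

df.createOrReplaceTempView("company_status")

// Company is not referenced in the PIVOT clause, so it becomes the
// implicit grouping column; missing cells come back as NULL and are
// filled with 'pending' in the outer SELECT.
val sqlResult = spark.sql("""
  SELECT Company,
         coalesce(X, 'pending') AS X,
         coalesce(Y, 'pending') AS Y,
         coalesce(Z, 'pending') AS Z
  FROM company_status
  PIVOT (
    first(Status) FOR Type IN ('X', 'Y', 'Z')
  )
""")
sqlResult.show()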
You can rename the pivoted columns afterwards.
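A minimal sketch, assuming the result DataFrame from above; the "-type" suffix matches the layout asked for in the question:

// rename each pivoted column to the header the question expects
val renamed = result
  .withColumnRenamed("X", "X-type")
  .withColumnRenamed("Y", "Y-type")
  .withColumnRenamed("Z", "Z-type")

renamed.show()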
Hope this helps!