ta.*_*ta. 7 scala apache-spark apache-spark-sql
I have an RDD of Maps that I want to convert to a DataFrame. This is the input format of the RDD:
val mapRDD: RDD[Map[String, String]] = sc.parallelize(Seq(
Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201"),
Map("empid" -> "14", "empName" -> "Richard", "depId" -> "401"),
Map("empid" -> "15", "empName" -> "Michale", "depId" -> "501"),
Map("empid" -> "16", "empName" -> "John", "depId" -> "701")))
Is there a way to convert it to a DataFrame, something like
val df = mapRDD.toDF
df.show
empid, empName, depId
12 Rohan 201
13 Ross 201
14 Richard 401
15 Michale 501
16 John 701
Shi*_*nsh 13
You can easily convert it to a Spark DataFrame.
Here is code that does the trick:
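// Needed for .toDF outside spark-shell; assumes a SparkSession named `spark`
import spark.implicits._

val mapRDD = sc.parallelize(Seq(
  Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
  Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201"),
  Map("empid" -> "14", "empName" -> "Richard", "depId" -> "401"),
  Map("empid" -> "15", "empName" -> "Michale", "depId" -> "501"),
  Map("empid" -> "16", "empName" -> "John", "depId" -> "701")))

// Column names come from the first record; assumes every Map has the same three keys
val columns = mapRDD.take(1).flatMap(a => a.keys)
val resultantDF = mapRDD.map { value =>
  // Look values up by key so they line up with `columns`,
  // rather than relying on the Map's iteration order
  (value(columns(0)), value(columns(1)), value(columns(2)))
}.toDF(columns: _*)
resultantDF.show()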
The output is:
+-----+-------+-----+
|empid|empName|depId|
+-----+-------+-----+
| 12| Rohan| 201|
| 13| Ross| 201|
| 14|Richard| 401|
| 15|Michale| 501|
| 16| John| 701|
+-----+-------+-----+
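
The tuple-based code above is tied to exactly three columns. As a hedged alternative, here is a minimal sketch that builds each `Row` from an explicit column list and passes a `StructType` schema to `createDataFrame`, so it works for any number of keys. The explicit `columns` list, the all-string schema, the local-mode `SparkSession` and the choice to turn a missing key into `null` are assumptions made for illustration, not part of the original answer.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Local session for a standalone run; in spark-shell, `spark` and `sc` already exist
// and these two lines can be skipped.
val spark = SparkSession.builder().master("local[*]").appName("MapRddToDf").getOrCreate()
val sc = spark.sparkContext

val mapRDD = sc.parallelize(Seq(
  Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
  Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201")))

// Assumption: the column order is stated explicitly rather than read from the data.
val columns = Seq("empid", "empName", "depId")

// Every column is declared as a nullable string.
val schema = StructType(columns.map(name => StructField(name, StringType, nullable = true)))

// Build each Row by looking values up in column order; a missing key becomes null.
val rowRDD = mapRDD.map(m => Row.fromSeq(columns.map(c => m.getOrElse(c, null))))

val df = spark.createDataFrame(rowRDD, schema)
df.show()
```

Because the values are looked up by key, the result does not depend on the iteration order of each `Map`, and adding a column only means extending `columns`.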