sam*_*mol 7 apache-spark apache-spark-sql
上下文
sqlContext.sql(s"""
SELECT
school_name,
name,
age
FROM my_table
""")
Run Code Online (Sandbox Code Playgroud)
问
鉴于上表,我想按学校名称分组并收集姓名,年龄为a Map[String, Int]
例如 - 伪代码
val df = sqlContext.sql(s"""
SELECT
school_name,
age
FROM my_table
GROUP BY school_name
""")
------------------------
school_name | name | age
------------------------
school A | "michael"| 7
school A | "emily" | 5
school B | "cathy" | 10
school B | "shaun" | 5
df.groupBy("school_name").agg(make_map)
------------------------------------
school_name | map
------------------------------------
school A | {"michael": 7, "emily": 5}
school B | {"cathy": 10, "shaun": 5}
Run Code Online (Sandbox Code Playgroud)
aba*_*hel 18
以下将与Spark 2.0一起使用.您可以使用自2.0版本以来可用的映射函数将列作为Map.
val df1 = df.groupBy(col("school_name")).agg(collect_list(map($"name",$"age")) as "map")
df1.show(false)
Run Code Online (Sandbox Code Playgroud)
这将给你低于输出.
+-----------+------------------------------------+
|school_name|map |
+-----------+------------------------------------+
|school B |[Map(cathy -> 10), Map(shaun -> 5)] |
|school A |[Map(michael -> 7), Map(emily -> 5)]|
+-----------+------------------------------------+
Run Code Online (Sandbox Code Playgroud)
现在,您可以使用UDF将单个地图加入单个地图,如下所示.
import org.apache.spark.sql.functions.udf
val joinMap = udf { values: Seq[Map[String,Int]] => values.flatten.toMap }
val df2 = df1.withColumn("map", joinMap(col("map")))
df2.show(false)
Run Code Online (Sandbox Code Playgroud)
这将提供所需的输出Map[String,Int].
+-----------+-----------------------------+
|school_name|map |
+-----------+-----------------------------+
|school B |Map(cathy -> 10, shaun -> 5) |
|school A |Map(michael -> 7, emily -> 5)|
+-----------+-----------------------------+
Run Code Online (Sandbox Code Playgroud)
如果要将列值转换为JSON String,则Spark 2.1.0引入了to_json函数.
val df3 = df2.withColumn("map",to_json(struct($"map")))
df3.show(false)
Run Code Online (Sandbox Code Playgroud)
该to_json函数将返回以下输出.
+-----------+-------------------------------+
|school_name|map |
+-----------+-------------------------------+
|school B |{"map":{"cathy":10,"shaun":5}} |
|school A |{"map":{"michael":7,"emily":5}}|
+-----------+-------------------------------+
Run Code Online (Sandbox Code Playgroud)
小智 9
从 spark 2.4 开始,您可以使用map_from_arrays函数来实现这一点。
val df = spark.sql(s"""
SELECT *
FROM VALUES ('s1','a',1),('s1','b',2),('s2','a',1)
AS (school, name, age)
""")
val df2 = df.groupBy("school").agg(map_from_arrays(collect_list($"name"), collect_list($"age")).as("map"))
+------+----+---+
|school|name|age|
+------+----+---+
| s1| a| 1|
| s1| b| 2|
| s2| a| 1|
+------+----+---+
+------+----------------+
|school| map|
+------+----------------+
| s2| [a -> 1]|
| s1|[a -> 1, b -> 2]|
+------+----------------+
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
8535 次 |
| 最近记录: |