Scala Spark: empty map on a DataFrame column of type Map(String, Int)

Question

Lou*_*_Ds 2 dictionary scala dataframe apache-spark

I am joining two DataFrames that each have a column of type Map[String, Int].

I want the merged DataFrame to have an empty map ([]) rather than null in the Map-typed columns.

val df = dfmerged
  .select(col("id"),
          coalesce(col("map_1"), lit(null).cast(MapType(StringType, IntegerType))).alias("map_1"),
          // lit cannot build a literal from a Scala Map; this is the call that fails
          coalesce(col("map_2"), lit(Map.empty[String, Int])).alias("map_2"))

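For the map_1 column a null is inserted, but I would like an empty map instead. map_2 gives me an error:

java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.Map$EmptyMap$ Map()

I also tried using a udf, like this: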

case class myStructMap(x:Map[String, Int])
val emptyMap = udf(() => myStructMap(Map.empty[String, Int]))

That did not work either.

When I try something like:

.select( coalesce(col("myMapCol"), lit(map())).alias("brand_viewed_count")...

or

.select(coalesce(col("myMapCol"), lit(map().cast(MapType(LongType, LongType)))).alias("brand_viewed_count")...

I get the error:

cannot resolve 'map()' due to data type mismatch: cannot cast MapType(NullType,NullType,false) to MapType(LongType,IntType,true);

Answer 1

hi-*_*zir 6

In Spark 2.2, where typedLit was introduced:

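import org.apache.spark.sql.functions.{coalesce, typedLit}
import spark.implicits._  // for toDF and the $"..." syntax (already in scope in spark-shell)

// Values are Int so the column type matches the typedLit fallback and the output below
val df = Seq((1L, null), (2L, Map("foobar" -> 42))).toDF("id", "map")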

df.withColumn("map", coalesce($"map", typedLit(Map[String, Int]()))).show
// +---+-----------------+
// | id|              map|
// +---+-----------------+
// |  1|            Map()|
// |  2|Map(foobar -> 42)|
// +---+-----------------+

Before Spark 2.2:

import org.apache.spark.sql.functions.map
df.withColumn("map", coalesce($"map", map().cast("map<string,int>"))).show
// +---+-----------------+
// | id|              map|
// +---+-----------------+
// |  1|            Map()|
// |  2|Map(foobar -> 42)|
// +---+-----------------+
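
Applied to the join scenario from the question, the typedLit approach might look like the sketch below. The `left` and `right` DataFrames and their contents are hypothetical stand-ins for the asker's inputs, and the local SparkSession is only there to make the snippet self-contained; `dfmerged`, `map_1` and `map_2` follow the names used in the question. It assumes Spark 2.2 or later.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, typedLit}

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("empty-map-example")
  .getOrCreate()
import spark.implicits._

// Hypothetical inputs: id 2 has no match on the right side,
// so map_2 is null for that row after the left join.
val left  = Seq((1L, Map("a" -> 1)), (2L, Map("b" -> 2))).toDF("id", "map_1")
val right = Seq((1L, Map("c" -> 3))).toDF("id", "map_2")

val dfmerged = left.join(right, Seq("id"), "left")

// typedLit can build a literal column from a Scala Map, which lit cannot.
val emptyMap = typedLit(Map.empty[String, Int])

// Replace null maps with the empty map in both Map-typed columns.
val df = dfmerged.select(
  col("id"),
  coalesce(col("map_1"), emptyMap).alias("map_1"),
  coalesce(col("map_2"), emptyMap).alias("map_2")
)

df.printSchema()
df.show(false)
```

Building the empty-map column once and reusing it keeps the fallback type in one place, so both coalesce calls compare a map<string,int> column against a map<string,int> literal.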
