
Including null values in an Apache Spark join

I want to include null values in an Apache Spark join. By default, Spark does not include rows with nulls in a join.

Here is the default Spark behavior:

val numbersDf = Seq(
  ("123"),
  ("456"),
  (null),
  ("")
).toDF("numbers")

val lettersDf = Seq(
  ("123", "abc"),
  ("456", "def"),
  (null, "zzz"),
  ("", "hhh")
).toDF("numbers", "letters")

val joinedDf = numbersDf.join(lettersDf, Seq("numbers"))

Here is the output of joinedDf.show():

+-------+-------+
|numbers|letters|
+-------+-------+
|    123|    abc|
|    456|    def|
|       |    hhh|
+-------+-------+

Here is the output I would like:

+-------+-------+
|numbers|letters|
+-------+-------+
|    123|    abc|
|    456|    def|
|       |    hhh|
|   null|    zzz|
+-------+-------+
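For context (this is not part of the question itself, just a sketch): Spark's null-safe equality operator `<=>` (`Column.eqNullSafe`) treats `null <=> null` as true, so joining on it instead of the default equality keeps the null-keyed row. This assumes an active `SparkSession` with `spark.implicits._` imported, and the `numbersDf`/`lettersDf` defined above.

```scala
// Sketch: a null-safe join using Spark's <=> operator (Column.eqNullSafe).
// Unlike the default join on Seq("numbers"), null keys on both sides match.
val nullSafeJoinedDf = numbersDf.join(
  lettersDf,
  numbersDf("numbers") <=> lettersDf("numbers")
).select(numbersDf("numbers"), lettersDf("letters"))
```

Because the join condition is an expression rather than a column-name sequence, both input columns survive the join; the `select` keeps one copy of each to match the desired output shape.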

sql scala join apache-spark apache-spark-sql

