How to replace NULL with 0 in a left outer join in Spark DataFrames v1.6

Pra*_*san 5 scala apache-spark spark-dataframe apache-spark-1.6

I'm working with Spark v1.6. I have the following two DataFrames, and I want to convert the nulls in my left outer join result set to 0. Any suggestions?

DataFrames

val x: Array[Int] = Array(1, 2, 3)
val df_sample_x = sc.parallelize(x).toDF("x")

val y: Array[Int] = Array(3, 4, 5)
val df_sample_y = sc.parallelize(y).toDF("y")

Left outer join

val df_sample_join = df_sample_x.join(df_sample_y, df_sample_x("x") === df_sample_y("y"), "left_outer")

Result set

scala> df_sample_join.show

x | y
1 | null
2 | null
3 | 3

But I want the result set to be displayed as:

scala> df_sample_join.show

x | y
1 | 0
2 | 0
3 | 3

小智 9

Just use na.fill:

df.na.fill(0, Seq("y"))
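Applied to the join result from the question, this produces the desired output (a minimal sketch, assuming the spark-shell session above; row order may vary):

df_sample_join.na.fill(0, Seq("y")).show()

// +---+---+
// |  x|  y|
// +---+---+
// |  1|  0|
// |  2|  0|
// |  3|  3|
// +---+---+

Passing Seq("y") restricts the fill to that column, so other numeric columns are left untouched.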


T. *_*ęda 5

Try:

import org.apache.spark.sql.functions.{coalesce, lit}

val withReplacedNull = df_sample_join.withColumn("y", coalesce('y, lit(0)))
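coalesce returns the first non-null value among its arguments, so coalesce('y, lit(0)) keeps the existing values of y and substitutes 0 only where y is null. The 'y symbol syntax relies on the implicit conversion that is available by default in spark-shell.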

Tested with:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{coalesce, lit}
import org.apache.spark.sql.types._

// Build a DataFrame whose "y" column contains nulls
val list = List(Row("a", null), Row("b", null), Row("c", 1))
val rdd = sc.parallelize(list)

// "y" must be declared nullable, since it holds nulls
val schema = StructType(
    StructField("text", StringType, false) ::
    StructField("y", IntegerType, true) :: Nil)

val df = sqlContext.createDataFrame(rdd, schema)
// Replace the nulls in "y" with 0
val df1 = df.withColumn("y", coalesce('y, lit(0)))
df1.show()
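For reference, df1.show() should print (row order aside):

+----+---+
|text|  y|
+----+---+
|   a|  0|
|   b|  0|
|   c|  1|
+----+---+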