Tags: scala, apache-spark, spark-dataframe, apache-spark-1.6
I am working with Spark v1.6. I have the following two DataFrames, and I would like to convert the nulls in my left outer join result set to 0. Any suggestions?
val x: Array[Int] = Array(1, 2, 3)
val df_sample_x = sc.parallelize(x).toDF("x")

val y: Array[Int] = Array(3, 4, 5)
val df_sample_y = sc.parallelize(y).toDF("y")

val df_sample_join = df_sample_x.join(df_sample_y, df_sample_x("x") === df_sample_y("y"), "left_outer")
scala> df_sample_join.show
+---+----+
|  x|   y|
+---+----+
|  1|null|
|  2|null|
|  3|   3|
+---+----+
What I want instead is 0 in place of null:

+---+---+
|  x|  y|
+---+---+
|  1|  0|
|  2|  0|
|  3|  3|
+---+---+
Try:
import org.apache.spark.sql.functions.{coalesce, lit}

// coalesce returns the first non-null argument, so nulls in "y" become 0
val withReplacedNull = df_sample_join.withColumn("y", coalesce('y, lit(0)))
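If you prefer the DataFrame null-handling API, na.fill does the same thing. A minimal sketch, assuming the same df_sample_join as above (the val name withReplacedNull2 is just for illustration):

// replace nulls in the numeric column "y" with 0
val withReplacedNull2 = df_sample_join.na.fill(0, Seq("y"))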
Tested with:
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{coalesce, lit}
import org.apache.spark.sql.types._

val list = List(Row("a", null), Row("b", null), Row("c", 1))
val rdd = sc.parallelize(list)
// "y" must be nullable, since the sample rows contain nulls
val schema = StructType(
  StructField("text", StringType, false) ::
  StructField("y", IntegerType, true) :: Nil)
val df = sqlContext.createDataFrame(rdd, schema)
val df1 = df.withColumn("y", coalesce('y, lit(0)))
df1.show()
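For reference, given the sample rows above, df1.show() should then print something like:

+----+---+
|text|  y|
+----+---+
|   a|  0|
|   b|  0|
|   c|  1|
+----+---+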