Subtracting two columns that contain nulls in a Spark DataFrame

war*_*ner 2 scala apache-spark apache-spark-sql

I am new to Spark, and I have a DataFrame df:

+----------+------------+-----------+
| Column1  | Column2    | Sub       |                          
+----------+------------+-----------+
| 1        | 2          | 1         |                                         
+----------+------------+-----------+
| 4        | null       | null      |                          
+----------+------------+-----------+
| 5        | null       | null      |                          
+----------+------------+-----------+
| 6        | 8          | 2         |                          
+----------+------------+-----------+

When I subtract the two columns, the result is null whenever one of the columns is null.

df.withColumn("Sub", col(A)-col(B))

The expected output is:

+----------+------------+-----------+
|  Column1 | Column2    | Sub       |                          
+----------+------------+-----------+
| 1        | 2          | 1         |                                           
+----------+------------+-----------+
| 4        | null       | 4         |                          
+----------+------------+-----------+
| 5        | null       | 5         |                          
+----------+------------+-----------+
| 6        | 8          | 2         |                          
+----------+------------+-----------+

I don't want to replace the nulls in Column2 with 0; Column2 itself should stay null. Can someone help me?

Ram*_*jan 7

You can use the when function, like this:

import org.apache.spark.sql.functions._
df.withColumn("Sub", when(col("Column1").isNull, lit(0)).otherwise(col("Column1")) - when(col("Column2").isNull, lit(0)).otherwise(col("Column2")))

You should get the following result:

+-------+-------+----+
|Column1|Column2| Sub|
+-------+-------+----+
|      1|      2|-1.0|
|      4|   null| 4.0|
|      5|   null| 5.0|
|      6|      8|-2.0|
+-------+-------+----+
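The output above keeps the sign of the difference, while the question's expected table shows non-negative values. Here is a minimal, self-contained sketch of the same when/otherwise idea with abs added on top. It assumes a local SparkSession and nullable integer columns built from Option[Int]; the object name WhenSubtractSketch and the helper withSub are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{abs, col, lit, when}

object WhenSubtractSketch {
  // Hypothetical helper: adds "Sub" while leaving Column1 and Column2 untouched.
  def withSub(df: DataFrame): DataFrame = {
    // Null is treated as 0 only inside the expression.
    val c1 = when(col("Column1").isNull, lit(0)).otherwise(col("Column1"))
    val c2 = when(col("Column2").isNull, lit(0)).otherwise(col("Column2"))
    // abs(...) is an addition here, to match the non-negative values in the question.
    df.withColumn("Sub", abs(c1 - c2))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for experimentation only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("when-subtract-sketch")
      .getOrCreate()
    import spark.implicits._

    // Option[Int] yields nullable integer columns; None becomes null.
    val df = Seq[(Option[Int], Option[Int])](
      (Some(1), Some(2)),
      (Some(4), None),
      (Some(5), None),
      (Some(6), Some(8))
    ).toDF("Column1", "Column2")

    withSub(df).show()
    spark.stop()
  }
}
```

With this input, Sub should come out as 1, 4, 5 and 2, and Column2 still shows null in the second and third rows, because the replacement by 0 happens only inside the Sub expression.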


Psi*_*dom 5

You can coalesce both columns to zero and then do the subtraction:

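// Option[Int] with None (rather than a bare null) gives nullable integer columns.
val df = Seq[(Option[Int], Option[Int])](
  (Some(1), Some(2)),
  (Some(4), None),
  (Some(5), None),
  (Some(6), Some(8))
).toDF("A", "B")

// Null counts as 0 inside the expression only; abs keeps the difference non-negative.
df.withColumn("Sub", abs(coalesce($"A", lit(0)) - coalesce($"B", lit(0)))).show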
+---+----+---+
|  A|   B|Sub|
+---+----+---+
|  1|   2|  1|
|  4|null|  4|
|  5|null|  5|
|  6|   8|  2|
+---+----+---+
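One case the example data does not cover is a row where both columns are null: with coalesce on both sides, Sub becomes 0. If null would be preferable there, the sketch below is one possible variant. It assumes a local SparkSession; the names CoalesceSubtractSketch and absDiffNullAsZero are hypothetical and not part of Spark.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{abs, coalesce, lit, when}

object CoalesceSubtractSketch {
  // Hypothetical helper: absolute difference where a single null side counts as 0.
  // when(...) without otherwise(...) yields null for rows that do not match,
  // so the result stays null when both inputs are null.
  def absDiffNullAsZero(a: Column, b: Column): Column =
    when(a.isNotNull || b.isNotNull,
      abs(coalesce(a, lit(0)) - coalesce(b, lit(0))))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for experimentation only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("coalesce-subtract-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq[(Option[Int], Option[Int])](
      (Some(1), Some(2)),
      (Some(4), None),
      (None, None),      // both sides null: Sub stays null
      (Some(6), Some(8))
    ).toDF("A", "B")

    df.withColumn("Sub", absDiffNullAsZero($"A", $"B")).show()
    spark.stop()
  }
}
```

Because coalesce is applied only inside the derived expression, columns A and B keep their nulls, which is what the question asks for.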