Joh*_*ohn 18 sql scala dataframe apache-spark apache-spark-sql
I'm confused about the difference when we use
df.filter(col("c1") === null) and df.filter(col("c1").isNull)
On the same DataFrame I get counts with === null, but zero counts with isNull. Please help me understand the difference. Thanks.
use*_*411 34
First and foremost, don't use null in your Scala code unless you really have to for compatibility reasons.
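Regarding your question, this is plain SQL. col("c1") === null is interpreted as c1 = NULL and, because NULL marks an undefined value, the result is undefined (NULL) for any value, including NULL itself. Since filter keeps only the rows for which the condition evaluates to true, such a filter can never match a row.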
spark.sql("SELECT NULL = NULL").show
+-------------+
|(NULL = NULL)|
+-------------+
|         null|
+-------------+
spark.sql("SELECT NULL != NULL").show
+-------------------+
|(NOT (NULL = NULL))|
+-------------------+
|               null|
+-------------------+
spark.sql("SELECT TRUE != NULL").show
+------------------------------------+
|(NOT (true = CAST(NULL AS BOOLEAN)))|
+------------------------------------+
|                                null|
+------------------------------------+
spark.sql("SELECT TRUE = NULL").show
+------------------------------+
|(true = CAST(NULL AS BOOLEAN))|
+------------------------------+
|                          null|
+------------------------------+
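The results above follow SQL's three-valued logic. As a rough illustration (a plain-Scala model, not Spark's actual implementation), the sketch below assumes a nullable column value is an Option[A] and a SQL truth value (TRUE / FALSE / NULL) is an Option[Boolean], with None standing in for NULL. The names ThreeValuedLogic, sqlEq, sqlNot and where are hypothetical. The where helper mimics how filter keeps a row only when the predicate is definitely true, which is why comparing against NULL with = selects nothing.

```scala
// Hypothetical plain-Scala model of SQL three-valued logic (not Spark code).
// None stands in for SQL NULL, both for values and for truth values.
object ThreeValuedLogic {
  // SQL `=`: if either operand is NULL, the result is NULL.
  def sqlEq[A](a: Option[A], b: Option[A]): Option[Boolean] =
    for { x <- a; y <- b } yield x == y

  // SQL `NOT`: NOT NULL is still NULL.
  def sqlNot(p: Option[Boolean]): Option[Boolean] = p.map(!_)

  // WHERE / filter keeps a row only if the predicate is TRUE; NULL is dropped.
  def where[A](rows: Seq[Option[A]])(pred: Option[A] => Option[Boolean]): Seq[Option[A]] =
    rows.filter(r => pred(r).contains(true))

  def main(args: Array[String]): Unit = {
    val c1: Seq[Option[String]] = Seq(Some("a"), None, Some("b"), None)

    println(sqlEq[Int](None, None))              // None  (NULL = NULL)
    println(sqlNot(sqlEq[Int](None, None)))      // None  (NULL != NULL)
    println(sqlEq[Boolean](Some(true), None))    // None  (TRUE = NULL)

    // Analogue of df.filter(col("c1") === null): every comparison is NULL,
    // so no row survives, not even the rows whose value is NULL.
    println(where(c1)(v => sqlEq(v, None)).size) // 0
  }
}
```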
The only valid ways to check for NULL are:
IS NULL:
spark.sql("SELECT NULL IS NULL").show
+--------------+
|(NULL IS NULL)|
+--------------+
|          true|
+--------------+
spark.sql("SELECT TRUE IS NULL").show
+--------------+
|(true IS NULL)|
+--------------+
|         false|
+--------------+
IS NOT NULL:
spark.sql("SELECT NULL IS NOT NULL").show
+------------------+
|(NULL IS NOT NULL)|
+------------------+
|             false|
+------------------+
spark.sql("SELECT TRUE IS NOT NULL").show
+------------------+
|(true IS NOT NULL)|
+------------------+
|              true|
+------------------+
These are implemented in the DataFrame DSL as Column.isNull and Column.isNotNull, respectively.
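Unlike =, these predicates are two-valued: they answer true or false for every row and never return NULL. A minimal plain-Scala model makes this concrete, assuming nullable values are represented as Option[A] with None standing in for NULL; NullChecks, isNull and isNotNull are hypothetical names for this sketch, not Spark API.

```scala
// Hypothetical plain-Scala model of IS NULL / IS NOT NULL (not Spark code).
// None stands in for a NULL value.
object NullChecks {
  // Both predicates always yield true or false, never an "unknown" result.
  def isNull[A](v: Option[A]): Boolean    = v.isEmpty
  def isNotNull[A](v: Option[A]): Boolean = v.isDefined

  def main(args: Array[String]): Unit = {
    val c1: Seq[Option[String]] = Seq(Some("a"), None, Some("b"), None, Some("c"))

    // Analogue of df.filter(col("c1").isNull).count
    println(c1.count(v => isNull(v)))    // 2
    // Analogue of df.filter(col("c1").isNotNull).count
    println(c1.count(v => isNotNull(v))) // 3
  }
}
```

Because the two predicates are complementary, every row is counted by exactly one of them (2 + 3 = 5 here). That does not hold for = and != when NULLs are present: a NULL row makes both comparisons NULL, so it is counted by neither.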
Note:
For NULL-safe comparisons use IS DISTINCT FROM / IS NOT DISTINCT FROM:
spark.sql("SELECT NULL IS NOT DISTINCT FROM NULL").show
+---------------+
|(NULL <=> NULL)|
+---------------+
|           true|
+---------------+
spark.sql("SELECT NULL IS NOT DISTINCT FROM TRUE").show
+--------------------------------+
|(CAST(NULL AS BOOLEAN) <=> true)|
+--------------------------------+
|                           false|
+--------------------------------+
or not(_ <=> _) / <=>:
import spark.implicits._ // enables the $"col" syntax; imported automatically in spark-shell
spark.sql("SELECT NULL AS col1, NULL AS col2").select($"col1" <=> $"col2").show
+---------------+
|(col1 <=> col2)|
+---------------+
|           true|
+---------------+
spark.sql("SELECT NULL AS col1, TRUE AS col2").select($"col1" <=> $"col2").show
+---------------+
|(col1 <=> col2)|
+---------------+
|          false|
+---------------+
That is, IS DISTINCT FROM / IS NOT DISTINCT FROM in SQL, and not(_ <=> _) / <=> in the DataFrame DSL, respectively.
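A minimal plain-Scala sketch of null-safe equality, assuming nullable values are modeled as Option[A] with None standing in for NULL. NullSafeEquality, nullSafeEq and isDistinctFrom are hypothetical names for this illustration, not Spark API. Unlike =, the result is always true or false.

```scala
// Hypothetical plain-Scala model of null-safe comparison (not Spark code).
// None stands in for a NULL value.
object NullSafeEquality {
  // Models `<=>` / IS NOT DISTINCT FROM: NULL <=> NULL is true,
  // NULL <=> non-NULL is false, and the result is never NULL.
  def nullSafeEq[A](a: Option[A], b: Option[A]): Boolean = (a, b) match {
    case (None, None)       => true
    case (Some(x), Some(y)) => x == y
    case _                  => false
  }

  // Models IS DISTINCT FROM, i.e. not(_ <=> _) in the DataFrame DSL.
  def isDistinctFrom[A](a: Option[A], b: Option[A]): Boolean = !nullSafeEq(a, b)

  def main(args: Array[String]): Unit = {
    println(nullSafeEq[Boolean](None, None))           // true  (NULL <=> NULL)
    println(nullSafeEq[Boolean](None, Some(true)))     // false (NULL <=> TRUE)
    println(isDistinctFrom[Boolean](None, Some(true))) // true
  }
}
```

These rules coincide with plain == on Scala Option values (None == None is true, None == Some(x) is false), one more reason Option fits Scala code better than null.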