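I'd like to discuss NULL behavior in BigQuery.
I noticed that filtering out an actual value on a NULLABLE column removes both the requested value and the NULL values.
The following query: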
select * from
(select NULL as some_nullable_col, "name1" as name),
(select 4 as some_nullable_col, "name2" as name),
(select 1 as some_nullable_col, "name3" as name),
(select 7 as some_nullable_col, "name4" as name),
(select 3 as some_nullable_col, "name5" as name)
--WHERE some_nullable_col != 3
returns all rows, as expected,
whereas:
select * from
(select NULL as some_nullable_col, "name1" as name),
(select 4 as some_nullable_col, "name2" as name),
(select 1 as some_nullable_col, "name3" as name),
(select 7 as some_nullable_col, "name4" as name),
(select 3 as some_nullable_col, "name5" as name)
WHERE some_nullable_col != 3
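omits 2 rows: the one with the value 3 and the one with NULL.
I assume this is because BigQuery doesn't index NULL values, or doesn't scan NULLs in the WHERE clause for efficiency, but it also causes trouble:
every time I filter on a nullable column, the filter has to look like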
WHERE some_nullable_col != 3 OR some_nullable_col IS NULL
which is obviously not very convenient.
I'd just like an explanation, and to know whether BigQuery's roadmap includes a solution to this problem.
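This is the standard behavior of NULL in SQL, and all SQL databases (Oracle, Microsoft SQL Server, PostgreSQL, MySQL, etc.) behave exactly the same way. A comparison such as NULL != 3 evaluates to NULL rather than TRUE, and WHERE keeps only the rows for which the condition is TRUE. If the IS NULL check is too tedious, an alternative is to use the IFNULL or COALESCE function to convert NULL to a non-NULL value (one that differs from the value being compared against), i.e.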
select * from
(select NULL as some_nullable_col, "name1" as name),
(select 4 as some_nullable_col, "name2" as name),
(select 1 as some_nullable_col, "name3" as name),
(select 7 as some_nullable_col, "name4" as name),
(select 3 as some_nullable_col, "name5" as name)
WHERE ifnull(some_nullable_col,0) != 3
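To make the three filter variants from this thread easy to compare, here is a minimal sketch that runs them against the same five rows. It uses Python's built-in sqlite3 module as a local stand-in for BigQuery, which is an assumption made purely for runnability; the table name `t` and the helper `names` are hypothetical. SQLite follows the same NULL comparison rule, so the results should match what the question describes.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (an assumption made
# only so the sketch can run locally; table name `t` is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (some_nullable_col INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(None, "name1"), (4, "name2"), (1, "name3"), (7, "name4"), (3, "name5")],
)


def names(where):
    """Return the names of the rows that satisfy the given WHERE clause."""
    sql = f"SELECT name FROM t WHERE {where} ORDER BY name"
    return [row[0] for row in conn.execute(sql)]


# Comparing NULL with a value yields NULL (shown as None in Python), not TRUE.
null_comparison = conn.execute("SELECT NULL != 3").fetchone()[0]

# 1. Plain inequality: the NULL row is dropped together with the value 3.
plain = names("some_nullable_col != 3")

# 2. Explicit NULL check, as in the question: the NULL row is kept.
explicit = names("some_nullable_col != 3 OR some_nullable_col IS NULL")

# 3. IFNULL, as in the answer: NULL becomes 0 before the comparison.
coalesced = names("IFNULL(some_nullable_col, 0) != 3")

print(null_comparison)
print(plain)
print(explicit)
print(coalesced)
```

Variants 2 and 3 agree here only because the replacement value 0 is not the value being filtered out; with `IFNULL(some_nullable_col, 3) != 3` the NULL row would be dropped again.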
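Yes, you are right: NULL does not match comparators such as some_nullable_col != 3. Let me explain why.
Google uses a key-value store as the underlying data store for BigQuery. Unlike a traditional relational database, the data is sharded by row and by field and stored in many different locations. If a value is NULL, BigQuery treats the data as nonexistent and writes nothing to the data store. As a result, the field never matches any comparator other than "IS NULL". This is by design, and Google currently has no plans to change how it works.
A workaround is to store a special value in those fields. For example, if the field is a string, you can use the empty string "" instead of NULL. If the field is a non-negative integer, you can use "-1" as the special value. I know this is not really ideal, and in many cases adding an "IS NULL" condition to the query may be the better choice. This is just to give you another option.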
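The trade-off of the special-value workaround can be sketched as follows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module rather than BigQuery, and the table names `with_null` and `with_sentinel` are hypothetical. It shows that a sentinel such as -1 does survive a `!=` filter, but it also takes part in aggregates, whereas NULL is skipped by AVG.

```python
import sqlite3

# Local SQLite stand-in (assumption); both table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE with_null (v INTEGER)")
conn.execute("CREATE TABLE with_sentinel (v INTEGER)")
conn.executemany(
    "INSERT INTO with_null VALUES (?)", [(None,), (4,), (1,), (7,), (3,)]
)
# Same data, but the missing value is stored as the sentinel -1.
conn.executemany(
    "INSERT INTO with_sentinel VALUES (?)", [(-1,), (4,), (1,), (7,), (3,)]
)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# The sentinel row passes the != filter; the NULL row does not.
kept_null = scalar("SELECT COUNT(*) FROM with_null WHERE v != 3")
kept_sentinel = scalar("SELECT COUNT(*) FROM with_sentinel WHERE v != 3")

# AVG ignores NULL but includes the sentinel, which skews the result.
avg_null = scalar("SELECT AVG(v) FROM with_null")
avg_sentinel = scalar("SELECT AVG(v) FROM with_sentinel")

print(kept_null, kept_sentinel)
print(avg_null, avg_sentinel)
```

With a sentinel, every aggregate or range filter has to exclude the special value explicitly, which is one reason the "IS NULL" condition is often the simpler choice.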
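By the way, I tried something similar on my MySQL instance, and it behaves the same way as BigQuery: the query does not return the NULL record when the "!=" comparator is used.
For example,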
mysql> select * from test1;
+------+------------+
| id   | num        |
+------+------------+
|    0 | aaa        |
|    1 | bbb        |
|    8 | sdfsdfgsdf |
|    9 | NULL       |
| NULL | sdfsdfsfsf |
+------+------------+
5 rows in set (0.19 sec)
and
mysql> select * from test1 where id != 8;
+------+------+
| id   | num  |
+------+------+
|    0 | aaa  |
|    1 | bbb  |
|    9 | NULL |
+------+------+
3 rows in set (0.18 sec)
So I think this is standard behavior in the SQL world.