How do I backfill null values in BigQuery?

Ped*_*ato 1 sql google-bigquery google-cloud-platform

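I'm trying to backfill nulls in BigQuery, similar to bfill on a pandas DataFrame. From the documentation, the last_value function looked like a good fit. However, it leaves the leading rows null until the first non-null value appears (which is reasonable, given the function's name). How can I backfill these remaining nulls? Or do I have to drop them?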

Here is a sample query:

select table_path.*, last_value(sn_6 ignore nulls) over (order by time)
from (select 1 as time, null as sn_6 union all
      select 2, 1 union all
      select 3, null union all
      select 4, null union all
      select 5, null union all
      select 6, 0 union all
      select 7, null union all
      select 8, null
     ) table_path;

Actual output:

time    sn_6    f0_
1       null   null
2         1     1
3       null    1
4       null    1
5       null    1
6         0     0
7       null    0
8       null    0

Desired output:

time    sn_6    f0_
1       null    1 <---Back fill all the gaps!
2         1     1
3       null    1
4       null    1
5       null    1
6         0     0
7       null    0
8       null    0

The real data has one timestamp column followed by six float columns, with nulls scattered throughout.

Yun*_*ang 5

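If the goal is to use a backfill to cover the rows that the forward fill leaves empty, you can use the first_value function to look ahead for the first non-null value and fall back to it with coalesce, like this: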

select table_path.*, 
coalesce(
  last_value(sn_6 ignore nulls) over (order by time),
  first_value(sn_6 ignore nulls) over (order by time RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
  )
from (select 1 as time, null as sn_6 union all
      select 2, 1 union all
      select 3, null union all
      select 4, null union all
      select 5, null union all
      select 6, 0 union all
      select 7, null union all
      select 8, null
     ) table_path;
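To make the logic of the query above easier to check by hand, here is a minimal pure-Python sketch of the same idea: a forward pass that mimics last_value ... ignore nulls, a backward pass that mimics first_value ... ignore nulls over the following rows, and a coalesce of the two. The function name forward_then_back_fill is hypothetical and used only for illustration; the sketch assumes the rows are already sorted by time and that None stands in for SQL null.

```python
from typing import List, Optional, Sequence


def forward_then_back_fill(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Forward-fill nulls, then back-fill whatever is still null at the start.

    Hypothetical helper for illustration only; it assumes `values` is
    already ordered by time and uses None in place of SQL null.
    """
    # Forward pass: most recent non-null value at or before each row
    # (the role played by last_value(... ignore nulls) in the query).
    forward: List[Optional[float]] = []
    last_seen: Optional[float] = None
    for value in values:
        if value is not None:
            last_seen = value
        forward.append(last_seen)

    # Backward pass: first non-null value at or after each row
    # (the role played by first_value(... ignore nulls) over the following rows).
    backward: List[Optional[float]] = [None] * len(values)
    next_seen: Optional[float] = None
    for i in range(len(values) - 1, -1, -1):
        if values[i] is not None:
            next_seen = values[i]
        backward[i] = next_seen

    # Coalesce: prefer the forward-filled value, fall back to the back-filled one.
    return [f if f is not None else b for f, b in zip(forward, backward)]


# Same sn_6 column as in the sample query, ordered by time 1..8.
sn_6 = [None, 1, None, None, None, 0, None, None]
filled = forward_then_back_fill(sn_6)
print(filled)  # [1, 1, 1, 1, 1, 0, 0, 0]
```

The result matches the desired output in the question: only the leading null at time 1 takes its value from a later row, and every other gap keeps the forward-filled value. If the data is already in pandas, chaining ffill() and bfill() on the column should give the same result.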