cra*_*ray 4 hive hql apache-spark
I'm using Netezza SQL on Aginity Workbench and have the following data:
id DATE1 DATE2
1 2013-07-27 NULL
2 NULL NULL
3 NULL 2013-08-02
4 2013-09-10 2013-09-23
5 2013-12-11 NULL
6 NULL 2013-12-19
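I need to fill every NULL in DATE1 with the nearest preceding non-NULL DATE1 value. For DATE2 I need to do the same, but in the opposite direction: each NULL takes the nearest following non-NULL DATE2 value. So my desired output is as follows: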
id DATE1 DATE2
1 2013-07-27 2013-08-02
2 2013-07-27 2013-08-02
3 2013-07-27 2013-08-02
4 2013-09-10 2013-09-23
5 2013-12-11 2013-12-19
6 2013-12-11 2013-12-19
I only have read access to the data, so creating tables or views is not an option.
How about this?
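select
    id
    -- most recent non-NULL date1 at or before the current row
    ,last_value(date1 ignore nulls) over (
        order by id
        rows between unbounded preceding and current row
    ) date1
    -- nearest non-NULL date2 at or after the current row
    ,first_value(date2 ignore nulls) over (
        order by id
        rows between current row and unbounded following
    ) date2
from Table1 -- the query as posted had no FROM clause; Table1 is an assumed table name
order by id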
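To make the intent of the two window functions concrete, here is a minimal sketch in plain Python of the same forward and backward fill, run on the sample data from the question. The helper names `fill_forward` and `fill_backward` are hypothetical and only illustrate the logic; they are not part of Netezza or any library.

```python
from typing import List, Optional


def fill_forward(values: List[Optional[str]]) -> List[Optional[str]]:
    """Replace each None with the most recent non-None value at or before it."""
    filled = []
    last = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def fill_backward(values: List[Optional[str]]) -> List[Optional[str]]:
    """Replace each None with the nearest non-None value at or after it."""
    # A backward fill is a forward fill over the reversed sequence.
    return fill_forward(values[::-1])[::-1]


# Sample columns from the question, ordered by id 1..6.
date1 = ["2013-07-27", None, None, "2013-09-10", "2013-12-11", None]
date2 = [None, None, "2013-08-02", "2013-09-23", None, "2013-12-19"]

for row_id, d1, d2 in zip(range(1, 7), fill_forward(date1), fill_backward(date2)):
    print(row_id, d1, d2)
```

The forward fill plays the role of `last_value(... ignore nulls)` over the rows up to the current one, and the backward fill plays the role of `first_value(... ignore nulls)` over the rows from the current one onward.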
You can also compute this by hand instead of relying on window functions.
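with chain as (
    -- pair each row with every row at or before it (prev) and at or after it (next)
    select
        this.*,
        prev.date1 prev_date1,
        -- a distance only counts when the candidate row actually carries a value
        case when prev.date1 is not null then abs(this.id - prev.id) else null end prev_distance,
        next.date2 next_date2,
        case when next.date2 is not null then abs(this.id - next.id) else null end next_distance
    from
        Table1 this
        left outer join Table1 prev on this.id >= prev.id
        left outer join Table1 next on this.id <= next.id
), min_distance as (
    -- per row: distance to the nearest populated date1 before it and date2 after it
    select
        id,
        min(prev_distance) min_prev_distance,
        min(next_distance) min_next_distance
    from
        chain
    group by
        id
)
-- keep only the pairing that hits both minimum distances;
-- a row with no populated date1 at or before it, or no populated date2
-- at or after it, has a NULL minimum and is dropped by this join
select
    chain.id,
    chain.prev_date1,
    chain.next_date2
from
    chain
    join min_distance on
        min_distance.id = chain.id
        and chain.prev_distance = min_distance.min_prev_distance
        and chain.next_distance = min_distance.min_next_distance
order by chain.id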
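If you can't compute the distance between IDs by subtraction (for example, because the IDs are not numeric), replace the ordering scheme with a row_number() call and measure distances between row numbers instead.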
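One way the row_number() substitution might look is sketched below. Several things here are assumptions made only so the example can run: SQLite (3.25 or later, for window functions) stands in for Netezza through Python's `sqlite3` module, the IDs are changed to the letters a to f to mimic non-numeric keys, and the CTE name `ordered`, the column `rn`, and the aliases `cur`, `prv`, `nxt` are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the read-only Netezza table.
conn = sqlite3.connect(":memory:")
conn.execute("create table Table1 (id text, date1 text, date2 text)")
conn.executemany(
    "insert into Table1 values (?, ?, ?)",
    [
        ("a", "2013-07-27", None),
        ("b", None, None),
        ("c", None, "2013-08-02"),
        ("d", "2013-09-10", "2013-09-23"),
        ("e", "2013-12-11", None),
        ("f", None, "2013-12-19"),
    ],
)

QUERY = """
with ordered as (
    -- rn is a gap-free 1..n position that replaces arithmetic on id
    select id, date1, date2, row_number() over (order by id) as rn
    from Table1
), chain as (
    select
        cur.id,
        prv.date1 as prev_date1,
        case when prv.date1 is not null then cur.rn - prv.rn end as prev_distance,
        nxt.date2 as next_date2,
        case when nxt.date2 is not null then nxt.rn - cur.rn end as next_distance
    from ordered cur
    left outer join ordered prv on cur.rn >= prv.rn
    left outer join ordered nxt on cur.rn <= nxt.rn
), min_distance as (
    select
        id,
        min(prev_distance) as min_prev_distance,
        min(next_distance) as min_next_distance
    from chain
    group by id
)
select chain.id, chain.prev_date1, chain.next_date2
from chain
join min_distance on
    min_distance.id = chain.id
    and chain.prev_distance = min_distance.min_prev_distance
    and chain.next_distance = min_distance.min_next_distance
order by chain.id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because the join conditions already guarantee that `prv` is at or before the current row and `nxt` is at or after it, the differences between row numbers are never negative, so the `abs()` call is not needed in this variant.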