Fill NULL values with the last non-NULL value in Hive

Pet*_*711 1 hadoop hive hiveql

I have four columns:

date   number   Estimate   Client
----   ------   --------   ------
1      3          10        A 
2      NULL       10        Null
3      5          10        A      
4      NULL       10        Null 
5      NULL       10        Null
6      2          10        A   
.......

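I need to replace each NULL with the last known value from an earlier row, ordered by the date column. For example, date = 2 should get number = 3, and dates 4 and 5 should both get number = 5. The NULL values occur at random positions.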

This needs to be done in Hive.

Ahm*_*DAL 5

This can be done with a sliding window.

Here is the content of my table:

hive> select * from my_table;
OK
1       3       10      A
2       NULL    10      NULL
3       5       10      A
4       NULL    10      NULL
5       NULL    10      NULL
6       2       10      A
Time taken: 0.06 seconds, Fetched: 6 row(s)

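All you need to do is slide a window over all preceding rows up to the current row and pick the most recent non-NULL value. The LAST_VALUE windowing function takes a second, boolean argument that tells it to skip NULLs: LAST_VALUE(<field>, <ignore_nulls> as boolean);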

-- For each column, keep the current value if present; otherwise take the
-- most recent non-NULL value from the rows up to and including this one.
SELECT
    COALESCE(`date`, LAST_VALUE(`date`, TRUE) OVER(ORDER BY `date` ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS `date`,
    COALESCE(number, LAST_VALUE(number, TRUE) OVER(ORDER BY `date` ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS number,
    COALESCE(estimate, LAST_VALUE(estimate, TRUE) OVER(ORDER BY `date` ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS estimate,
    COALESCE(client, LAST_VALUE(client, TRUE) OVER(ORDER BY `date` ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS client
FROM my_table;

The result will be:

OK
1       3       10      A
2       3       10      A
3       5       10      A
4       5       10      A
5       5       10      A
6       2       10      A
Time taken: 19.177 seconds, Fetched: 6 row(s)
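As a possible simplification, the window can be declared once with a WINDOW clause and reused. Because the frame ends at the current row, LAST_VALUE with NULLs ignored already returns the current value whenever it is non-NULL, so the COALESCE wrapper is not strictly needed. The sketch below is a variant, not the query that produced the output above. It assumes the same table and column names (my_table, `date`, number, estimate, client), and it only fills number and client, since those are the only columns with NULLs in the sample data. The window name w is arbitrary.

```sql
-- Sketch: forward-fill number and client with a single named window.
-- Assumes my_table and its columns as shown in the answer above.
SELECT
    `date`,
    LAST_VALUE(number, TRUE) OVER w AS number,
    estimate,
    LAST_VALUE(client, TRUE) OVER w AS client
FROM my_table
-- Frame: every row from the start of the ordering up to the current row.
WINDOW w AS (ORDER BY `date` ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW);
```

On the six sample rows this should return the same result as the query above. A row with no earlier non-NULL value stays NULL. With no PARTITION BY, the whole table is handled as one partition, so on large tables it may be worth partitioning by a suitable key if the data has one.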