Pet*_*711 1 hadoop hive hiveql
I have four columns:
date  number  Estimate  Client
----  ------  --------  ------
1     3       10        A
2     NULL    10        NULL
3     5       10        A
4     NULL    10        NULL
5     NULL    10        NULL
6     2       10        A
...
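I need to replace each NULL with the last known value from an earlier date (ordered by the date column). For example, for date = 2, number should become 3; for dates 4 and 5, number should become 5 and 5. The NULL values occur at random positions.
This needs to be done in Hive.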
This can be solved with a sliding window.
Here is the content of my table:
hive> select * from my_table;
OK
1 3 10 A
2 NULL 10 NULL
3 5 10 A
4 NULL 10 NULL
5 NULL 10 NULL
6 2 10 A
Time taken: 0.06 seconds, Fetched: 6 row(s)
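All you need to do is slide a window that spans all preceding rows up to the current row and pick the most recent non-null value. The LAST_VALUE windowing function accepts an optional second, boolean argument that makes it skip NULLs: LAST_VALUE(<field>, <ignore_nulls as boolean>).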
-- For each column: keep the value if it is not NULL, otherwise take the most
-- recent non-NULL value from the earlier rows (ordered by `date`).
SELECT
COALESCE(`date`, LAST_VALUE(`date`, TRUE) OVER(ORDER BY `date` ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS `date`,
COALESCE(number, LAST_VALUE(number, TRUE) OVER(ORDER BY `date` ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS number,
COALESCE(estimate, LAST_VALUE(estimate, TRUE) OVER(ORDER BY `date` ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS estimate,
COALESCE(client, LAST_VALUE(client, TRUE) OVER(ORDER BY `date` ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS client
FROM my_table;
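To make the window semantics concrete, here is a small Python sketch (not Hive, and not part of the original answer) that emulates `COALESCE(col, LAST_VALUE(col, TRUE) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW))` on the sample rows. The helper names `last_value`, `coalesce`, and `fill_forward` are hypothetical; the sketch assumes NULL maps to Python's `None` and that the whole table is one ordered partition (no PARTITION BY), as in the query.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Optional[Any]]


def last_value(values: List[Optional[Any]], ignore_nulls: bool) -> List[Optional[Any]]:
    """Emulate LAST_VALUE(v, ignore_nulls) over the frame
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (NULL modelled as None)."""
    result: List[Optional[Any]] = []
    last_seen: Optional[Any] = None
    for v in values:
        if not ignore_nulls:
            # The frame ends at the current row, so its last value is the row itself.
            result.append(v)
            continue
        if v is not None:
            last_seen = v  # remember the most recent non-NULL value
        result.append(last_seen)
    return result


def coalesce(*args: Optional[Any]) -> Optional[Any]:
    """Return the first non-None argument, like SQL COALESCE."""
    for a in args:
        if a is not None:
            return a
    return None


def fill_forward(rows: List[Row], order_key: str, columns: List[str]) -> List[Row]:
    """Apply COALESCE(col, LAST_VALUE(col, TRUE) OVER (ORDER BY order_key ...)) per column."""
    ordered = sorted(rows, key=lambda r: r[order_key])
    windows = {c: last_value([r[c] for r in ordered], ignore_nulls=True) for c in columns}
    return [
        {c: coalesce(row[c], windows[c][i]) for c in columns}
        for i, row in enumerate(ordered)
    ]


# Sample rows from my_table (None stands for NULL).
my_table: List[Row] = [
    {"date": 1, "number": 3, "estimate": 10, "client": "A"},
    {"date": 2, "number": None, "estimate": 10, "client": None},
    {"date": 3, "number": 5, "estimate": 10, "client": "A"},
    {"date": 4, "number": None, "estimate": 10, "client": None},
    {"date": 5, "number": None, "estimate": 10, "client": None},
    {"date": 6, "number": 2, "estimate": 10, "client": "A"},
]

if __name__ == "__main__":
    cols = ["date", "number", "estimate", "client"]
    for r in fill_forward(my_table, "date", cols):
        print(r["date"], r["number"], r["estimate"], r["client"])
```

Running it prints the same six rows as the Hive output. As in the SQL version, a NULL in the very first row has no earlier value to borrow and stays NULL.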
The result will be:
OK
1 3 10 A
2 3 10 A
3 5 10 A
4 5 10 A
5 5 10 A
6 2 10 A
Time taken: 19.177 seconds, Fetched: 6 row(s)