IDK*_*IDK 4 sorting hadoop hive
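I have two columns, one with products and one with the dates they were bought. I can order the dates by applying sort_array(dates), but I want to be able to sort_array(products) by purchase date. Is there a way to do that in Hive?
The table (tablename) is: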
ClientID Product Date
100 Shampoo 2016-01-02
101 Book 2016-02-04
100 Conditioner 2015-12-31
101 Bookmark 2016-07-10
100 Cream 2016-02-12
101 Book2 2016-01-03
Then, to get one row per client:
select
clientID,
COLLECT_LIST(Product) as Prod_List,
sort_array(COLLECT_LIST(date)) as Date_Order
from tablename
group by clientID;
This gives:
ClientID Prod_List Date_Order
100 ["Shampoo","Conditioner","Cream"] ["2015-12-31","2016-01-02","2016-02-12"]
101 ["Book","Bookmark","Book2"] ["2016-01-03","2016-02-04","2016-07-10"]
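But what I want is for the order of the products to follow the chronological order of purchase.
It can be done using only built-in functions, but it is not a pretty sight :-)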
select clientid
,split(regexp_replace(concat_ws(',',sort_array(collect_list(concat_ws(':',cast(date as string),product)))),'[^:]*:([^,]*(,|$))','$1'),',') as prod_list
,sort_array(collect_list(date)) as date_order
from tablename
group by clientid
;
+----------+-----------------------------------+------------------------------------------+
| clientid | prod_list | date_order |
+----------+-----------------------------------+------------------------------------------+
| 100 | ["Conditioner","Shampoo","Cream"] | ["2015-12-31","2016-01-02","2016-02-12"] |
| 101 | ["Book2","Book","Bookmark"] | ["2016-01-03","2016-02-04","2016-07-10"] |
+----------+-----------------------------------+------------------------------------------+
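To see why the one-liner works, here is a minimal sketch that traces the same steps outside Hive, in Python rather than HiveQL. The function name products_by_date and the in-memory rows list are hypothetical, introduced only for illustration. It prefixes each product with its date, sorts the combined strings, joins them with commas, and strips every "date:" prefix using the same regular expression as the query. It relies on the same assumptions as the Hive version: dates are in yyyy-MM-dd form (so string order matches chronological order) and product names contain neither ',' nor ':'.

```python
import re
from collections import defaultdict

# Sample rows from the question: (ClientID, Product, Date)
rows = [
    (100, "Shampoo", "2016-01-02"),
    (101, "Book", "2016-02-04"),
    (100, "Conditioner", "2015-12-31"),
    (101, "Bookmark", "2016-07-10"),
    (100, "Cream", "2016-02-12"),
    (101, "Book2", "2016-01-03"),
]


def products_by_date(rows):
    """Return {client_id: [products ordered by purchase date]}."""
    # Mirrors collect_list(concat_ws(':', cast(date as string), product))
    tagged = defaultdict(list)
    for client_id, product, date in rows:
        tagged[client_id].append(f"{date}:{product}")

    result = {}
    for client_id, items in tagged.items():
        # Mirrors concat_ws(',', sort_array(...)): sorting "date:product"
        # strings orders them by date because the date comes first.
        joined = ",".join(sorted(items))
        # Mirrors regexp_replace(..., '[^:]*:([^,]*(,|$))', '$1'):
        # drop everything up to and including each ':' and keep the product.
        stripped = re.sub(r"[^:]*:([^,]*(,|$))", r"\1", joined)
        # Mirrors split(..., ',')
        result[client_id] = stripped.split(",")
    return result


prod_lists = products_by_date(rows)
print(prod_lists)
```

If a product name could contain a comma or a colon, both this sketch and the Hive expression would split it incorrectly, so a different delimiter would be needed.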