sort_array ordered by a different column, Hive

IDK*_*IDK 4 sorting hadoop hive

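I have two columns, one with products and one with the dates they were purchased. I can sort the dates by applying sort_array(dates), but I want to be able to sort the products, as in sort_array(products), by purchase date. Is there a way to do that in Hive?

The table, tablename, is: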

ClientID    Product    Date
100    Shampoo    2016-01-02
101    Book    2016-02-04
100    Conditioner    2015-12-31
101    Bookmark    2016-07-10
100    Cream    2016-02-12
101    Book2    2016-01-03

Then, to get one row per client:

select
clientID,
COLLECT_LIST(Product) as Prod_List,
sort_array(COLLECT_LIST(date)) as Date_Order
from tablename
group by clientID;  -- positional "group by 1" only works when Hive's position-alias setting is enabled

which gives:

ClientID    Prod_List    Date_Order
100    ["Shampoo","Conditioner","Cream"]    ["2015-12-31","2016-01-02","2016-02-12"]
101    ["Book","Bookmark","Book2"]    ["2016-01-03","2016-02-04","2016-07-10"]

But what I want is for the order of the products to follow the chronological order of the purchase dates.

Dav*_*itz 5

It can be done using only built-in functions, but it is not a pretty sight :-)

select      clientid
           ,split(regexp_replace(concat_ws(',',sort_array(collect_list(concat_ws(':',cast(date as string),product)))),'[^:]*:([^,]*(,|$))','$1'),',') as prod_list
           ,sort_array(collect_list(date)) as date_order

from        tablename 

group by    clientid
; 
Result:
+----------+-----------------------------------+------------------------------------------+
| clientid |             prod_list             |                date_order                |
+----------+-----------------------------------+------------------------------------------+
|      100 | ["Conditioner","Shampoo","Cream"] | ["2015-12-31","2016-01-02","2016-02-12"] |
|      101 | ["Book2","Book","Bookmark"]       | ["2016-01-03","2016-02-04","2016-07-10"] |
+----------+-----------------------------------+------------------------------------------+
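
To see why the prod_list expression works, it can be unpacked step by step. The following is only a sketch: it uses the literal values for client 100 from the question instead of the table, and it assumes your Hive version accepts a SELECT without a FROM clause. The expected results are shown as comments.

```sql
-- Step 1: prefix each product with its purchase date, so that sorting
-- the strings sorts by date (dates are in yyyy-MM-dd form).
select sort_array(array('2016-01-02:Shampoo',
                        '2015-12-31:Conditioner',
                        '2016-02-12:Cream'));
-- ["2015-12-31:Conditioner","2016-01-02:Shampoo","2016-02-12:Cream"]

-- Step 2: after concat_ws(',', ...) joins the sorted elements,
-- strip the "date:" prefix from every element and keep the rest.
select regexp_replace('2015-12-31:Conditioner,2016-01-02:Shampoo,2016-02-12:Cream',
                      '[^:]*:([^,]*(,|$))', '$1');
-- Conditioner,Shampoo,Cream

-- Step 3: split the cleaned string back into an array.
select split('Conditioner,Shampoo,Cream', ',');
-- ["Conditioner","Shampoo","Cream"]
```

Two caveats follow from this construction. The date has to be in a format whose string order matches its chronological order, as yyyy-MM-dd does, and a product name containing a comma would be cut apart by the regular expression and the final split. Products bought on the same date end up ordered by product name.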