ROW_NUMBER() OVER in Impala

use*_*851 6 sql window-functions impala

I have a use case where I need ROW_NUMBER() over a PARTITION, something like:

SELECT
  Column1, Column2,
  ROW_NUMBER() OVER (
    PARTITION BY ACCOUNT_NUM
    ORDER BY FREQ, MAN, MODEL) as LEVEL
FROM
  TEST_TABLE

I need to solve this in Impala. Unfortunately, Impala supports neither subqueries nor ROW_NUMBER() OVER. Thanks for your help.

Tag*_*gar 6

ROW_NUMBER() OVER with PARTITION BY was added in CDH 5.2:

https://www.cloudera.com/documentation/enterprise/latest/topics/impala_analytic_functions.html#row_number

ROW_NUMBER() OVER([partition_by_clause] order_by_clause)
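A minimal sketch of that syntax applied to the question's query, on hypothetical sample data. It uses Python's built-in sqlite3 module as a stand-in for Impala (assumption: the bundled SQLite is 3.25 or newer, the release that added window functions); the ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) clause is written the same way in both. Table and column names follow the question; the rows are invented.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for Impala; both accept the same
# ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) syntax used here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table (column1 TEXT, column2 TEXT, account_num INTEGER,"
    " freq INTEGER, man TEXT, model TEXT)"
)

# Hypothetical sample rows: (column1, column2, account_num, freq, man, model).
rows = [
    ("a", "x", 1, 2, "m1", "A"),
    ("b", "y", 1, 1, "m2", "B"),
    ("c", "z", 1, 1, "m1", "C"),
    ("d", "w", 2, 5, "m1", "A"),
    ("e", "v", 2, 3, "m3", "B"),
]
conn.executemany("INSERT INTO test_table VALUES (?, ?, ?, ?, ?, ?)", rows)

# Numbering restarts at 1 for each account_num, ordered by freq, man, model.
query = """
SELECT column1, column2,
       ROW_NUMBER() OVER (
         PARTITION BY account_num
         ORDER BY freq, man, model) AS level
FROM test_table
ORDER BY account_num, level
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

For account 1 this prints ('c', 'z', 1), ('b', 'y', 2), ('a', 'x', 3): the two freq = 1 rows are ordered by man before the freq = 2 row, and account 2 starts again at 1.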


Gor*_*off 4

Impala is fairly limited for this type of query. Under a couple of assumptions, the query is still possible:

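  • The four columns in the PARTITION BY and ORDER BY clauses (account_num, freq, man, model) are never NULL
  • Together, those four columns uniquely identify a row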

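The query is rather ugly and expensive (it counts, for each row, every row in the same account that sorts at or before it):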

select tt.column1, tt.column2, count(*) as level
from test_table tt join
     test_table tt2
     on tt.account_num = tt2.account_num and
        (tt2.freq < tt.freq or
         (tt2.freq = tt.freq and tt2.man < tt.man) or
         (tt2.freq = tt.freq and tt2.man = tt.man and tt2.model <= tt.model)
        )
group by tt.column1, tt.column2, tt.account_num, tt.freq, tt.man, tt.model;
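The self-join above can be checked against the native window function on hypothetical data. This sketch again uses Python's sqlite3 as a stand-in for Impala (assumption: SQLite 3.25 or newer, needed only for the reference ROW_NUMBER() query); the helper compare() and the sample rows are invented for illustration. The second run adds a row whose (account_num, freq, man, model) duplicates an existing one, showing why the uniqueness assumption matters.

```python
import sqlite3

SCHEMA = (
    "CREATE TABLE test_table (column1 TEXT, column2 TEXT, account_num INTEGER,"
    " freq INTEGER, man TEXT, model TEXT)"
)

# The self-join emulation: count rows in the same account sorting at or before this one.
SELF_JOIN = """
SELECT tt.column1, tt.column2, COUNT(*) AS level
FROM test_table tt
JOIN test_table tt2
  ON tt.account_num = tt2.account_num
 AND (tt2.freq < tt.freq
      OR (tt2.freq = tt.freq AND tt2.man < tt.man)
      OR (tt2.freq = tt.freq AND tt2.man = tt.man AND tt2.model <= tt.model))
GROUP BY tt.column1, tt.column2, tt.account_num, tt.freq, tt.man, tt.model
"""

# Native window function, used only as the reference result.
WINDOW = """
SELECT column1, column2,
       ROW_NUMBER() OVER (PARTITION BY account_num ORDER BY freq, man, model) AS level
FROM test_table
"""


def compare(rows):
    """Hypothetical helper: return (emulated, native) results, each sorted."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO test_table VALUES (?, ?, ?, ?, ?, ?)", rows)
    emulated = sorted(conn.execute(SELF_JOIN).fetchall())
    native = sorted(conn.execute(WINDOW).fetchall())
    conn.close()
    return emulated, native


# Hypothetical rows: (column1, column2, account_num, freq, man, model).
unique_rows = [
    ("a", "x", 1, 2, "m1", "A"),
    ("b", "y", 1, 1, "m2", "B"),
    ("c", "z", 1, 1, "m1", "C"),
    ("d", "w", 2, 5, "m1", "A"),
    ("e", "v", 2, 3, "m3", "B"),
]
emulated, native = compare(unique_rows)
print(emulated == native)  # keys are unique per account, so the results agree

# Row "f" duplicates row "e"'s key: both now count each other and get level 2.
tied_rows = unique_rows + [("f", "u", 2, 3, "m3", "B")]
emulated_t, native_t = compare(tied_rows)
print(emulated_t == native_t)  # ROW_NUMBER() still assigns 1 and 2, so they differ
```

The NULL assumption matters for a similar reason: a comparison with NULL is unknown rather than true, so a row with a NULL in freq, man or model fails to match even itself, and its count comes out wrong or the row drops out of the inner join altogether.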