I have data of the following form: id,val1,val2
Example:
1,0.2,0.1
1,0.1,0.7
1,0.2,0.3
2,0.7,0.9
2,0.2,0.3
2,0.4,0.5
First, I want to sort the rows within each id by val1 in descending order, giving something like:
1,0.2,0.1
1,0.2,0.3
1,0.1,0.7
2,0.7,0.9
2,0.4,0.5
2,0.2,0.3
Then, for each id, I want to select the id,val2 combination of the second element, for example:
1,0.3
2,0.5
How should I approach this?
Thanks
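Pig is a scripting language rather than a relational one like SQL, and it is well suited to working with groups using operators nested inside FOREACH. Here is a solution: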
A = LOAD 'input' USING PigStorage(',') AS (id:int, v1:float, v2:float);
B = GROUP A BY id; -- isolate all rows for the same id
C = FOREACH B { -- here comes the scripting bit
    elems = ORDER A BY v1 DESC; -- sort rows belonging to the id
    two = LIMIT elems 2; -- select top 2
    two_invers = ORDER two BY v1 ASC; -- sort in opposite order to bubble second value to the top
    second = LIMIT two_invers 1;
    GENERATE FLATTEN(group) as id, FLATTEN(second.v2);
};
DUMP C;
In your example, id 1 has two rows with v1 == 0.2 but different v2 values, so the second value for id 1 can be either 0.1 or 0.3.
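To check the expected result outside Pig, the same sort-and-pick logic can be sketched in plain Python. This is only an illustrative cross-check, not part of the Pig script: the function name `second_by_val1` is hypothetical, and breaking ties on val1 by val2 ascending is an assumption added here to make the result deterministic (the question does not specify a tie-break).

```python
from itertools import groupby

# Sample rows from the question: (id, val1, val2)
rows = [
    (1, 0.2, 0.1),
    (1, 0.1, 0.7),
    (1, 0.2, 0.3),
    (2, 0.7, 0.9),
    (2, 0.2, 0.3),
    (2, 0.4, 0.5),
]


def second_by_val1(rows):
    """Return (id, val2) for the row ranked second by val1 descending within each id.

    Ties on val1 are broken by val2 ascending; this tie-break is an assumption,
    not something the question specifies.
    """
    # Sort by id, then val1 descending, then val2 ascending (the assumed tie-break).
    ordered = sorted(rows, key=lambda r: (r[0], -r[1], r[2]))
    result = []
    for key, group in groupby(ordered, key=lambda r: r[0]):
        ranked = list(group)
        if len(ranked) >= 2:  # ids with a single row have no second element
            result.append((key, ranked[1][2]))
    return result


print(second_by_val1(rows))  # prints [(1, 0.3), (2, 0.5)]
```

With this tie-break, id 1 yields 0.3, matching the output asked for in the question. Note one difference from the Pig script: for an id that has only one row, the nested LIMIT/ORDER sequence in Pig would return that single row, whereas this sketch skips such ids.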