
How to optimize a GROUP statement in Pig Latin?

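I have a skewed dataset that I need to group and then process with a nested FOREACH. Because of the skew, a few reducers take a very long time while the others finish almost immediately. I know Pig has a skewed join, but is there anything comparable for GROUP and FOREACH? Here is my Pig code (variables renamed):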

foo_grouped = GROUP foo_grouped BY FOO;
FOO_stats = FOREACH foo_grouped {
    a_FOO_total = foo_grouped.ATTR;
    a_FOO_total = DISTINCT a_FOO_total;

    bar_count = foo_grouped.BAR;
    bar_count = DISTINCT bar_count;

    a_FOO_type1 = FILTER foo_grouped BY COND1 == 'Y';
    a_FOO_type1 = a_FOO_type1.ATTR;
    a_FOO_type1 = DISTINCT a_FOO_type1;

    a_FOO_type2 = FILTER foo_grouped BY COND2 == 'Y' OR COND3 == 'HIGH';
    a_FOO_type2 = a_FOO_type2.ATTR;
    a_FOO_type2 = DISTINCT a_FOO_type2;

    GENERATE group AS FOO,
             COUNT(a_FOO_total) AS a_FOO_total,
             COUNT(a_FOO_type1) AS a_FOO_type1,
             COUNT(a_FOO_type2) AS a_FOO_type2,
             COUNT(bar_count) AS bar_count;
}
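The nested FOREACH above computes, per FOO key, four distinct counts. The following is a minimal Python sketch (not Pig) of those semantics, plus one commonly used mitigation for skewed groups: project only the fields the statistics need, evaluate the conditions up front, and deduplicate before grouping. Because duplicate rows never change a distinct count, both paths give the same result. In Pig this would roughly correspond to a FOREACH ... GENERATE projection followed by DISTINCT before the GROUP; DISTINCT shuffles on the whole tuple rather than on FOO alone, so rows of a hot key can be spread over several reducers. Whether this helps in practice is an assumption that depends on how many duplicate rows the hot keys actually contain. The function names and sample records are hypothetical.

```python
from collections import defaultdict


def project(record):
    """Keep only the fields the statistics need, with the filter
    conditions from the nested FOREACH evaluated to booleans."""
    return (
        record["FOO"],
        record["ATTR"],
        record["BAR"],
        record["COND1"] == "Y",                              # a_FOO_type1 filter
        record["COND2"] == "Y" or record["COND3"] == "HIGH",  # a_FOO_type2 filter
    )


def foo_stats(rows):
    """Per FOO key, return (a_FOO_total, a_FOO_type1, a_FOO_type2, bar_count),
    i.e. distinct-value counts in the same order as the GENERATE clause."""
    total, bars, type1, type2 = (defaultdict(set) for _ in range(4))
    for foo, attr, bar, is_type1, is_type2 in rows:
        total[foo].add(attr)
        bars[foo].add(bar)
        if is_type1:
            type1[foo].add(attr)
        if is_type2:
            type2[foo].add(attr)
    return {
        foo: (len(total[foo]), len(type1[foo]), len(type2[foo]), len(bars[foo]))
        for foo in total
    }


def foo_stats_naive(records):
    # Group every raw record: the hot key's reducer sees all of its rows.
    return foo_stats(project(r) for r in records)


def foo_stats_deduplicated(records):
    # Stage 1: drop duplicate projected tuples (the DISTINCT-before-GROUP idea).
    # Stage 2: compute the same per-key distinct counts on the smaller input.
    distinct_rows = {project(r) for r in records}
    return foo_stats(distinct_rows)
```

With a hot key that has repeated rows, both functions agree, while the deduplicated path groups fewer rows.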
