jda*_*mae 16 hadoop apache-pig
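I need help with this Pig script. I am getting back only a single record. I am selecting 2 columns and doing a count(distinct) on another column, while also using a WHERE clause to find a particular description (desc).
Here is my Pig script, with the SQL I am trying to reproduce in the comment at the top.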
/*
For example in sql:
select domain, count(distinct(segment)) as segment_cnt
from table
where desc='ABC123'
group by domain
order by segment_cnt desc;
*/
A = LOAD 'myoutputfile' USING PigStorage('\u0005')
    AS (
        domain:chararray,
        segment:chararray,
        desc:chararray
    );
B = filter A by (desc=='ABC123');
C = foreach B generate domain, segment;
D = DISTINCT C;
E = group D all;
F = foreach E generate group, COUNT(D) as segment_cnt;
G = order F by segment_cnt DESC;
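To see why this script returns a single record, here is a plain-Python model of steps B to G. It is a sketch, not Pig's execution engine, and the sample rows are hypothetical stand-ins for the contents of 'myoutputfile'. GROUP D ALL puts every tuple into one group whose key is 'all', so F can only ever hold one tuple: the total number of distinct (domain, segment) pairs, not a count per domain.

```python
from collections import namedtuple

# Hypothetical sample rows standing in for 'myoutputfile' (domain, segment, desc).
Row = namedtuple("Row", ["domain", "segment", "desc"])

ROWS = [
    Row("a.com", "s1", "ABC123"),
    Row("a.com", "s1", "ABC123"),  # duplicate (domain, segment) pair, removed by DISTINCT
    Row("a.com", "s2", "ABC123"),
    Row("b.com", "s1", "ABC123"),
    Row("b.com", "s3", "XYZ999"),  # dropped by the desc filter
]


def question_pipeline(rows):
    """Plain-Python model of the question's B..G steps (illustrative, not Pig)."""
    b = [r for r in rows if r.desc == "ABC123"]         # B = filter A by (desc=='ABC123')
    c = [(r.domain, r.segment) for r in b]              # C = foreach B generate domain, segment
    d = set(c)                                          # D = DISTINCT C (distinct pairs)
    e = [("all", list(d))]                              # E = group D all -> exactly one group
    f = [(group, len(bag)) for group, bag in e]         # F = group, COUNT(D) (no null domains here)
    return sorted(f, key=lambda t: t[1], reverse=True)  # G = order F by segment_cnt DESC


print(question_pipeline(ROWS))
```

With these rows the result is a single tuple, ('all', 3), because the three distinct pairs from both domains land in the same bag.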
Rom*_*ain 33
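Instead of collapsing everything into one group with group D all, you can GROUP by domain and then use the nested FOREACH syntax to count the distinct elements in each group: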
D = group C by domain;
E = foreach D {
    unique_segments = DISTINCT C.segment;
    generate group, COUNT(unique_segments) as segment_cnt;
};
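As a sanity check, a plain-Python model of this approach over hypothetical sample rows gives one record per domain, which is what the SQL in the question computes. The names answer_pipeline, Row and ROWS are illustrative only. The final sort mirrors the question's ORDER ... DESC step, which the snippet above leaves out. The row with a missing segment is not counted because Pig's COUNT ignores tuples whose first field is null, just as SQL's COUNT(DISTINCT ...) ignores nulls.

```python
from collections import defaultdict, namedtuple

# Hypothetical sample rows (domain, segment, desc); None models a null segment.
Row = namedtuple("Row", ["domain", "segment", "desc"])

ROWS = [
    Row("a.com", "s1", "ABC123"),
    Row("a.com", "s1", "ABC123"),  # duplicate segment within a.com
    Row("a.com", "s2", "ABC123"),
    Row("b.com", "s1", "ABC123"),
    Row("b.com", None, "ABC123"),  # null segment, not counted
    Row("b.com", "s3", "XYZ999"),  # dropped by the desc filter
]


def answer_pipeline(rows):
    """Plain-Python model of filter -> group by domain -> nested DISTINCT/COUNT -> order."""
    b = [r for r in rows if r.desc == "ABC123"]   # filter by desc
    c = [(r.domain, r.segment) for r in b]        # project domain, segment
    d = defaultdict(list)                         # D = group C by domain
    for domain, segment in c:
        d[domain].append(segment)
    e = []
    for group, segments in d.items():
        # DISTINCT C.segment, then COUNT, which skips null segments
        unique_segments = {s for s in segments if s is not None}
        e.append((group, len(unique_segments)))
    return sorted(e, key=lambda t: t[1], reverse=True)  # order by segment_cnt DESC


print(answer_pipeline(ROWS))
```

Here a.com has two distinct segments and b.com has one, so the output is [('a.com', 2), ('b.com', 1)].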