Select count distinct using Pig Latin

jda*_*mae 16 hadoop apache-pig

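I need help with this Pig script. I am getting only one record back. I am selecting 2 columns and doing a count(distinct) on another column, while also using a where clause to find a particular description (desc).

Here is the SQL, followed by my attempt to code it in Pig.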

    /*
    For example in sql:
    select domain, count(distinct(segment)) as segment_cnt
    from table
    where desc='ABC123'
    group by domain
    order by segment_cnt desc;
    */

    A = LOAD 'myoutputfile' USING PigStorage('\u0005')
            AS (
                domain:chararray,
                segment:chararray,
                desc:chararray
                );
    B = filter A by (desc=='ABC123');
    C = foreach B generate domain, segment;
    D = DISTINCT C;
    E = group D all;
    F = foreach E generate group, COUNT(D) as segment_cnt;
    G = order F by segment_cnt DESC;

Rom*_*ain 33

You can GROUP on each domain and then count the number of distinct elements in each group with the nested FOREACH syntax:

    D = group C by domain;
    E = foreach D {
        unique_segments = DISTINCT C.segment;
        generate group, COUNT(unique_segments) as segment_cnt;
    };
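
Putting the pieces together, a minimal sketch of the whole script might look like the following. It reuses the path, delimiter, and relation names from the question. Two things differ from the question's script. First, `group ... all` is replaced by `group ... by domain`: grouping with `all` puts every row into a single group, which is why only one record came back. Second, the third field is named `description` instead of `desc`, because `desc` appears in Pig's list of reserved keywords and may be rejected as a field name depending on the Pig version. That rename is an assumption on my part and not part of the original script.

```pig
-- Load the input; path, delimiter, and column order are taken from the question.
-- The third field is called 'description' here (hypothetical rename, see above).
A = LOAD 'myoutputfile' USING PigStorage('\u0005')
        AS (domain:chararray, segment:chararray, description:chararray);

-- SQL: WHERE desc = 'ABC123'
B = FILTER A BY description == 'ABC123';

-- Keep only the columns needed downstream.
C = FOREACH B GENERATE domain, segment;

-- SQL: GROUP BY domain, with COUNT(DISTINCT segment) computed inside each group.
D = GROUP C BY domain;
E = FOREACH D {
    unique_segments = DISTINCT C.segment;
    GENERATE group AS domain, COUNT(unique_segments) AS segment_cnt;
};

-- SQL: ORDER BY segment_cnt DESC
F = ORDER E BY segment_cnt DESC;

DUMP F;
```

This should give one output row per domain instead of a single row overall. Note that `COUNT` skips tuples whose first field is null, which lines up with how SQL's `count(distinct ...)` ignores nulls.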

  • I think, to be perfect, it should be unique_segments = DISTINCT C.segment; (5 upvotes)