Can I do a DISTINCT on multiple columns in Pig?

Pra*_*wal 4 apache-pig

I have a use case where I need to count the distinct values of two fields combined.

Sample:

x = LOAD 'testdata' using PigStorage('^A') as (a,b,c,d);

y = GROUP x BY a;

z = FOREACH y {

        bc = DISTINCT x.b,x.c;   -- the line in question
        dd = DISTINCT x.d;
        GENERATE FLATTEN(group) as (a), COUNT(bc), COUNT(dd);
};

reo*_*toa 9

You're close. The key is not to apply DISTINCT to two fields, but to apply it to a single composite field that you build yourself:

x = LOAD 'testdata' using PigStorage('^A') as (a,b,c,d);
x2 = FOREACH x GENERATE a, TOTUPLE(b,c) AS bc, d;
y = GROUP x2 BY a;
z = FOREACH y {
        bc = DISTINCT x2.bc;
        dd = DISTINCT x2.d;
        GENERATE FLATTEN(group) AS (a), COUNT(bc), COUNT(dd);
};
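
If it helps to see what the script above is meant to compute, here is a rough plain-Python sketch of the same per-group logic. It is only an illustration, not Pig: the function name `distinct_counts` and the sample rows are made up, and it ignores Pig's null handling in COUNT. The point is that the pair (b, c) is treated as one value, the same role TOTUPLE(b,c) plays in the Pig script.

```python
from collections import defaultdict


def distinct_counts(rows):
    """For each value of a, count distinct (b, c) pairs and distinct d values.

    Hypothetical helper for illustration; rows are (a, b, c, d) tuples.
    """
    bc_seen = defaultdict(set)  # a -> set of (b, c) composite values
    d_seen = defaultdict(set)   # a -> set of d values
    for a, b, c, d in rows:
        bc_seen[a].add((b, c))  # the composite key, like TOTUPLE(b, c)
        d_seen[a].add(d)
    return {a: (len(bc_seen[a]), len(d_seen[a])) for a in bc_seen}


# Made-up sample data, not from the question.
rows = [
    ("k1", "x", "1", "p"),
    ("k1", "x", "1", "q"),
    ("k1", "x", "2", "p"),
    ("k2", "y", "1", "p"),
]
print(distinct_counts(rows))  # {'k1': (2, 2), 'k2': (1, 1)}
```

For group k1 the pairs (x, 1) and (x, 2) are counted once each even though (x, 1) appears twice, which is the result the DISTINCT on the composite field is after.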