pig - How do I reference columns in a FOREACH after a JOIN?

iha*_*nny 20 apache-pig

A = load 'a.txt' as (id, a1);
B = load 'b.txt' as (id, b1);
C = join A by id, B by id;
D = foreach C generate id,a1,b1;
dump D;

Line 4 fails: Invalid field projection. Projected field [id] does not exist in schema

I tried changing it to A.id, but then the last line fails: ERROR 0: Scalar has more than one row in the output.

Don*_*ner 47

What you are looking for is the "disambiguate operator". You want A::id, not A.id.

A.id says "there is a relation/bag A, and there is a column called id in its schema"

A::id says "there is a record from A, and it has a column called id"

So, you would do:

A = load 'a.txt' as (id, a1);
B = load 'b.txt' as (id, b1);
C = join A by id, B by id;
D = foreach C generate A::id,a1,b1;
dump D;
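If you are not sure which names exist after the join, describe will show you. Here is a minimal sketch, assuming the same a.txt and b.txt as in the question; the schema of C should list the two id columns as A::id and B::id. Renaming with as in the foreach is optional, but it gives later statements a plain name to work with.

```pig
A = load 'a.txt' as (id, a1);
B = load 'b.txt' as (id, b1);
C = join A by id, B by id;
-- Print the schema of C; the joined fields carry their source relation as a prefix.
describe C;
-- a1 and b1 are unambiguous, so only id needs the A:: prefix.
-- "as id" renames the column so downstream statements can refer to it as plain id.
D = foreach C generate A::id as id, a1, b1;
describe D;
```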

Dirty alternative:

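Just because I am lazy, and because disambiguation gets really weird when you start doing multiple joins one after another: use unique identifiers.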

A = load 'a.txt' as (ida, a1);
B = load 'b.txt' as (idb, b1);
C = join A by ida, B by idb;
D = foreach C generate ida,a1,b1;
dump D;
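To illustrate what "really weird" looks like with chained joins, here is a rough sketch. The third input c.txt and the relation names E, F and G are hypothetical, made up for this example. As far as I know, the prefixes nest after a second join, so the original id from A ends up as C::A::id; running describe F is the safest way to confirm the exact names on your Pig version.

```pig
A = load 'a.txt' as (id, a1);
B = load 'b.txt' as (id, b1);
E = load 'c.txt' as (id, c1);  -- hypothetical third input, not in the question
C = join A by id, B by id;
-- C now has two id columns, so the join key has to be disambiguated.
F = join C by A::id, E by id;
-- Check the resulting field names before projecting.
describe F;
-- The prefix from the first join is kept and the new one is added in front.
G = foreach F generate C::A::id, a1, b1, c1;
dump G;
```

With unique column names (ida, idb, and so on) none of these prefixes are needed in the final foreach.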