Can I generate nested bags using nested FOREACH statements in Pig Latin?

PP.*_*PP. 7 apache-pig

Suppose I have a dataset of restaurant reviews:

User,City,Restaurant,Rating
Jim,New York,Mecurials,3
Jim,New York,Whapme,4.5
Jim,London,Pint Size,2
Lisa,London,Pint Size,4
Lisa,London,Rabbit Whole,3.5

I want to produce a list of average ratings per user and city, i.e. the output:

User,City,AverageRating
Jim,New York,3.75
Jim,London,2
Lisa,London,3.75

I can write a Pig script like this:

Data = LOAD 'data.txt' USING PigStorage(',') AS (
    user:chararray, city:chararray, restaurant:chararray, rating:float
);

PerUserCity = GROUP Data BY (user, city);

ResultSet = FOREACH PerUserCity {
    GENERATE group.user, group.city, AVG(Data.rating);
}

But I'm curious whether I can group at the higher level (user) first, and then group at the next level (city) inside it, i.e.:

PerUser = GROUP Data BY user;

Intermediate = FOREACH PerUser {
    B = GROUP Data BY city;
    GENERATE group AS user, B;
}

I get:

Error during parsing.
Invalid alias: GROUP in {
  group: chararray,
  Data: {
    user: chararray,
    city: chararray,
    restaurant: chararray,
    rating: float
  }
}

Has anyone tried this successfully? Is it simply not possible to GROUP inside a FOREACH?

What I'm aiming for is something like:

ResultSet = FOREACH PerUser {
    FOREACH City {
        GENERATE user, city, AVG(City.rating)
    }
}

Rom*_*ain 8

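Currently, the operations allowed inside a FOREACH are DISTINCT, FILTER, LIMIT, and ORDER BY. GROUP is not among them, which is why the parser rejects it with "Invalid alias: GROUP".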

For now, grouping directly by (user, city), as you describe, is the right approach.
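If the nested shape (one row per user, holding a bag of per-city averages) is what you are after, one possible workaround is to group twice at the top level instead of inside the FOREACH. This is only a sketch: it assumes data.txt contains no header row, and the names CityAvg, Nested, avg_rating, and cities are made up for illustration.

```pig
Data = LOAD 'data.txt' USING PigStorage(',') AS (
    user:chararray, city:chararray, restaurant:chararray, rating:float
);

-- First pass: one row per (user, city) with the average rating.
PerUserCity = GROUP Data BY (user, city);
CityAvg = FOREACH PerUserCity GENERATE
    FLATTEN(group) AS (user, city),
    AVG(Data.rating) AS avg_rating;

-- Second pass: regroup by user so each user carries a bag of
-- (city, avg_rating) tuples.
PerUser = GROUP CityAvg BY user;
Nested = FOREACH PerUser GENERATE
    group AS user,
    CityAvg.(city, avg_rating) AS cities;

DUMP Nested;
```

CityAvg on its own is the flat User,City,AverageRating list from the question; the second GROUP is only needed if you want the result nested per user.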