Ved*_*ant 5 mysql hadoop apache-pig
Trying to get this done in Pig (looking for an equivalent of MySQL's group_concat()).
For example, my table has three fields (userid, clickcount, pagenumber):
155 | 2 | 12
155 | 3 | 133
155 | 1 | 144
156 | 6 | 1
156 | 7 | 5
The desired output is:
155 | 2,3,1 | 12,133,144
156 | 6,7 | 1,5
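To pin down what "group_concat in Pig" has to produce, here is a small local model of MySQL's GROUP_CONCAT over the sample rows. It is a Python sketch, not Pig. The function name `group_concat` and the row layout are illustrative assumptions. It keeps groups and values in input order, which neither MySQL nor Pig guarantees without an explicit ordering.

```python
# Sample rows from the question: (userid, clickcount, pagenumber)
ROWS = [
    (155, 2, 12),
    (155, 3, 133),
    (155, 1, 144),
    (156, 6, 1),
    (156, 7, 5),
]


def group_concat(rows, sep=","):
    """Hypothetical helper: group rows by userid and join each value column
    with `sep`, similar to MySQL GROUP_CONCAT.

    Python dicts preserve insertion order, so groups appear in first-seen
    order and values keep input order.
    """
    groups = {}
    for userid, clicks, page in rows:
        clicks_list, pages_list = groups.setdefault(userid, ([], []))
        clicks_list.append(str(clicks))
        pages_list.append(str(page))
    return [
        (userid, sep.join(clicks_list), sep.join(pages_list))
        for userid, (clicks_list, pages_list) in groups.items()
    ]


for userid, clicks, pages in group_concat(ROWS):
    print(f"{userid} | {clicks} | {pages}")
```

Running it prints the two rows of the desired output shown above.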
How can I achieve this in Pig?

Answer: group by userid, then project each column of the grouped relation as a bag:
grouped = GROUP table BY userid;
X = FOREACH grouped GENERATE group as userid,
table.clickcount as clicksbag,
table.pagenumber as pagenumberbag;
X will now contain:
(155,{(2),(3),(1)},{(12),(133),(144)})
(156,{(6),(7)},{(1),(5)})
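The shape of X can be reproduced locally with the Python sketch below. It models `GROUP ... BY userid` followed by projecting `table.clickcount` and `table.pagenumber`, where each projection yields a bag of one-field tuples. Bags are modelled as Python lists, which is an assumption: real Pig bags are unordered, so the element order after GROUP may differ. `group_into_bags` and `fmt_bag` are hypothetical helpers.

```python
# Sample rows from the question: (userid, clickcount, pagenumber)
ROWS = [
    (155, 2, 12),
    (155, 3, 133),
    (155, 1, 144),
    (156, 6, 1),
    (156, 7, 5),
]


def group_into_bags(rows):
    """Model of GROUP by userid plus projection: for each userid, return
    a bag (list) of 1-tuples per projected column."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[0], []).append(row)
    return [
        (userid, [(r[1],) for r in bag], [(r[2],) for r in bag])
        for userid, bag in grouped.items()
    ]


def fmt_bag(bag):
    """Render a bag of tuples in Pig's DUMP notation, e.g. {(2),(3)}."""
    return "{" + ",".join("(" + ",".join(map(str, t)) + ")" for t in bag) + "}"


X = group_into_bags(ROWS)
for userid, clicks, pages in X:
    print(f"({userid},{fmt_bag(clicks)},{fmt_bag(pages)})")
```

The printed lines match the contents of X listed above.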
Next, apply the built-in UDF BagToTuple to each bag:
result = FOREACH X GENERATE userid,
BagToTuple(clicksbag) as clickcounts,
BagToTuple(pagenumberbag) as pagenumbers;
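What BagToTuple does to each bag can be modelled in Python as flattening a bag of tuples into one tuple. This is a sketch of the semantics, not Pig's implementation; `bag_to_tuple` is a hypothetical name. The result is a tuple such as (2,3,1), not the string "2,3,1". If a literal comma-separated chararray is needed, Pig's built-in function list also includes BagToString, which joins bag elements with a delimiter; check that your Pig version ships it.

```python
def bag_to_tuple(bag):
    """Model of Pig's BagToTuple: concatenate the fields of every tuple
    in the bag into one flat tuple."""
    return tuple(field for t in bag for field in t)


# X as shown in the answer, with bags modelled as lists of tuples
X = [
    (155, [(2,), (3,), (1,)], [(12,), (133,), (144,)]),
    (156, [(6,), (7,)], [(1,), (5,)]),
]

result = [(userid, bag_to_tuple(c), bag_to_tuple(p)) for userid, c, p in X]
for row in result:
    print(row)
```

This prints `(155, (2, 3, 1), (12, 133, 144))` and `(156, (6, 7), (1, 5))`.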
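This relation should now contain what you want. You can also fold the BagToTuple calls into the FOREACH over the grouped relation and skip X:
result = FOREACH grouped GENERATE group as userid,
BagToTuple(table.clickcount) as clickcounts,
BagToTuple(table.pagenumber) as pagenumbers;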