Writing output to multiple files in Hadoop

Question


Bha*_*hah 4 hadoop mapreduce

Possible duplicate:
MultipleOutputFormat in Hadoop

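I want to write output files by month using map-reduce in Hadoop. If a record is from January, it should go into a jan-file, and likewise every other month should get its own separate file.

How can I create such files in Hadoop MapReduce? I have been trying a recursive map-reduce approach, but I can't work out how to implement it.

Please suggest a solution.

Thanks.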

Answer 1

Pra*_*ati 5

With the MultipleOutputFormat class, the output file name can be derived from the reducer's output key and value. MultipleOutputFormat#generateFileNameForKeyValue must be overridden in a user-defined OutputFormat class.

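import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// MultipleOutputFormat is abstract; MultipleTextOutputFormat is its concrete text subclass (old mapred API).
static class MyMultipleOutputFormat extends MultipleTextOutputFormat<Text, Text> {
    @Override
    protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        String keyString = key.toString();
        String valueString = value.toString(); // available if the name should also depend on the value
        // Return a combination of keyString and valueString. Assumption: the key
        // holds the month, so each month gets its own file; name is the default leaf name.
        return keyString + "-" + name;
    }
}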
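
For the month-per-file case in the question, the naming step on its own could look roughly like the sketch below. It is independent of Hadoop and only illustrates the string logic that an overridden generateFileNameForKeyValue might delegate to. The class name MonthFileNamer, the method fileNameFor, and the assumption that each record carries an ISO-8601 date such as 2012-01-15 are all hypothetical; the question does not say how the date is stored.

```java
import java.time.LocalDate;
import java.time.format.TextStyle;
import java.util.Locale;

// Hypothetical helper, not part of Hadoop: derives a per-month file name.
final class MonthFileNamer {

    // Assumes isoDate is an ISO-8601 date (yyyy-MM-dd); leafName is the
    // default leaf name Hadoop passes in, such as "part-00000".
    static String fileNameFor(String isoDate, String leafName) {
        LocalDate date = LocalDate.parse(isoDate);
        // Short English month name, lower-cased: "jan", "feb", ...
        String month = date.getMonth()
                .getDisplayName(TextStyle.SHORT, Locale.ENGLISH)
                .toLowerCase(Locale.ROOT);
        return month + "-" + leafName;
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor("2012-01-15", "part-00000")); // prints "jan-part-00000"
    }
}
```

Keeping the leaf name in the result means that two reduce tasks never produce the same file name, even if records for one month end up on more than one reducer.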

