How (in Hadoop) does data get passed into the map and reduce functions with the correct types?

Pra*_*eep 3 java hadoop mapreduce

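I'm having some difficulty understanding how, in Hadoop, data gets passed into the map and reduce functions. I know that we can define the input format and output format, and then the key types for the input and output. But, for example, if we want an object as the input type, how does Hadoop handle that internally?

Thanks...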

Tar*_*riq 7

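You can use the Hadoop InputFormat and OutputFormat interfaces to create your own custom formats. For example, you could format the output of a MapReduce job as JSON. Something like this: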

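import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class JsonOutputFormat extends TextOutputFormat<Text, IntWritable> {
    @Override
    public RecordWriter<Text, IntWritable> getRecordWriter(
            TaskAttemptContext context) throws IOException,
                  InterruptedException {
        Configuration conf = context.getConfiguration();
        Path path = getOutputPath(context);
        FileSystem fs = path.getFileSystem(conf);
        // The file is named after the job, so every task computes the same
        // path; this only suits a job with a single task writing output.
        FSDataOutputStream out =
                fs.create(new Path(path, context.getJobName()));
        return new JsonRecordWriter(out);
    }

    private static class JsonRecordWriter extends
          LineRecordWriter<Text, IntWritable> {
        boolean firstRecord = true;

        public JsonRecordWriter(DataOutputStream out)
                throws IOException {
            super(out);
            // Open the JSON object before any record is written.
            out.write("{".getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public synchronized void write(Text key, IntWritable value)
                throws IOException {
            // Put a separator before every record except the first.
            if (!firstRecord) {
                out.write(",\r\n".getBytes(StandardCharsets.UTF_8));
            }
            firstRecord = false;
            // Keys are not JSON-escaped here; add escaping if they may
            // contain quotes or backslashes.
            String entry = "\"" + key.toString() + "\":\""
                    + value.toString() + "\"";
            out.write(entry.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public synchronized void close(TaskAttemptContext context)
                throws IOException {
            // Close the JSON object, then let the parent close the stream.
            out.write("}".getBytes(StandardCharsets.UTF_8));
            super.close(context);
        }
    }
}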
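
As for how an object travels into and out of map and reduce: Hadoop serializes keys and values through the `Writable` interface, which has two methods, `write(DataOutput)` and `readFields(DataInput)`. Key types additionally implement `WritableComparable`, since keys get sorted. Below is a minimal sketch of that round trip, not how your job has to look. To keep it runnable without Hadoop on the classpath, it declares a local stand-in `Writable` interface with the same two methods; `PageView`, `serialize` and `deserialize` are hypothetical names made up for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableSketch {

    // Stand-in with the same two methods as org.apache.hadoop.io.Writable,
    // declared locally (an assumption of this sketch) so it runs without Hadoop.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical value type: a URL and a hit count.
    static class PageView implements Writable {
        String url = "";
        int hits;

        // A no-arg constructor lets the framework create an empty instance
        // and then fill it in via readFields().
        PageView() {}

        PageView(String url, int hits) {
            this.url = url;
            this.hits = hits;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(url);
            out.writeInt(hits);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Fields must be read in the same order they were written.
            url = in.readUTF();
            hits = in.readInt();
        }
    }

    // Turn the object into bytes, as happens when a record leaves a task.
    static byte[] serialize(Writable w) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        }
        return buffer.toByteArray();
    }

    // Rebuild the object from bytes, as happens when a record reaches a task.
    static PageView deserialize(byte[] bytes) throws IOException {
        PageView view = new PageView();
        try (DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes))) {
            view.readFields(in);
        }
        return view;
    }

    public static void main(String[] args) throws IOException {
        PageView original = new PageView("/index.html", 42);
        PageView copy = deserialize(serialize(original));
        System.out.println(copy.url + " " + copy.hits); // prints "/index.html 42"
    }
}
```

In a real job you would implement `org.apache.hadoop.io.Writable` itself and name the class as the value type of your mapper or reducer; the write/read logic stays the same.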