What do job.setOutputKeyClass and job.setOutputValueClass refer to?

Question

nik*_*686 16 java hadoop mapreduce

I thought they referred to the Reducer, but in my program I have

public static class MyMapper extends Mapper< LongWritable, Text, Text, Text >

and

public static class MyReducer extends Reducer< Text, Text, NullWritable, Text >

So, if I have

job.setOutputKeyClass( NullWritable.class );

job.setOutputValueClass( Text.class );

I get the following exception

Type mismatch in key from map: expected org.apache.hadoop.io.NullWritable, recieved org.apache.hadoop.io.Text

But if I have

job.setOutputKeyClass( Text.class );

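there is no problem.

Is there an error in my code, or does this happen because of NullWritable or something else?

Also, do I have to use job.setInputFormatClass and job.setOutputFormatClass? My program runs fine without them.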

Answer 1

Cha*_*guy 32

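Calling job.setOutputKeyClass( NullWritable.class ); sets the type expected as output from both the map and reduce phases. That is why you see the exception: your Mapper emits Text keys, while the job expects NullWritable keys from the map phase as well.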
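To make the reasoning concrete, here is a small Hadoop-free model of that behavior: when no map output key class is set, the map phase falls back to the job output key class, and each key the mapper emits is checked against it. This is only a sketch. `JobTypes`, `checkMapKey`, and the nested `Text` and `NullWritable` stand-ins are hypothetical names invented for illustration, and the model throws IllegalStateException to stay simple.

```java
public class MapOutputTypeCheck {

    // Stand-ins for Hadoop's Text and NullWritable (hypothetical, illustration only).
    static final class Text {}
    static final class NullWritable {}

    /** Simplified model of the job's key type settings (not a Hadoop class). */
    static final class JobTypes {
        private Class<?> outputKeyClass;
        private Class<?> mapOutputKeyClass; // null means "never set"

        void setOutputKeyClass(Class<?> c) { outputKeyClass = c; }
        void setMapOutputKeyClass(Class<?> c) { mapOutputKeyClass = c; }

        // Falls back to the job output key class when no map output key class was set.
        Class<?> getMapOutputKeyClass() {
            return mapOutputKeyClass != null ? mapOutputKeyClass : outputKeyClass;
        }
    }

    /** Rejects a map output key whose class differs from the expected one. */
    static void checkMapKey(JobTypes types, Object key) {
        Class<?> expected = types.getMapOutputKeyClass();
        if (key.getClass() != expected) {
            throw new IllegalStateException("Type mismatch in key from map: expected "
                    + expected.getName() + ", received " + key.getClass().getName());
        }
    }

    public static void main(String[] args) {
        JobTypes types = new JobTypes();
        types.setOutputKeyClass(NullWritable.class);

        try {
            checkMapKey(types, new Text()); // mapper emits Text, NullWritable is expected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        types.setMapOutputKeyClass(Text.class);
        checkMapKey(types, new Text()); // passes once the map output key class is set
        System.out.println("check passed");
    }
}
```

In this model, setting only the output key class to NullWritable reproduces the mismatch from the question, and setting the map output key class separately removes it.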

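If your Mapper emits different types than the Reducer, you can set the types emitted by the mapper with the JobConf's setMapOutputKeyClass() and setMapOutputValueClass() methods (the Job class used in the question provides the same two methods). These implicitly set the input types expected by the Reducer.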

(Source: Yahoo Developer Tutorial)
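Applied to the classes in the question, the configuration could look roughly like the driver below. Treat it as a sketch under assumptions: it targets the org.apache.hadoop.mapreduce API with Job.getInstance (Hadoop 2 or later), and the class name `MyJobDriver`, the job name, the placeholder map and reduce bodies, and the use of args[0] and args[1] as input and output paths are all invented for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyJobDriver {

    // Same type parameters as the question's mapper; the body is a placeholder.
    public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(new Text("key"), line); // emits (Text, Text)
        }
    }

    // Same type parameters as the question's reducer; the body is a placeholder.
    public static class MyReducer extends Reducer<Text, Text, NullWritable, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                context.write(NullWritable.get(), value); // emits (NullWritable, Text)
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "my-job");
        job.setJarByClass(MyJobDriver.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        // Types emitted by the mapper (and therefore consumed by the reducer).
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        // Types emitted by the reducer, i.e. the final job output.
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With the map output types declared separately, the NullWritable setting applies only to what the reducer writes.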

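Regarding your second question, the default InputFormat is TextInputFormat. It treats each line of each input file as a separate record and performs no parsing. You only need to call these methods if you want to process the input in a different format. Here are some examples: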

InputFormat             | Description                                      | Key                                      | Value
--------------------------------------------------------------------------------------------------------------------------------------------------------
TextInputFormat         | Default format; reads lines of text files        | The byte offset of the line              | The line contents
KeyValueTextInputFormat | Parses lines into (key, value) pairs             | Everything up to the first tab character | The remainder of the line
SequenceFileInputFormat | A Hadoop-specific high-performance binary format | user-defined                             | user-defined
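The Key and Value columns for the two text formats can be illustrated without Hadoop. The sketch below only models how one line becomes a record; `InputRecordSketch`, `textRecords`, and `keyValueRecord` are hypothetical names, and it assumes '\n' line endings and UTF-8 text. In a real job you would pick the format with job.setInputFormatClass(...) instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class InputRecordSketch {

    // TextInputFormat-style records: key = byte offset of the line, value = line contents.
    static List<Map.Entry<Long, String>> textRecords(String fileContents) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            records.add(new SimpleEntry<>(offset, line));
            // Advance by the line's byte length plus one byte for the '\n' terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    // KeyValueTextInputFormat-style record: split the line at the first tab character.
    static Map.Entry<String, String> keyValueRecord(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new SimpleEntry<>(line, ""); // no tab: whole line is the key, value is empty
        }
        return new SimpleEntry<>(line.substring(0, tab), line.substring(tab + 1));
    }

    public static void main(String[] args) {
        for (Map.Entry<Long, String> r : textRecords("alpha\tone\nbeta\ttwo")) {
            System.out.println(r.getKey() + " -> " + r.getValue());
        }
        Map.Entry<String, String> kv = keyValueRecord("alpha\tone");
        System.out.println(kv.getKey() + " -> " + kv.getValue());
    }
}
```

This also shows why the question's Mapper takes a LongWritable key and a Text value: with the default format, the key is the offset and the value is the line.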

The default OutputFormat is TextOutputFormat, which writes (key, value) pairs on individual lines of a text file. Here are some examples:

OutputFormat             | Description
---------------------------------------------------------------------------------------------------------
TextOutputFormat         | Default; writes lines in "key \t value" form
SequenceFileOutputFormat | Writes binary files suitable for reading into subsequent MapReduce jobs
NullOutputFormat         | Disregards its inputs

(Source: another Yahoo Developer Tutorial)
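Since the Reducer in the question writes a NullWritable key, it helps to see what the default "key \t value" line looks like in that case. As far as I know, TextOutputFormat skips a null or NullWritable key together with the tab separator, so only the value is written. The sketch below models that rule without Hadoop; `OutputLineSketch`, `formatLine`, and the `NULL_WRITABLE` marker are hypothetical names for illustration.

```java
import java.util.Optional;

public class OutputLineSketch {

    // Stand-in for NullWritable.get() (hypothetical marker, illustration only).
    static final Object NULL_WRITABLE = new Object();

    private static boolean isAbsent(Object o) {
        return o == null || o == NULL_WRITABLE;
    }

    // Builds the text line for one record, without the trailing newline.
    // Returns empty when both key and value are absent, since no line is written then.
    static Optional<String> formatLine(Object key, Object value) {
        boolean noKey = isAbsent(key);
        boolean noValue = isAbsent(value);
        if (noKey && noValue) {
            return Optional.empty();
        }
        StringBuilder line = new StringBuilder();
        if (!noKey) {
            line.append(key);
        }
        if (!noKey && !noValue) {
            line.append('\t'); // separator only when both parts are present
        }
        if (!noValue) {
            line.append(value);
        }
        return Optional.of(line.toString());
    }

    public static void main(String[] args) {
        System.out.println(formatLine("k", "v").get());           // key, tab, value
        System.out.println(formatLine(NULL_WRITABLE, "v").get()); // value only
    }
}
```

So with NullWritable as the output key, the result files contain just the values, one per line.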
