I am using ArrayWritable, and at some point I needed to check how Hadoop serializes it. This is the output I got after setting job.setNumReduceTasks(0):
0 IntArrayWritable@10f11b8
3 IntArrayWritable@544ec1
6 IntArrayWritable@fe748f
8 IntArrayWritable@1968e23
11 IntArrayWritable@14da8f4
14 IntArrayWritable@18f6235
This is the test mapper I am using:
public static class MyMapper extends Mapper<LongWritable, Text, LongWritable, IntArrayWritable> {

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        int red = Integer.parseInt(value.toString());
        IntWritable[] a = new IntWritable[100];
        for (int i = 0; i < a.length; i++) {
            a[i] = new IntWritable(red + i);
        }
        IntArrayWritable aw = new IntArrayWritable();
        aw.set(a);
        context.write(key, aw);
    }
}
IntArrayWritable is taken from the example given in the javadoc for ArrayWritable:
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;

public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }
}
I actually checked Hadoop's source code, and this makes no sense to me.
ArrayWritable should not serialize the class name, and an array of 100 IntWritables cannot possibly serialize down to six or seven hex characters. Yet the application actually seems to work fine, and the reducer deserializes the correct values... What is going on? What am I missing?
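As a quick sanity check of that reasoning (not part of the original post), one can run ArrayWritable's own write() method against a DataOutputBuffer and count the bytes; the class name SerializationCheck is just an illustrative choice:

import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.IntWritable;

public class SerializationCheck {
    public static void main(String[] args) throws Exception {
        // Build the same kind of 100-element array the mapper emits.
        IntWritable[] a = new IntWritable[100];
        for (int i = 0; i < a.length; i++) {
            a[i] = new IntWritable(i);
        }
        IntArrayWritable aw = new IntArrayWritable();
        aw.set(a);

        // Run the real Writable serialization path.
        DataOutputBuffer out = new DataOutputBuffer();
        aw.write(out);

        // ArrayWritable writes the array length followed by each element,
        // so this should be on the order of 400 bytes, far more than a
        // short string like "IntArrayWritable@10f11b8" could encode.
        System.out.println("serialized bytes: " + out.getLength());
    }
}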
You have to override the default toString() method.
It is called by TextOutputFormat to create a human-readable format.
Try the following code and look at the result:
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;

public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (String s : super.toStrings()) {
            sb.append(s).append(" ");
        }
        return sb.toString();
    }
}
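With this override in place, each line written by TextOutputFormat should show the key followed by the 100 space-separated integers instead of something like IntArrayWritable@10f11b8.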
The problem is that the output you get from the MapReduce job is not the serialized version of that data; it has been translated into a pretty-printed string.
When you set the number of reducers to zero, your mappers now write straight through the output format, which formats your data, most likely turning it into a readable string. It does not dump the data out in serialized form as if it were going to be picked up by a reducer.
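If the goal is to look at the actual serialized Writables rather than their toString() form, one option (not mentioned in the original answer) is to write the map output with SequenceFileOutputFormat, which stores keys and values via their write() methods. A minimal driver sketch, assuming MyMapper from the question is accessible and using the hypothetical class name RawOutputDriver:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class RawOutputDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "raw writable output");
        job.setJarByClass(RawOutputDriver.class);
        job.setMapperClass(MyMapper.class);

        // Map-only job, same as in the question.
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(IntArrayWritable.class);

        // SequenceFileOutputFormat stores each key/value via its write()
        // method, so the output file contains the real serialized bytes
        // (readable back with SequenceFile.Reader) instead of toString() text.
        job.setOutputFormatClass(SequenceFileOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}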