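I am trying to create a simple MapReduce job by modifying the wordcount example that ships with Hadoop.
I want to emit a list for each word rather than a single count. The wordcount example gives the following output: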
hello 2
world 2
I am trying to get it to output a list instead, which will form the basis of future work:
hello 1 1
world 1 1
I think I am on the right track, but I am having trouble writing the list. Instead of the above, I get:
Hello foo.MyArrayWritable@61250ff2
World foo.MyArrayWritable@483a0ab1
Here is my MyArrayWritable. I put a print statement (sysout) in write(DataOutput arg0),
but it never printed anything, so I think the method may never be called, and I don't know why.
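import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

class MyArrayWritable extends ArrayWritable {
    public MyArrayWritable(Class<? extends Writable> valueClass, Writable[] values) {
        super(valueClass, values);
    }

    public MyArrayWritable(Class<? extends Writable> valueClass) {
        super(valueClass);
    }

    // Narrow the return type so callers get IntWritable[] directly.
    @Override
    public IntWritable[] get() {
        return (IntWritable[]) super.get();
    }

    // Serialize each element in turn.
    @Override
    public void write(DataOutput arg0) throws IOException {
        for (IntWritable i : get()) {
            i.write(arg0);
        }
    }
}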
Edit: adding more source code
public class WordCount {
public static …
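
One likely explanation for output such as foo.MyArrayWritable@61250ff2, offered as a sketch rather than a confirmed diagnosis: Hadoop's default TextOutputFormat renders each value by calling toString() on it, not write(DataOutput). Judging by the output, the ArrayWritable in use here does not override toString(), so Object's default ClassName@hash form is printed. The stand-alone Java below runs without Hadoop on the classpath. IntList, PrintableIntList, and formatLine are hypothetical stand-ins for illustration, not Hadoop classes or APIs.

```java
import java.util.StringJoiner;

public class ToStringSketch {
    // Hypothetical stand-in for a value class that does NOT override toString().
    public static class IntList {
        protected final int[] values;

        public IntList(int... values) {
            this.values = values;
        }
    }

    // Same data, but toString() is overridden to join the elements with spaces.
    public static class PrintableIntList extends IntList {
        public PrintableIntList(int... values) {
            super(values);
        }

        @Override
        public String toString() {
            StringJoiner joiner = new StringJoiner(" ");
            for (int v : values) {
                joiner.add(Integer.toString(v));
            }
            return joiner.toString();
        }
    }

    // Assumption: mimics a text output format that writes key, a tab, then value.toString().
    public static String formatLine(String key, Object value) {
        return key + "\t" + value.toString();
    }

    public static void main(String[] args) {
        // Without an override, the line ends in ClassName@hexHash.
        System.out.println(formatLine("hello", new IntList(1, 1)));
        // With the override, the line ends in the space-separated elements.
        System.out.println(formatLine("hello", new PrintableIntList(1, 1)));
    }
}
```

In the actual job, the analogous change would be to override toString() in MyArrayWritable. That step is not tested here.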

I am using ArrayWritable, and at some point I needed to check how Hadoop serializes an ArrayWritable. This is what I got by setting job.setNumReduceTasks(0):
0 IntArrayWritable@10f11b8
3 IntArrayWritable@544ec1
6 IntArrayWritable@fe748f
8 IntArrayWritable@1968e23
11 IntArrayWritable@14da8f4
14 IntArrayWritable@18f6235
Here is the test mapper I used:
public static class MyMapper extends Mapper<LongWritable, Text, LongWritable, IntArrayWritable> {
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        int red = Integer.parseInt(value.toString());
        IntWritable[] a = new IntWritable[100];
        for (int i = 0; i < a.length; i++) {
            a[i] = new IntWritable(red + i);
        }
        IntArrayWritable aw = new IntArrayWritable();
        aw.set(a);
        context.write(key, aw);
    }
}
IntArrayWritable is taken from the example given in the Javadoc for ArrayWritable:
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;
public class …
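
The IntArrayWritable@... strings above look like the text rendering of each value (its toString()), not the serialized bytes. As a rough illustration of the binary form: ArrayWritable.write emits the element count as a 4-byte int followed by each element's own write output, and IntWritable writes a single 4-byte int. The sketch below reproduces that layout with plain java.io so it runs without Hadoop. serialize and deserialize are hypothetical helper names, not Hadoop APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class ArrayLayoutSketch {
    // Writes a 4-byte length prefix, then each element as a 4-byte big-endian int.
    public static byte[] serialize(int[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.length);
            for (int v : values) {
                out.writeInt(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the length prefix first, then that many ints.
    public static int[] deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int[] values = new int[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readInt();
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new int[] {7, 8, 9});
        System.out.println(data.length + " bytes");               // 16 bytes
        System.out.println(Arrays.toString(deserialize(data)));   // [7, 8, 9]
    }
}
```

With this layout, the 100-element array built in the mapper would occupy 4 + 100 * 4 = 404 bytes per value, before any container or framing overhead added by the output format.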