Implementing ArrayWritable for a custom Hadoop type

Asked by Aks*_*hay · hadoop, mapreduce

How do you define an ArrayWritable for a custom Hadoop type? I am trying to implement an inverted index in Hadoop, using custom Hadoop types to store the data.

I have an IndividualPosting class that stores the term frequency, the document ID, and a list of byte offsets of the term within the document.

I have a Posting class that holds the document frequency (the number of documents the term appears in) and a list of IndividualPostings.

For the list of byte offsets in IndividualPosting, I have defined a LongArrayWritable that extends the ArrayWritable class.

When I defined a custom ArrayWritable for IndividualPosting, I ran into a problem after deploying locally (using Karmasphere, Eclipse).

All the IndividualPosting instances in the Posting class's list were identical, even though I get different values in the reduce method.
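For reference, here is a minimal sketch of what the IndividualPosting described above might look like as a Writable. The field names are hypothetical, and it assumes the LongArrayWritable mentioned above (shown in the answer below) has a no-argument constructor:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

// Hypothetical sketch of the IndividualPosting described above:
// term frequency, document ID and the byte offsets of the term.
public class IndividualPosting implements Writable {
    private LongWritable termFrequency = new LongWritable();
    private Text documentId = new Text();
    private LongArrayWritable offsets = new LongArrayWritable();

    public void write(DataOutput out) throws IOException {
        termFrequency.write(out);
        documentId.write(out);
        offsets.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        termFrequency.readFields(in);
        documentId.readFields(in);
        offsets.readFields(in);
    }
}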

Answered by MrG*_*mez:

From the documentation for ArrayWritable:

A Writable for arrays containing instances of a class. The elements of this writable must all be instances of the same class. If this writable will be the input for a Reducer, you will need to create a subclass that sets the value to be of the proper type. For example:

public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }
}

You are already using WritableComparable types defined by Hadoop. Here is what I assume your implementation looks like for LongWritable:

public static class LongArrayWritable extends ArrayWritable
{
    public LongArrayWritable() {
        super(LongWritable.class);
    }
    public LongArrayWritable(LongWritable[] values) {
        super(LongWritable.class, values);
    }
}
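As a quick, illustrative round trip with the subclass above (assuming LongArrayWritable is available as a top-level class; the offset values are made up): ArrayWritable.get() returns Writable[], so each element has to be cast back to LongWritable.

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Writable;

public class LongArrayWritableDemo {
    public static void main(String[] args) {
        // Byte offsets of a term inside a document (values are illustrative).
        LongWritable[] offsets = new LongWritable[] {
                new LongWritable(4L), new LongWritable(97L), new LongWritable(250L)
        };
        LongArrayWritable value = new LongArrayWritable(offsets);

        // get() returns Writable[], so cast each element back to LongWritable.
        for (Writable w : value.get()) {
            System.out.println(((LongWritable) w).get());
        }
    }
}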

You should be able to do this with any type that implements WritableComparable, as shown in the documentation. Using their example:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class MyWritableComparable implements
        WritableComparable<MyWritableComparable> {

    // Some data
    private int counter;
    private long timestamp;

    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    public int compareTo(MyWritableComparable other) {
        int thisValue = this.counter;
        int thatValue = other.counter;
        return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }
}
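To make the connection to the first snippet explicit, here is a minimal sketch of the corresponding ArrayWritable subclass for MyWritableComparable, following the same pattern as LongArrayWritable above (the class name is illustrative):

import org.apache.hadoop.io.ArrayWritable;

// Same pattern as LongArrayWritable, but fixed to MyWritableComparable elements.
public class MyWritableComparableArrayWritable extends ArrayWritable {
    public MyWritableComparableArrayWritable() {
        super(MyWritableComparable.class);
    }

    public MyWritableComparableArrayWritable(MyWritableComparable[] values) {
        super(MyWritableComparable.class, values);
    }
}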

And that should be that. This assumes you are using revision 0.20.2 or 0.21.0 of the Hadoop API.