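How do I define an ArrayWritable for a custom Hadoop type?
I am trying to implement an inverted index in Hadoop, using custom Hadoop types to store the data.
I have an IndividualPosting class that stores the term frequency, the document ID, and the list of byte offsets of the term within the document.
I have a Posting class that holds the document frequency (the number of documents the term appears in) and a list of IndividualPostings.
For the list of byte offsets in IndividualPosting, I have defined a LongArrayWritable that extends ArrayWritable.
When I defined a custom ArrayWritable for IndividualPosting, I ran into problems after deploying locally (using Karmasphere and Eclipse).
All the IndividualPosting instances in the Posting class's list are identical, even though I receive different values in the reduce method.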
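From the documentation for ArrayWritable:
A Writable for arrays containing instances of a class. The elements of this writable must all be instances of the same class. If this writable will be the input for a Reducer, you will need to create a subclass that sets the value to be of the proper type. For example:
    public class IntArrayWritable extends ArrayWritable {
        public IntArrayWritable() {
            super(IntWritable.class);
        }
    }
You have already done this for a WritableComparable type defined by Hadoop. Here is what I assume your implementation looks like for LongWritable: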
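    // Nested in the job class; needs org.apache.hadoop.io.ArrayWritable
    // and org.apache.hadoop.io.LongWritable.
    public static class LongArrayWritable extends ArrayWritable
    {
        // No-arg constructor so Hadoop can instantiate the type when deserializing.
        public LongArrayWritable() {
            super(LongWritable.class);
        }
        public LongArrayWritable(LongWritable[] values) {
            super(LongWritable.class, values);
        }
    }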
You should be able to do the same with any type that implements WritableComparable, as the documentation shows. Using their example:
    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.WritableComparable;

    public class MyWritableComparable implements
            WritableComparable<MyWritableComparable> {
        // Some data
        private int counter;
        private long timestamp;
        // Fields are written and read back in the same order.
        public void write(DataOutput out) throws IOException {
            out.writeInt(counter);
            out.writeLong(timestamp);
        }
        public void readFields(DataInput in) throws IOException {
            counter = in.readInt();
            timestamp = in.readLong();
        }
        // Orders instances by counter only.
        public int compareTo(MyWritableComparable other) {
            int thisValue = this.counter;
            int thatValue = other.counter;
            return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
        }
    }
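Applied to the type from the question, the same write/readFields pattern might look roughly like the sketch below. It is only an illustration under assumptions: the field layout of `IndividualPosting` is guessed from the description (term frequency, document ID, byte offsets), and `SimpleWritable` is a hypothetical stand-in with the same two methods as Hadoop's `Writable`, declared only so the sketch runs without Hadoop on the classpath. The `serialize` and `deserialize` helpers are likewise made up for the round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class IndividualPostingSketch {

    // Hypothetical stand-in mirroring the two methods of org.apache.hadoop.io.Writable.
    public interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Assumed layout of the IndividualPosting type described in the question.
    public static class IndividualPosting implements SimpleWritable {
        private int termFrequency;
        private long documentId;
        private long[] byteOffsets = new long[0];

        // No-arg constructor: an instance must exist before readFields can fill it.
        public IndividualPosting() { }

        public IndividualPosting(int termFrequency, long documentId, long[] byteOffsets) {
            this.termFrequency = termFrequency;
            this.documentId = documentId;
            this.byteOffsets = byteOffsets.clone();
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(termFrequency);
            out.writeLong(documentId);
            out.writeInt(byteOffsets.length); // length prefix for the offset list
            for (long offset : byteOffsets) {
                out.writeLong(offset);
            }
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            termFrequency = in.readInt();
            documentId = in.readLong();
            // Allocate a fresh array so offsets from an earlier read are discarded.
            long[] offsets = new long[in.readInt()];
            for (int i = 0; i < offsets.length; i++) {
                offsets[i] = in.readLong();
            }
            byteOffsets = offsets;
        }

        public int getTermFrequency() { return termFrequency; }
        public long getDocumentId() { return documentId; }
        public long[] getByteOffsets() { return byteOffsets.clone(); }
    }

    // Writes a value into a byte array, the way it would travel between tasks.
    public static byte[] serialize(SimpleWritable value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            value.write(out);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates an empty instance and fills it from the serialized bytes.
    public static IndividualPosting deserialize(byte[] bytes) {
        try {
            IndividualPosting posting = new IndividualPosting();
            posting.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return posting;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        IndividualPosting original = new IndividualPosting(2, 7L, new long[] {10L, 42L});
        IndividualPosting copy = deserialize(serialize(original));
        System.out.println(copy.getDocumentId() + " " + Arrays.toString(copy.getByteOffsets()));
    }
}
```

In a real job the class would implement `org.apache.hadoop.io.Writable` (or `WritableComparable`) instead of the stand-in, and the array wrapper would follow the same shape as `LongArrayWritable`, passing `IndividualPosting.class` to the `ArrayWritable` constructor.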
That should be all there is to it. This assumes you are using revision 0.20.2 or 0.21.0 of the Hadoop API.
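One more thing that may be worth checking, given that every stored instance ends up identical: Hadoop reuses the same value object while iterating over the values in reduce, so adding the iterated object to a list without copying it stores one reference many times. Whether that is what happens in your code is an assumption, since the reducer is not shown. The sketch below only simulates the effect in plain Java; `PostingValue`, `reusingValues`, and the two collect methods are hypothetical names, not Hadoop API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ValueReuseSketch {

    // Hypothetical mutable value, standing in for IndividualPosting.
    public static class PostingValue {
        private long documentId;

        public PostingValue() { }

        // Copy constructor used to take a snapshot of the reused instance.
        public PostingValue(PostingValue other) {
            this.documentId = other.documentId;
        }

        public void set(long documentId) { this.documentId = documentId; }
        public long getDocumentId() { return documentId; }
    }

    // Simulates a value iterator that refills one shared instance on every next().
    public static Iterable<PostingValue> reusingValues(final long[] ids) {
        return new Iterable<PostingValue>() {
            @Override
            public Iterator<PostingValue> iterator() {
                return new Iterator<PostingValue>() {
                    private final PostingValue shared = new PostingValue();
                    private int index = 0;

                    @Override
                    public boolean hasNext() { return index < ids.length; }

                    @Override
                    public PostingValue next() {
                        if (!hasNext()) {
                            throw new NoSuchElementException();
                        }
                        shared.set(ids[index++]);
                        return shared;
                    }

                    @Override
                    public void remove() { throw new UnsupportedOperationException(); }
                };
            }
        };
    }

    // Keeps references to the iterated object: every entry is the same instance.
    public static List<Long> collectByReference(long[] ids) {
        List<PostingValue> kept = new ArrayList<PostingValue>();
        for (PostingValue value : reusingValues(ids)) {
            kept.add(value);
        }
        return idsOf(kept);
    }

    // Copies each value before keeping it, so every entry keeps its own state.
    public static List<Long> collectByCopy(long[] ids) {
        List<PostingValue> kept = new ArrayList<PostingValue>();
        for (PostingValue value : reusingValues(ids)) {
            kept.add(new PostingValue(value));
        }
        return idsOf(kept);
    }

    private static List<Long> idsOf(List<PostingValue> values) {
        List<Long> ids = new ArrayList<Long>();
        for (PostingValue value : values) {
            ids.add(value.getDocumentId());
        }
        return ids;
    }

    public static void main(String[] args) {
        long[] ids = {1L, 2L, 3L};
        System.out.println(collectByReference(ids)); // prints [3, 3, 3]
        System.out.println(collectByCopy(ids));      // prints [1, 2, 3]
    }
}
```

If this turns out to be the cause, copying each value before adding it to the Posting list should give distinct entries.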