Storing and retrieving a String array in HBase

fya*_*yaa 3 serialization hbase

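On storing a String array in HBase, I have already read this answer (How to store complex objects into hadoop Hbase?).

It says to use the ArrayWritable class to serialize the array. With WritableUtils.toByteArray(Writable... writable) I get a byte[] that I can store in HBase.

When I then try to retrieve the row, I get a byte[] back that I somehow have to convert into an ArrayWritable again, but I can't find a way to do that. Maybe you know an answer, or am I doing something fundamentally wrong in how I serialize my String[]?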

Lor*_*dig 5

You can use the following method to get the ArrayWritable back (taken from an earlier answer of mine, see here).

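import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Deserializes a byte array into a new instance of the given Writable class.
public static <T extends Writable> T asWritable(byte[] bytes, Class<T> clazz)
        throws IOException {
    T result;
    try {
        // The Writable class must have a public no-argument constructor.
        result = clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
        // Report the failure instead of silently returning null.
        throw new IOException("Cannot instantiate " + clazz.getName(), e);
    }
    // try-with-resources closes the stream, so no IOUtils helper is needed.
    try (DataInputStream dataIn = new DataInputStream(new ByteArrayInputStream(bytes))) {
        result.readFields(dataIn);
    }
    return result;
}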

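This method simply deserializes the byte array into the correct object type, based on the class type token you pass in.
For example, suppose you have a custom ArrayWritable: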

public class TextArrayWritable extends ArrayWritable {
    public TextArrayWritable() {
      super(Text.class);
    }
}

Now you issue an HBase get:

...
Get get = new Get(row);
Result result = htable.get(get);
byte[] value = result.getValue(family, qualifier);
TextArrayWritable tawReturned = asWritable(value, TextArrayWritable.class);
Text[] texts = (Text[]) tawReturned.toArray();
for (Text t : texts) {
  System.out.print(t + " ");
}
...

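Note:
You may have come across the readCompressedStringArray() and writeCompressedStringArray() methods in WritableUtils, which look like a good fit if you have your own Writable class backed by a String array. Before you use them, I'd warn you that they can cause a serious performance hit because of the overhead of gzip compression and decompression.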
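
If you want to see what an uncompressed round trip looks like without any Hadoop classes, here is a minimal sketch using only java.io. StringArrayCodec and its encode/decode methods are hypothetical names made up for illustration, not part of Hadoop or HBase, and the byte layout (element count, then each string as length-prefixed UTF-8) is my own assumption, so it is not interchangeable with what ArrayWritable writes. It also assumes the array contains no null elements.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not a Hadoop or HBase class. */
public final class StringArrayCodec {
    private StringArrayCodec() {}

    /** Writes the element count, then each string as length-prefixed UTF-8 bytes. */
    public static byte[] encode(String[] values) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(values.length);
            for (String value : values) {
                byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
                out.writeInt(utf8.length);
                out.write(utf8);
            }
        }
        return buffer.toByteArray();
    }

    /** Reads back an array that was produced by encode(). */
    public static String[] decode(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            String[] values = new String[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                byte[] utf8 = new byte[in.readInt()];
                in.readFully(utf8);
                values[i] = new String(utf8, StandardCharsets.UTF_8);
            }
            return values;
        }
    }
}
```

The byte[] returned by encode() can be stored as a cell value the same way as the output of WritableUtils.toByteArray(), and the byte[] you get from result.getValue(family, qualifier) can be passed to decode(). Whether this is actually faster than the compressed variants for your data is something you would have to measure.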