How to read from one HBase instance but write to another?

sla*_*ton 8 hadoop hbase mapreduce

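Currently I have two HBase tables (let's call them tableA and tableB). Using a single-stage MapReduce job, data is read from tableA and saved into tableB. At the moment both tables live on the same HBase cluster. However, I need to relocate tableB to its own cluster.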

Is it possible to configure a single-stage MapReduce job in Hadoop to read from one HBase instance and write to a separate one?

Rub*_*eda 4

It is possible: HBase's CopyTable MapReduce job does exactly this by using TableMapReduceUtil.initTableReducerJob(), which lets you set an alternate quorumAddress in case you need to write to a remote cluster:

public static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner, String quorumAddress, String serverClass, String serverImpl)

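quorumAddress - Distant cluster to write to; the default is null, which sends output to the cluster designated in hbase-site.xml. Set this String to the ZooKeeper ensemble of an alternate remote cluster when you want the reduce to write to a cluster other than the default; e.g. when copying tables between clusters, the source is designated by hbase-site.xml and this parameter holds the ensemble address of the remote cluster. The format to pass is particular: pass <hbase.zookeeper.quorum>:<hbase.zookeeper.client.port>:<zookeeper.znode.parent>, such as server,server2,server3:2181:/hbase.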
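To make the quorumAddress format concrete, here is a small standalone Java sketch that splits such a string into its three parts (ZooKeeper hosts, client port, znode parent) and builds one back. ClusterKey, parse and format are hypothetical names for illustration only, not HBase APIs (HBase validates this string internally in its own way), and the sketch assumes the host list carries no per-host ports.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of HBase) for the
// "<quorum>:<client port>:<znode parent>" string passed as quorumAddress.
class ClusterKey {
    final List<String> quorum;
    final int clientPort;
    final String znodeParent;

    ClusterKey(List<String> quorum, int clientPort, String znodeParent) {
        this.quorum = quorum;
        this.clientPort = clientPort;
        this.znodeParent = znodeParent;
    }

    // Splits e.g. "server,server2,server3:2181:/hbase" into its three parts.
    // Assumes hosts are listed without per-host ports.
    static ClusterKey parse(String key) {
        String[] parts = key.split(":", 3);
        if (parts.length != 3 || parts[0].isEmpty() || !parts[2].startsWith("/")) {
            throw new IllegalArgumentException(
                "Expected <quorum>:<port>:<znode parent>, got: " + key);
        }
        List<String> hosts = Arrays.asList(parts[0].split(","));
        int port = Integer.parseInt(parts[1]); // NumberFormatException is an IllegalArgumentException
        return new ClusterKey(hosts, port, parts[2]);
    }

    // Builds the string in the same format from its parts.
    static String format(List<String> hosts, int port, String znodeParent) {
        return String.join(",", hosts) + ":" + port + ":" + znodeParent;
    }

    public static void main(String[] args) {
        ClusterKey k = parse("server,server2,server3:2181:/hbase");
        System.out.println(k.quorum + " " + k.clientPort + " " + k.znodeParent);
    }
}
```

The resulting string is what you would hand to initTableReducerJob as quorumAddress; the local (source) cluster still comes from hbase-site.xml.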


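Another option is to implement your own custom reducer that writes to the remote table instead of writing to the context. Something similar to this: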

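import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Nested inside the job's driver class
public static class MyReducer extends Reducer<Text, Result, Text, Text> {

    protected BufferedMutator remoteMutator; // buffered writer for the remote table
    protected Connection connection;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        super.setup(context);
        // Clone the job configuration and point it at the remote cluster's ZooKeeper quorum
        Configuration config = HBaseConfiguration.create(context.getConfiguration());
        config.set("hbase.zookeeper.quorum", "quorum1,quorum2,quorum3");
        connection = ConnectionFactory.createConnection(config); // HBase 0.99+
        // On HBase < 0.99: HConnectionManager.createConnection(config), then an
        // HTableInterface with setAutoFlush(false) and flushCommits() instead
        BufferedMutatorParams params = new BufferedMutatorParams(TableName.valueOf("myTable"))
                .writeBufferSize(10L * 1024L * 1024L); // 10MB client-side write buffer
        remoteMutator = connection.getBufferedMutator(params);
    }

    @Override
    protected void reduce(Text boardKey, Iterable<Result> results, Context context) throws IOException, InterruptedException {
        /* Build Puts from results and write them with remoteMutator.mutate(put) */
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        super.cleanup(context);
        if (remoteMutator != null) {
            remoteMutator.close(); // flushes any mutations still in the buffer
        }
        if (connection != null) {
            connection.close();
        }
    }
}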
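The reason the reducer sets a write buffer in setup() and closes the writer in cleanup() is client-side batching: puts accumulate locally and are sent in batches, so whatever is still buffered when the task ends must be flushed explicitly. The following is a toy model of that pattern in plain Java, not HBase code; ToyWriteBuffer and its methods are hypothetical, and HBase's BufferedMutator has its own flushing rules (including background flushes) that may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (NOT HBase code) of client-side write buffering: mutations are
// queued until their total size reaches a limit, then sent as one batch.
class ToyWriteBuffer implements AutoCloseable {
    private final long limitBytes;
    private final List<String> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private final List<List<String>> sentBatches = new ArrayList<>();

    ToyWriteBuffer(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Queue one mutation; flush automatically once the buffered size reaches the limit.
    void mutate(String rowKey, long sizeBytes) {
        pending.add(rowKey);
        pendingBytes += sizeBytes;
        if (pendingBytes >= limitBytes) {
            flush();
        }
    }

    // Send everything currently buffered as a single batch (stands in for one RPC).
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sentBatches.add(new ArrayList<>(pending));
        pending.clear();
        pendingBytes = 0;
    }

    // Closing flushes the tail; skipping this loses the last partial batch.
    @Override
    public void close() {
        flush();
    }

    List<List<String>> sentBatches() {
        return sentBatches;
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        ToyWriteBuffer buf = new ToyWriteBuffer(10);
        for (int i = 0; i < 7; i++) {
            buf.mutate("row" + i, 4); // flushes after row2 and after row5
        }
        System.out.println(buf.sentBatches().size() + " batches sent, " + buf.pendingCount() + " pending");
        buf.close(); // what cleanup() does for the real writer
        System.out.println(buf.sentBatches().size() + " batches sent, " + buf.pendingCount() + " pending");
    }
}
```

In the model, seven 4-byte writes against a 10-byte limit produce two automatic batches and leave row6 pending until close() sends it, which mirrors why the reducer closes its writer in cleanup().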