Problem using HBase from Java on Amazon EMR

fra*_*zzi 5 hadoop hbase amazon-web-services elastic-map-reduce apache-zookeeper

I'm trying to query my HBase cluster on Amazon EC2 from a custom jar that I launch as a MapReduce step. Inside the jar (in the map function) I call HBase like this:

public void map( Text key, BytesWritable value, Context contex ) throws IOException, InterruptedException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "tablename");
      ...

The problem is that when execution reaches the HTable line and tries to connect to HBase, the step fails with the following errors:

2014-02-28 18:00:49,936 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
2014-02-28 18:00:49,974 INFO [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 5119@ip-10-0-35-130.ec2.internal
2014-02-28 18:00:49,998 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-02-28 18:00:50,005 WARN [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused

      ...

2014-02-28 18:01:05,542 WARN [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-02-28 18:01:05,542 ERROR [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
2014-02-28 18:01:05,542 WARN [main] org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid

      ... and on and on

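I can use the hbase shell just fine and can query data and everything else from it. I don't know where to start; I've been googling for hours with no luck. Most questions like this online don't cover Amazon-specific fixes. I thought ZooKeeper and HBase were supposed to be wired together automatically by the Amazon bootstrap action.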

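I'm using the HBase 0.94.17 jar and Amazon is running HBase 0.94.7. I'm fairly sure that isn't the problem; my guess is that I haven't set up the Java code correctly. Any help would be much appreciated. Thanks.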

fra*_*zzi 9

OK, after nearly 30 hours of trying, I found the solution. There are a lot of caveats, and versions matter.

In this case I'm using Amazon EMR with Hadoop 2 (AMI 3.0.4) and HBase 0.94.7, and I'm trying to run a custom jar on the same cluster to access HBase locally from Java.

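First, the default HBase configuration doesn't work because of EC2's external/internal IP idiosyncrasies. So you can't use HBaseConfiguration (it defaults to a localhost quorum). Instead, take the configuration that Amazon sets up for you (located at /home/hadoop/hbase/conf/hbase-site.xml) and add it manually to a blank Configuration object.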
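The log in the question shows the client dialing localhost:2181 and being refused. As a quick sanity check before touching any HBase code, a stdlib-only probe along these lines can show whether anything is listening on a given ZooKeeper address from the node where the task runs. This is only a sketch: the class name ZkPortProbe and the 2-second timeout are made up for illustration, and 2181 is simply the port that appears in the log.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ZkPortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // "Connection refused", timeouts and unresolvable hosts all end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical usage: pass the quorum host named in hbase-site.xml as the first argument.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        System.out.println(host + ":" + port + " reachable = " + canConnect(host, port, 2000));
    }
}
```

If localhost is refused but the quorum host from hbase-site.xml is reachable, that points to the client configuration rather than the cluster.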

The connection code looks like this:

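Configuration conf = new Configuration();
// Pass a Path (org.apache.hadoop.fs.Path) so the file is read from the local filesystem;
// a plain String argument is looked up on the classpath instead.
conf.addResource(new Path("/home/hadoop/hbase/conf/hbase-site.xml"));
// Throws an exception if the HBase master or ZooKeeper cannot be reached.
HBaseAdmin.checkHBaseAvailable(conf);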
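To confirm that the EMR-generated file really names a quorum other than localhost, a small reader like the one below can print the relevant property without any Hadoop classes on the classpath. Treat it as a sketch under assumptions: the class name SiteXmlReader is hypothetical, and it assumes the file follows the usual Hadoop layout of property elements with name and value children. hbase.zookeeper.quorum is the standard HBase property for the ZooKeeper hosts.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SiteXmlReader {

    /** Returns the value of the named property, or null if the file does not define it. */
    public static String readProperty(InputStream siteXml, String propertyName) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(siteXml);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the site XML", e);
        }
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            if (propertyName.equals(childText(property, "name"))) {
                return childText(property, "value");
            }
        }
        return null;
    }

    /** Returns the trimmed text of the first child element with the given tag, or null. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the path mentioned above; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "/home/hadoop/hbase/conf/hbase-site.xml";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            System.out.println("hbase.zookeeper.quorum = "
                    + readProperty(in, "hbase.zookeeper.quorum"));
        }
    }
}
```

If this prints null or localhost, the file is probably not the one EMR generated for the cluster, and loading it into the Configuration would not change the behavior seen in the log.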

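Second, you have to package the correct HBase jar INSIDE your custom jar. The reason is that HBase 0.94.x is compiled against Hadoop 1 by default, so you need to grab a Cloudera HBase jar named hbase-0.94.6-cdh4.3.0.jar (you can find it online), which is compiled against Hadoop 2. If you skip this step, you'll get many nasty, un-googleable errors, including an org.apache.hadoop.net.NetUtils exception.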
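When it is unclear which HBase or Hadoop jar the job actually picked up, it can help to print where a class was loaded from. The sketch below uses only the JDK; the class name JarLocator and the message strings are made up for illustration, and the two default class names are just the ones mentioned in this thread.

```java
import java.security.CodeSource;

public class JarLocator {

    public static final String NOT_FOUND = "(not on classpath)";
    public static final String NO_SOURCE = "(no code source; loaded by the JVM itself)";

    /** Returns the jar or directory a class was loaded from, or a short explanation. */
    public static String locate(String className) {
        try {
            // initialize=false: only load the class, do not run its static initializers.
            Class<?> cls = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return NO_SOURCE;
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return NOT_FOUND;
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.hadoop.hbase.client.HTable",
            "org.apache.hadoop.net.NetUtils"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Calling locate from inside the map task (for example, logging its result in setup) shows whether the HBase classes come from the jar you packaged or from a jar already on the cluster's classpath.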