Connecting Hadoop+Hive to MongoDB on AWS EMR (class com/mongodb/DBObject not found)

I would like to connect our EMR cluster to our MongoDB through a live MongoDB connection (not through BSON dumps).

To do that, I created the cluster through the AWS Management Console. In the bootstrap configuration, I pointed to this file, which I placed on S3:

#!/bin/sh

wget -P /home/hadoop/lib http://central.maven.org/maven2/org/mongodb/mongo-java-driver/2.13.0/mongo-java-driver-2.13.0.jar

wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-core-1.3.2.jar
wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-pig-1.3.2.jar
wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-hive-1.3.2.jar

When the cluster came up, I logged into the master node and saw that the jars had been downloaded successfully.
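A minimal sketch of such a check, assuming the jars land in /home/hadoop/lib as in the bootstrap script above. The helper name `check_jars` is hypothetical and not part of any tool. Note that it only confirms the files exist and are non-empty; it says nothing about whether Hive or Hadoop actually has them on its classpath.

```shell
#!/bin/sh
# Hypothetical helper: report which of the expected jars are present
# (and non-empty) in a directory.
# Usage: check_jars DIR JAR...
check_jars() {
  dir=$1
  shift
  missing=0
  for jar in "$@"; do
    if [ -s "$dir/$jar" ]; then
      echo "ok: $jar"
    else
      echo "missing: $jar"
      missing=1
    fi
  done
  # Return 0 only when every jar was found
  return "$missing"
}

# Run the check only where the lib directory exists (i.e. on a cluster node).
if [ -d /home/hadoop/lib ]; then
  check_jars /home/hadoop/lib \
    mongo-java-driver-2.13.0.jar \
    mongo-hadoop-core-1.3.2.jar \
    mongo-hadoop-pig-1.3.2.jar \
    mongo-hadoop-hive-1.3.2.jar || echo "some jars are missing"
fi
```

The function returns a non-zero status if any jar is absent, so it could also be called at the end of the bootstrap script to make a failed download visible.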

When I run this in the Hive shell:

CREATE TABLE nicks
( 
  id STRING,
  name STRING,
  business STRING,
  alias STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
TBLPROPERTIES('mongo.uri'='mongodb://54.93.123.123:27017/foo.aliases');

ADD JAR /home/hadoop/lib/mongo-hadoop-core-1.3.2.jar;
ADD JAR /home/hadoop/lib/mongo-hadoop-hive-1.3.2.jar;
ADD JAR /home/hadoop/lib/mongo-java-driver-2.13.0.jar;

SELECT * FROM nicks;

I get this exception:

Exception in thread "main" java.lang.NoClassDefFoundError: com/mongodb/DBObject
    at com.mongodb.hadoop.splitter.MongoSplitterFactory.getSplitterByClass(MongoSplitterFactory.java:41)
    at com.mongodb.hadoop.splitter.MongoSplitterFactory.getSplitter(MongoSplitterFactory.java:109)
    at com.mongodb.hadoop.hive.input.HiveMongoInputFormat.getSplits(HiveMongoInputFormat.java:64) …

hadoop hive amazon-web-services mongodb-java emr
