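I'd like to connect an EMR cluster to our MongoDB through a MongoDB connection (rather than through BSON dumps).
To do this, I launched the cluster through the AWS Management Console. In the bootstrap configuration, I pointed to this file, which I placed on S3: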
#!/bin/sh
wget -P /home/hadoop/lib http://central.maven.org/maven2/org/mongodb/mongo-java-driver/2.13.0/mongo-java-driver-2.13.0.jar
wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-core-1.3.2.jar
wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-pig-1.3.2.jar
wget -P /home/hadoop/lib https://github.com/mongodb/mongo-hadoop/releases/download/r1.3.2/mongo-hadoop-hive-1.3.2.jar
Once the cluster was up, I logged in to the master node and saw that the jars had been downloaded successfully.
When I run this in the Hive shell:
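For anyone reproducing this, the "downloaded successfully" check can be made explicit. Below is a minimal sketch, not part of my original bootstrap script: `check_jars` is a hypothetical helper name, and it only confirms that each file exists and is non-empty, not that the archive is intact.

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): report any expected
# jar that is missing or empty in the given directory.
# Usage: check_jars DIR JAR_NAME...
check_jars() {
    dir=$1
    shift
    status=0
    for jar in "$@"; do
        # -s is true only if the file exists and has a size greater than zero
        if [ -s "$dir/$jar" ]; then
            echo "ok: $dir/$jar"
        else
            echo "missing or empty: $dir/$jar" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call on the master node, using the paths from the bootstrap script:
# check_jars /home/hadoop/lib \
#     mongo-java-driver-2.13.0.jar \
#     mongo-hadoop-core-1.3.2.jar \
#     mongo-hadoop-pig-1.3.2.jar \
#     mongo-hadoop-hive-1.3.2.jar
```

The function returns a non-zero status if any file fails the check, so it could also be appended to a bootstrap script to make a failed download visible.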
CREATE TABLE nicks
(
id STRING,
name STRING,
business STRING,
alias STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
TBLPROPERTIES('mongo.uri'='mongodb://54.93.123.123:27017/foo.aliases');
ADD JAR /home/hadoop/lib/mongo-hadoop-core-1.3.2.jar;
ADD JAR /home/hadoop/lib/mongo-hadoop-hive-1.3.2.jar;
ADD JAR /home/hadoop/lib/mongo-java-driver-2.13.0.jar;
Select * from nicks;
I get this exception:
Exception in thread "main" java.lang.NoClassDefFoundError: com/mongodb/DBObject
at com.mongodb.hadoop.splitter.MongoSplitterFactory.getSplitterByClass(MongoSplitterFactory.java:41)
at com.mongodb.hadoop.splitter.MongoSplitterFactory.getSplitter(MongoSplitterFactory.java:109)
at com.mongodb.hadoop.hive.input.HiveMongoInputFormat.getSplits(HiveMongoInputFormat.java:64) …
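The missing class, `com/mongodb/DBObject`, belongs to the MongoDB Java driver, so one thing worth ruling out is whether the driver jar on disk actually contains it. Below is a minimal sketch of such a check, assuming `unzip` is installed on the node (a jar is a zip archive). `jar_has_class` is a hypothetical helper name, and a positive result only shows that the class is in the file, not that Hive has it on its classpath.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the jar lists an entry whose path
# contains CLASS_PATH (a plain substring match, not a regex).
# Assumes `unzip` is available.
# Usage: jar_has_class JAR CLASS_PATH
jar_has_class() {
    jar=$1
    class_path=$2   # e.g. com/mongodb/DBObject.class
    # Return 2 if the jar cannot be read at all
    [ -r "$jar" ] || return 2
    # List the archive entries and look for the class file
    unzip -l "$jar" 2>/dev/null | grep -q -F "$class_path"
}

# Example call, using the driver jar from the bootstrap script:
# jar_has_class /home/hadoop/lib/mongo-java-driver-2.13.0.jar \
#     com/mongodb/DBObject.class && echo "class found in jar"
```

If the class is present in the jar, the remaining question is why it is not visible to the process that computes the input splits.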