Phi*_*ien · 5 · tags: java, hadoop, mongodb, maven, apache-spark
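Similar to my previous question, but this time it is Java rather than Python that is causing the problem.
I have followed the suggested steps (as far as I can tell), but since I am using hadoop-2.6.1 I think I should be using the old API rather than the new API mentioned in the example.
I am working on Ubuntu, and the versions of the various components I have are:
My Java program is basic: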
import org.apache.spark.api.java.*;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import com.mongodb.hadoop.MongoInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.bson.BSONObject;
public class SimpleApp {
    public static void main(String[] args) {
        Configuration mongodbConfig = new Configuration();
        mongodbConfig.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
        mongodbConfig.set("mongo.input.uri", "mongodb://localhost:27017/db.collection");
        SparkConf conf = new SparkConf().setAppName("Simple Application");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaPairRDD<Object, BSONObject> documents = sc.newAPIHadoopRDD(
            mongodbConfig,            // Configuration
            MongoInputFormat.class,   // InputFormat: read from a live cluster.
            Object.class,             // Key class
            BSONObject.class          // Value class
        );
    }
}
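The mongo.input.uri value packs the server address and the namespace into one string: host and port in the authority part, then <database>.<collection> in the path. The helper below is a hypothetical sketch (MongoUriParts is not part of mongo-hadoop, and this is not its parser) that splits such a URI with java.net.URI. It assumes a single host, falls back to MongoDB's default port 27017, and splits at the first dot, since database names cannot contain dots while collection names can.

```java
import java.net.URI;

/**
 * Hypothetical helper: splits a value such as
 * "mongodb://localhost:27017/db.collection" into host, port, database and collection.
 * Assumes a single host; only illustrates the URI layout.
 */
final class MongoUriParts {
    final String host;
    final int port;
    final String database;
    final String collection;

    private MongoUriParts(String host, int port, String database, String collection) {
        this.host = host;
        this.port = port;
        this.database = database;
        this.collection = collection;
    }

    static MongoUriParts parse(String uri) {
        URI u = URI.create(uri);
        if (!"mongodb".equals(u.getScheme())) {
            throw new IllegalArgumentException("expected a mongodb:// URI: " + uri);
        }
        String path = u.getPath();             // e.g. "/db.collection"
        if (path == null || path.length() <= 1) {
            throw new IllegalArgumentException("missing <database>.<collection> in: " + uri);
        }
        String namespace = path.substring(1);  // drop the leading '/'
        int dot = namespace.indexOf('.');      // database names cannot contain '.', so split at the first one
        if (dot <= 0 || dot == namespace.length() - 1) {
            throw new IllegalArgumentException("expected <database>.<collection> in: " + uri);
        }
        int port = u.getPort() == -1 ? 27017 : u.getPort(); // 27017 is MongoDB's default port
        return new MongoUriParts(u.getHost(), port,
                namespace.substring(0, dot), namespace.substring(dot + 1));
    }

    public static void main(String[] args) {
        MongoUriParts p = parse("mongodb://localhost:27017/db.collection");
        System.out.println(p.host + " " + p.port + " " + p.database + " " + p.collection);
    }
}
```

Running main prints the four parts of the URI used in the question: host localhost, port 27017, database db, collection collection.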
The project builds without errors with Maven (mvn package) using the following pom file:
<project>
  <groupId>edu.berkeley</groupId>
  <artifactId>simple-project</artifactId>
  <modelVersion>4.0.0</modelVersion>
  <name>Simple Project</name>
  <packaging>jar</packaging>
  <version>1.0</version>
  <dependencies>
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.5.1</version>
    </dependency>
    <dependency>
      <groupId>org.mongodb</groupId>
      <artifactId>mongo-java-driver</artifactId>
      <version>3.2.0</version>
    </dependency>
    <dependency>
      <groupId>org.mongodb.mongo-hadoop</groupId>
      <artifactId>mongo-hadoop-core</artifactId>
      <version>1.4.2</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
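With plain jar packaging, mvn package puts only the project's own compiled classes into target/simple-project-1.0.jar; dependencies such as mongo-hadoop-core and mongo-java-driver are not copied into it. One way to confirm what the jar actually contains is to look for the class's entry: a class like com.mongodb.hadoop.MongoInputFormat is stored as com/mongodb/hadoop/MongoInputFormat.class. The sketch below is an illustrative helper (JarContains is a hypothetical name) built on java.util.jar.JarFile; the JDK's jar tf command lists the same entries from the command line.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/**
 * Hypothetical helper: reports whether a jar file contains a given top-level class.
 */
final class JarContains {
    /** Maps "a.b.C" to the jar entry name "a/b/C.class". */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // usage: java JarContains target/simple-project-1.0.jar com.mongodb.hadoop.MongoInputFormat
        System.out.println(containsClass(args[0], args[1]));
    }
}
```

If this prints false for com.mongodb.hadoop.MongoInputFormat, the class has to reach the runtime classpath some other way, for example by bundling the dependencies into the jar.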
Then I submit the jar:
/usr/local/share/spark-1.5.1-bin-hadoop2.6/bin/spark-submit --class "SimpleApp" --master local[4] target/simple-project-1.0.jar
and get the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: com/mongodb/hadoop/MongoInputFormat
at SimpleApp.main(SimpleApp.java:18)
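A NoClassDefFoundError like this typically means SimpleApp was compiled against com.mongodb.hadoop.MongoInputFormat, but the class could not be found on the classpath when SimpleApp.main ran under spark-submit. The sketch below is a hypothetical helper (ClasspathCheck is not part of Spark or Hadoop) that probes whether a class name can be loaded from the current class loader; calling isLoadable from inside the application would show whether the mongo-hadoop classes reached the runtime classpath.

```java
/**
 * Hypothetical helper: reports whether a class can be loaded at runtime.
 */
final class ClasspathCheck {
    static boolean isLoadable(String className) {
        try {
            // initialize = false: locate and load the class without running its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError, e.g. when a class it depends on is missing
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "com.mongodb.hadoop.MongoInputFormat";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```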
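I edited this question on December 18 because it had become too messy and long, so earlier comments may now seem irrelevant. The context of the question, however, is the same.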
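I ran into the same problem, but after a lot of trial and error I got my job working with the code below. I am running a Maven project in NetBeans on Ubuntu with Java 7. I hope this helps.
Include the maven-shade-plugin if there are any class conflicts between dependencies.
PS: I don't know your specific error, but I ran into many of them, and this code runs perfectly.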
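The maven-shade-plugin builds a single jar that includes the project's dependencies, and merging several jars that way can bring in the same class from more than one artifact. The sketch below is a hypothetical helper (DuplicateClasses is not a Maven or Spark API) that lists .class entries found in more than one of the given jars, which is one way to see which dependencies actually overlap. It uses only java.util.jar and needs Java 8 for computeIfAbsent and removeIf.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/**
 * Hypothetical helper: finds .class entries that appear in more than one jar.
 */
final class DuplicateClasses {
    static Map<String, List<String>> find(List<String> jarPaths) throws IOException {
        Map<String, List<String>> owners = new TreeMap<>(); // class entry -> jars containing it
        for (String path : jarPaths) {
            try (JarFile jar = new JarFile(path)) {
                Enumeration<JarEntry> entries = jar.entries();
                while (entries.hasMoreElements()) {
                    String name = entries.nextElement().getName();
                    if (name.endsWith(".class")) {
                        owners.computeIfAbsent(name, k -> new ArrayList<>()).add(path);
                    }
                }
            }
        }
        owners.values().removeIf(jars -> jars.size() < 2); // keep only entries found in 2+ jars
        return owners;
    }

    public static void main(String[] args) throws IOException {
        // usage: java DuplicateClasses first.jar second.jar ...
        find(Arrays.asList(args)).forEach((cls, jars) -> System.out.println(cls + " -> " + jars));
    }
}
```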
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>1.5.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>1.5.1</version>
  </dependency>
  <dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.14</version>
  </dependency>
  <dependency>
    <groupId>org.mongodb.mongo-hadoop</groupId>
    <artifactId>mongo-hadoop-core</artifactId>
    <version>1.4.1</version>
  </dependency>
</dependencies>
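The _2.11 suffix on spark-sql_2.11 and spark-core_2.11 is the Scala binary version those artifacts were built for, whereas the question's pom uses spark-core_2.10. Scala-suffixed artifacts in one build should all carry the same suffix, and it should match the Scala version of the Spark installation used with spark-submit. The small check below is a hypothetical helper (ScalaSuffixCheck is not a Maven feature) that collects the suffixes from a list of artifactIds with a regular expression.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: collects Scala binary-version suffixes such as "2.10" or "2.11"
 * from artifactIds like "spark-core_2.11".
 */
final class ScalaSuffixCheck {
    private static final Pattern SUFFIX = Pattern.compile("_(\\d+\\.\\d+)$");

    static Set<String> suffixes(List<String> artifactIds) {
        Set<String> found = new TreeSet<>();
        for (String id : artifactIds) {
            Matcher m = SUFFIX.matcher(id);
            if (m.find()) {
                found.add(m.group(1)); // artifacts without a suffix (e.g. log4j) are ignored
            }
        }
        return found;
    }

    /** True when at most one distinct Scala suffix is used. */
    static boolean consistent(List<String> artifactIds) {
        return suffixes(artifactIds).size() <= 1;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("spark-sql_2.11", "spark-core_2.11", "mongo-hadoop-core");
        System.out.println(suffixes(ids) + " consistent=" + consistent(ids));
    }
}
```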
Java code:
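import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import com.mongodb.hadoop.MongoInputFormat;
import org.bson.BSONObject;
import scala.Tuple2;

Configuration conf = new Configuration();
conf.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
conf.set("mongo.input.uri", "mongodb://localhost:27017/databasename.collectionname");
SparkConf sconf = new SparkConf().setMaster("local").setAppName("Spark UM Jar");
JavaSparkContext sc = new JavaSparkContext(sconf); // the context that newAPIHadoopRDD is called on
JavaRDD<User> userMaster = sc.newAPIHadoopRDD(conf, MongoInputFormat.class, Object.class, BSONObject.class)
        .map(new Function<Tuple2<Object, BSONObject>, User>() {
            @Override
            public User call(Tuple2<Object, BSONObject> v1) throws Exception {
                return null; // placeholder: build and return a User (your own class) from the document v1._2()
            }
        });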
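In the answer's call method, v1._2() is the MongoDB document as a BSONObject, whose fields can be read with get(String). The sketch below fills in that step under assumptions: the User class, its name and email fields, and the fromDocument method are hypothetical, and a plain java.util.Map stands in for the BSONObject so the example runs without the MongoDB jars.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Hypothetical User class. The field names "name" and "email" are assumptions,
 * and a Map stands in for the BSONObject document.
 */
final class User {
    final String name;
    final String email;

    User(String name, String email) {
        this.name = name;
        this.email = email;
    }

    /** Builds a User from a document; missing fields become null. */
    static User fromDocument(Map<String, Object> doc) {
        return new User(Objects.toString(doc.get("name"), null),
                        Objects.toString(doc.get("email"), null));
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("name", "alice");
        doc.put("email", "alice@example.com");
        User u = fromDocument(doc);
        System.out.println(u.name + " <" + u.email + ">");
    }
}
```

Inside the Spark function, the same conversion would read each field with v1._2().get("name") and so on, instead of going through a Map.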