Abh*_*ute 10 hadoop classpath maven-3 apache-spark
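Reduce the application jar size by providing the Spark classpath for Maven dependencies:
My cluster has 3 EC2 instances with Hadoop and Spark running. If I build the jar with its Maven dependencies included, it becomes too large (around 100 MB). I want to avoid this, because the jar is copied to all nodes every time I run the job.
To avoid that, I built the jar with "mvn package". For dependency resolution, I downloaded all the Maven dependencies on each node and then provided only the jar paths below:
I added the classpath on each node in "spark-defaults.conf":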
spark.driver.extraClassPath /home/spark/.m2/repository/com/google/code/gson/gson/2.3.1/gson-2.3.1.jar:/home/spark/.m2/repository/com/datastax/cassandra/cassandra-driver-core/2.1.5/cassandra-driver-core-2.1.5.jar:/home/spark/.m2/repository/com/google/guava/guava/16.0.1/guava-16.0.1.jar:/home/spark/.m2/repository/com/google/collections/google-collections/1.0/google-collections-1.0.jar:/home/spark/.m2/repository/com/datastax/spark/spark-cassandra-connector-java_2.10/1.2.0-rc1/spark-cassandra-connector-java_2.10-1.2.0-rc1.jar:/home/spark/.m2/repository/com/datastax/spark/spark-cassandra-connector_2.10/1.2.0-rc1/spark-cassandra-connector_2.10-1.2.0-rc1.jar:/home/spark/.m2/repository/org/apache/cassandra/cassandra-thrift/2.1.3/cassandra-thrift-2.1.3.jar:/home/spark/.m2/repository/org/joda/joda-convert/1.2/joda-convert-1.2.jar
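Each entry in the classpath above follows the standard local Maven repository layout (group id as directories, then artifact id, version, and `artifact-version.jar`). A minimal sketch of building such a classpath string from Maven coordinates is shown here; the function names `m2_jar_path` and `build_classpath` are hypothetical helpers for illustration, not part of Spark or Maven.

```python
import posixpath


def m2_jar_path(repo_root, group_id, artifact_id, version):
    """Return the jar path for a Maven coordinate in a local repository layout."""
    # Dots in the group id become directory separators:
    # com.google.guava -> com/google/guava
    group_dir = group_id.replace(".", "/")
    jar_name = f"{artifact_id}-{version}.jar"
    return posixpath.join(repo_root, group_dir, artifact_id, version, jar_name)


def build_classpath(repo_root, coordinates):
    """Join jar paths with ':' (the classpath separator on Linux)."""
    return ":".join(m2_jar_path(repo_root, g, a, v) for g, a, v in coordinates)


# Coordinates matching the first two entries of the classpath above.
coords = [
    ("com.google.code.gson", "gson", "2.3.1"),
    ("com.datastax.cassandra", "cassandra-driver-core", "2.1.5"),
]
classpath = build_classpath("/home/spark/.m2/repository", coords)
print("spark.driver.extraClassPath " + classpath)
```

Generating the line this way may reduce the chance of a typo in a long hand-written path, but it assumes every node uses the same repository root.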
This worked locally on a single node, but I am still getting this error. Any help would be appreciated.
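Finally, I was able to solve the problem. I created the application jar with "mvn package" instead of "mvn clean compile assembly:single", so the Maven dependencies are not bundled into the jar when it is built (these jars/dependencies must instead be provided at run time). This results in a small jar, because it only references its dependencies.
Then I added the following two parameters to spark-defaults.conf on each node: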
spark.driver.extraClassPath /home/spark/.m2/repository/com/datastax/cassandra/cassandra-driver-core/2.1.7/cassandra-driver-core-2.1.7.jar:/home/spark/.m2/repository/com/googlecode/json-simple/json-simple/1.1/json-simple-1.1.jar:/home/spark/.m2/repository/com/google/code/gson/gson/2.3.1/gson-2.3.1.jar:/home/spark/.m2/repository/com/google/guava/guava/16.0.1/guava-16.0.1.jar
spark.executor.extraClassPath /home/spark/.m2/repository/com/datastax/cassandra/cassandra-driver-core/2.1.7/cassandra-driver-core-2.1.7.jar:/home/spark/.m2/repository/com/googlecode/json-simple/json-simple/1.1/json-simple-1.1.jar:/home/spark/.m2/repository/com/google/code/gson/gson/2.3.1/gson-2.3.1.jar:/home/spark/.m2/repository/com/google/guava/guava/16.0.1/guava-16.0.1.jar
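Because the driver and executor classpaths are two long lines maintained by hand, they can drift apart. The sketch below parses spark-defaults.conf-style text and reports entries present in one setting but not the other. It assumes the whitespace-separated `key value` form used above; the helper names and the sample paths under `/opt/jars` are hypothetical.

```python
def parse_spark_defaults(text):
    """Parse whitespace-separated 'key value' lines into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            props[parts[0]] = parts[1].strip()
    return props


def classpath_difference(props):
    """Return (only_in_driver, only_in_executor) as sorted lists of jar paths."""
    driver = set(props.get("spark.driver.extraClassPath", "").split(":")) - {""}
    executor = set(props.get("spark.executor.extraClassPath", "").split(":")) - {""}
    return sorted(driver - executor), sorted(executor - driver)


# Hypothetical sample content; real files would hold the full paths shown above.
sample = """
# example spark-defaults.conf fragment
spark.driver.extraClassPath /opt/jars/a.jar:/opt/jars/b.jar
spark.executor.extraClassPath /opt/jars/a.jar
"""
only_driver, only_executor = classpath_difference(parse_spark_defaults(sample))
print("only in driver:", only_driver)
print("only in executor:", only_executor)
```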
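So the question arises: how will the application jar get its Maven dependencies (the required jars) at run time?
For that, I downloaded all the required dependencies on each node in advance using mvn clean compile assembly:single.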
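Since this approach depends on every node already having the jars at the same paths, it may help to check a node before submitting a job. The following is a minimal sketch, assuming it is run on each node with the classpath value copied from spark-defaults.conf; `missing_jars` is a hypothetical helper name.

```python
import os


def missing_jars(classpath):
    """Return the classpath entries that do not exist as files on this machine."""
    return [p for p in classpath.split(":") if p and not os.path.isfile(p)]


# Two entries taken from the classpath in the answer above.
classpath = (
    "/home/spark/.m2/repository/com/google/code/gson/gson/2.3.1/gson-2.3.1.jar"
    ":/home/spark/.m2/repository/com/google/guava/guava/16.0.1/guava-16.0.1.jar"
)
for path in missing_jars(classpath):
    print("missing:", path)
```

An empty result only shows that the files are present; it does not confirm that the versions are compatible with each other.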