My knowledge of Spark is limited, and you will sense it after reading this question. I have just one node, with Spark, Hadoop and YARN installed on it.
I am able to code and run the word-count problem in cluster mode with the command below
spark-submit --class com.sanjeevd.sparksimple.wordcount.JobRunner
--master yarn
--deploy-mode cluster
--driver-memory=2g
--executor-memory 2g
--executor-cores 1
--num-executors 1
SparkSimple-0.0.1-SNAPSHOT.jar
hdfs://sanjeevd.br:9000/user/spark-test/word-count/input
hdfs://sanjeevd.br:9000/user/spark-test/word-count/output
It works just fine.
Now I understand that "Spark on YARN" needs the Spark jar files available on the cluster, and that if I do nothing, then every time I run my program it copies hundreds of jar files from $SPARK_HOME to each node (in my case, just one node). I can see that the execution pauses for a while until the copying finishes. See below -
16/12/12 17:24:03 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
16/12/12 17:24:06 INFO yarn.Client: Uploading resource file:/tmp/spark-a6cc0d6e-45f9-4712-8bac-fb363d6992f2/__spark_libs__11112433502351931.zip -> hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0001/__spark_libs__11112433502351931.zip
16/12/12 17:24:08 INFO yarn.Client: Uploading resource file:/home/sanjeevd/personal/Spark-Simple/target/SparkSimple-0.0.1-SNAPSHOT.jar -> hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0001/SparkSimple-0.0.1-SNAPSHOT.jar
16/12/12 17:24:08 INFO yarn.Client: Uploading resource file:/tmp/spark-a6cc0d6e-45f9-4712-8bac-fb363d6992f2/__spark_conf__6716604236006329155.zip -> hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0001/__spark_conf__.zip
Spark's documentation suggests setting the spark.yarn.jars property to avoid this copying. So I set the property as below in the spark-defaults.conf file.
spark.yarn.jars hdfs://sanjeevd.br:9000//user/spark/share/lib
http://spark.apache.org/docs/latest/running-on-yarn.html#preparations says: To make Spark runtime jars accessible from the YARN side, you can specify spark.yarn.archive or spark.yarn.jars. For details please refer to Spark Properties. If neither spark.yarn.archive nor spark.yarn.jars is specified, Spark will create a zip file with all jars under $SPARK_HOME/jars and upload it to the distributed cache.
By the way, I have all the jar files copied from the local /opt/spark/jars to HDFS /user/spark/share/lib. There are 206 of them.
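The copy itself can be scripted; below is a minimal sketch assuming the NameNode address (sanjeevd.br:9000) and target directory from this setup, so substitute your own. The commands are printed with echo as a dry run, since they need a live cluster; drop the echo to execute them for real.

```shell
# Hypothetical addresses/paths from this setup; adjust to your cluster.
NAMENODE=hdfs://sanjeevd.br:9000
LIB_DIR=/user/spark/share/lib

# Dry run: the commands are printed, not executed. Drop 'echo' to run for real.
echo hdfs dfs -mkdir -p "${NAMENODE}${LIB_DIR}"
echo hdfs dfs -put /opt/spark/jars/*.jar "${NAMENODE}${LIB_DIR}/"
```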
This made my job fail. Below is the error -
spark-submit --class com.sanjeevd.sparksimple.wordcount.JobRunner --master yarn --deploy-mode cluster --driver-memory=2g --executor-memory 2g --executor-cores 1 --num-executors 1 SparkSimple-0.0.1-SNAPSHOT.jar hdfs://sanjeevd.br:9000/user/spark-test/word-count/input hdfs://sanjeevd.br:9000/user/spark-test/word-count/output
16/12/12 17:43:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/12/12 17:43:07 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/12/12 17:43:07 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
16/12/12 17:43:07 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (5120 MB per container)
16/12/12 17:43:07 INFO yarn.Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
16/12/12 17:43:07 INFO yarn.Client: Setting up container launch context for our AM
16/12/12 17:43:07 INFO yarn.Client: Setting up the launch environment for our AM container
16/12/12 17:43:07 INFO yarn.Client: Preparing resources for our AM container
16/12/12 17:43:07 INFO yarn.Client: Uploading resource file:/home/sanjeevd/personal/Spark-Simple/target/SparkSimple-0.0.1-SNAPSHOT.jar -> hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0005/SparkSimple-0.0.1-SNAPSHOT.jar
16/12/12 17:43:07 INFO yarn.Client: Uploading resource file:/tmp/spark-fae6a5ad-65d9-4b64-9ba6-65da1310ae9f/__spark_conf__7881471844385719101.zip -> hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0005/__spark_conf__.zip
16/12/12 17:43:08 INFO spark.SecurityManager: Changing view acls to: sanjeevd
16/12/12 17:43:08 INFO spark.SecurityManager: Changing modify acls to: sanjeevd
16/12/12 17:43:08 INFO spark.SecurityManager: Changing view acls groups to:
16/12/12 17:43:08 INFO spark.SecurityManager: Changing modify acls groups to:
16/12/12 17:43:08 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(sanjeevd); groups with view permissions: Set(); users with modify permissions: Set(sanjeevd); groups with modify permissions: Set()
16/12/12 17:43:08 INFO yarn.Client: Submitting application application_1481592214176_0005 to ResourceManager
16/12/12 17:43:08 INFO impl.YarnClientImpl: Submitted application application_1481592214176_0005
16/12/12 17:43:09 INFO yarn.Client: Application report for application_1481592214176_0005 (state: ACCEPTED)
16/12/12 17:43:09 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1481593388442
final status: UNDEFINED
tracking URL: http://sanjeevd.br:8088/proxy/application_1481592214176_0005/
user: sanjeevd
16/12/12 17:43:10 INFO yarn.Client: Application report for application_1481592214176_0005 (state: FAILED)
16/12/12 17:43:10 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1481592214176_0005 failed 1 times due to AM Container for appattempt_1481592214176_0005_000001 exited with exitCode: 1
For more detailed output, check application tracking page:http://sanjeevd.br:8088/cluster/app/application_1481592214176_0005Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1481592214176_0005_01_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1481593388442
final status: FAILED
tracking URL: http://sanjeevd.br:8088/cluster/app/application_1481592214176_0005
user: sanjeevd
16/12/12 17:43:10 INFO yarn.Client: Deleting staging directory hdfs://sanjeevd.br:9000/user/sanjeevd/.sparkStaging/application_1481592214176_0005
Exception in thread "main" org.apache.spark.SparkException: Application application_1481592214176_0005 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1132)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1175)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/12/12 17:43:10 INFO util.ShutdownHookManager: Shutdown hook called
16/12/12 17:43:10 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-fae6a5ad-65d9-4b64-9ba6-65da1310ae9f
Do you know what I am doing wrong? The task's log is shown below -
Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
I understand the error about the ApplicationMaster class not being found, but my question is why it was not found - where is this class supposed to be? I don't have an assembly jar, since I'm using Spark 2.0.1, which does not ship an assembly bundle.
What does this have to do with the spark.yarn.jars property? That property is supposed to help Spark run on YARN, and that should be all there is to it. What else needs to be done when using spark.yarn.jars?
Thanks in advance for reading this question and helping me out.
bor*_*ice 19
You can also use the spark.yarn.archive option and set it to the location of an archive (that you create) containing all the JARs from the $SPARK_HOME/jars/ folder, at the root level of the archive. For example:
1. Create the archive: jar cv0f spark-libs.jar -C $SPARK_HOME/jars/ .
2. Upload to HDFS: hdfs dfs -put spark-libs.jar /some/path/
3. For large clusters, increase the replication count: hdfs dfs -setrep -w 10 hdfs:///some/path/spark-libs.jar (change the number of replicas proportionally to the total number of NodeManagers)
4. Set spark.yarn.archive to hdfs:///some/path/spark-libs.jar

San*_*man 14
I was finally able to make sense of this property. I found by hit-and-trial that the correct syntax of this property is
spark.yarn.jars=hdfs://xx:9000/user/spark/share/lib/*.jar
I hadn't put *.jar at the end; my path just ended with /lib. I also tried putting the actual assembly jar like this - spark.yarn.jars=hdfs://sanjeevd.brickred:9000/user/spark/share/lib/spark-yarn_2.11-2.0.1.jar - but no luck. All it said was that it was unable to load the ApplicationMaster.
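A quick way to tell whether the glob resolves to anything is to list it. The sketch below assumes the same host, port and path as above (substitute your own); the command is printed as a dry run since it needs a live cluster, so drop the echo to execute it.

```shell
# The glob must match the individual jars, including spark-yarn_*.jar -
# the jar that carries ApplicationMaster. An empty listing here means
# YARN has nothing to localize and the AM cannot start.
JARS_GLOB='hdfs://sanjeevd.br:9000/user/spark/share/lib/*.jar'

# Dry run: drop 'echo' to query the real cluster.
echo hdfs dfs -ls "$JARS_GLOB"
```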
I posted my response to a similar question asked by someone at /sf/answers/2882572591/
If you look at the documentation for spark.yarn.jars, it says the following:
List of libraries containing Spark code to distribute to YARN containers. By default, Spark on YARN will use Spark jars installed locally, but the Spark jars can also be in a world-readable location on HDFS. This allows YARN to cache it on nodes so that it doesn't need to be distributed each time an application runs. To point to jars on HDFS, for example, set this configuration to hdfs:///some/path. Globs are allowed.
This means you are effectively overriding SPARK_HOME/jars and telling YARN to pick up all the jars required for the application run from the path you specify. So if you set the spark.yarn.jars property, all the jars needed to run Spark must be present in that path. If you look inside the spark-assembly.jar present in SPARK_HOME/lib, the class org.apache.spark.deploy.yarn.ApplicationMaster is there, so make sure all the Spark dependencies are present in the HDFS path that you specify as spark.yarn.jars.
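As a final sanity check you can confirm that the class really lives in the spark-yarn jar before uploading. This is a sketch assuming a Spark 2.0.1 / Scala 2.11 install under /opt/spark (adjust the path to yours); the command is printed as a dry run, so drop the echo to inspect the actual jar.

```shell
# Hypothetical local jar path; the spark-yarn artifact ships ApplicationMaster.
SPARK_YARN_JAR=/opt/spark/jars/spark-yarn_2.11-2.0.1.jar

# Dry run: drop 'echo' to list the jar's entries. Against the real jar you
# should see org/apache/spark/deploy/yarn/ApplicationMaster.class listed.
echo jar tf "$SPARK_YARN_JAR"
```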