I am very new to Spark. I tried searching but could not find a proper solution. I have installed Hadoop 2.7.2 on two boxes (one master node and one worker node) and set up the cluster by following this link: http://javadev.org/docs/hadoop/centos/6/installation/multi-node-installation-on-centos-6-non-sucure-mode/. I am running the Hadoop and Spark applications as the root user to test the cluster.
I have installed Spark on the master node, and Spark starts without any errors. However, when I submit a job with spark-submit, I get a FileNotFoundException, even though the file exists on the master node at the very location named in the error. I am executing the spark-submit command below; please find the log output beneath the command.
/bin/spark-submit --class com.test.Engine --master yarn --deploy-mode cluster /app/spark-test.jar
16/04/21 19:16:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/21 19:16:13 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/04/21 19:16:14 INFO Client: Requesting a new application from cluster with 1 NodeManagers
16/04/21 19:16:14 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/04/21 19:16:14 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/04/21 19:16:14 INFO Client: Setting up container launch context for our AM
16/04/21 19:16:14 INFO Client: Setting up the launch environment for our AM container
16/04/21 19:16:14 INFO Client: Preparing resources for our AM container
16/04/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file:/mi/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar
16/04/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file:/app/spark-test.jar
16/04/21 19:16:14 INFO Client: Source and destination file systems are the same. Not copying file:/tmp/spark-120aeddc-0f87-4411-9400-22ba01096249/__spark_conf__5619348744221830008.zip
16/04/21 19:16:14 INFO SecurityManager: Changing view acls to: root
16/04/21 19:16:14 INFO SecurityManager: Changing modify acls to: root
16/04/21 19:16:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/04/21 19:16:15 INFO Client: Submitting application 1 to ResourceManager
16/04/21 19:16:15 INFO YarnClientImpl: Submitted application application_1461246306015_0001
16/04/21 19:16:16 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
16/04/21 19:16:16 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1461246375622
final status: UNDEFINED
tracking URL: http://sparkcluster01.testing.com:8088/proxy/application_1461246306015_0001/
user: root
16/04/21 19:16:17 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
16/04/21 19:16:18 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
16/04/21 19:16:19 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
16/04/21 19:16:20 INFO Client: Application report for application_1461246306015_0001 (state: ACCEPTED)
16/04/21 19:16:21 INFO Client: Application report for application_1461246306015_0001 (state: FAILED)
16/04/21 19:16:21 INFO Client:
client token: N/A
diagnostics: Application application_1461246306015_0001 failed 2 times due to AM Container for appattempt_1461246306015_0001_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://sparkcluster01.testing.com:8088/cluster/app/application_1461246306015_0001Then, click on links to logs of each attempt.
Diagnostics: java.io.FileNotFoundException: File file:/app/spark-test.jar does not exist
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1461246375622
final status: FAILED
tracking URL: http://sparkcluster01.testing.com:8088/cluster/app/application_1461246306015_0001
user: root
Exception in thread "main" org.apache.spark.SparkException: Application application_1461246306015_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I even tried running Spark against HDFS by putting my application jar on HDFS (staged roughly as sketched below) and giving the HDFS path in the spark-submit command. Even then, it throws a FileNotFoundException, this time on one of the Spark conf files. I am executing the spark-submit command below; please find the log output beneath the command.
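For reference, the jar can be staged on HDFS with something like the following (the /beacon/job path matches the one used in the submit command; adjust it for your layout):

# Create the target directory on HDFS and upload the application jar
hdfs dfs -mkdir -p /beacon/job
hdfs dfs -put /app/spark-test.jar /beacon/job/spark-test.jar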
./bin/spark-submit --class com.test.Engine --master yarn --deploy-mode cluster hdfs://sparkcluster01.testing.com:9000/beacon/job/spark-test.jar
16/04/21 18:11:45 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/04/21 18:11:46 INFO Client: Requesting a new application from cluster with 1 NodeManagers
16/04/21 18:11:46 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
16/04/21 18:11:46 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
16/04/21 18:11:46 INFO Client: Setting up container launch context for our AM
16/04/21 18:11:46 INFO Client: Setting up the launch environment for our AM container
16/04/21 18:11:46 INFO Client: Preparing resources for our AM container
16/04/21 18:11:46 INFO Client: Source and destination file systems are the same. Not copying file:/mi/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar
16/04/21 18:11:47 INFO Client: Uploading resource hdfs://sparkcluster01.testing.com:9000/beacon/job/spark-test.jar -> file:/root/.sparkStaging/application_1461234217994_0017/spark-test.jar
16/04/21 18:11:49 INFO Client: Source and destination file systems are the same. Not copying file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip
16/04/21 18:11:50 INFO SecurityManager: Changing view acls to: root
16/04/21 18:11:50 INFO SecurityManager: Changing modify acls to: root
16/04/21 18:11:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/04/21 18:11:50 INFO Client: Submitting application 17 to ResourceManager
16/04/21 18:11:50 INFO YarnClientImpl: Submitted application application_1461234217994_0017
16/04/21 18:11:51 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED)
16/04/21 18:11:51 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1461242510849
final status: UNDEFINED
tracking URL: http://sparkcluster01.testing.com:8088/proxy/application_1461234217994_0017/
user: root
16/04/21 18:11:52 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED)
16/04/21 18:11:53 INFO Client: Application report for application_1461234217994_0017 (state: ACCEPTED)
16/04/21 18:11:54 INFO Client: Application report for application_1461234217994_0017 (state: FAILED)
16/04/21 18:11:54 INFO Client:
client token: N/A
diagnostics: Application application_1461234217994_0017 failed 2 times due to AM Container for appattempt_1461234217994_0017_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://sparkcluster01.testing.com:8088/cluster/app/application_1461234217994_0017Then, click on links to logs of each attempt.
Diagnostics: File file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip does not exist
java.io.FileNotFoundException: File file:/tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21/__spark_conf__6818051470272245610.zip does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1461242510849
final status: FAILED
tracking URL: http://sparkcluster01.testing.com:8088/cluster/app/application_1461234217994_0017
user: root
Exception in thread "main" org.apache.spark.SparkException: Application application_1461234217994_0017 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1034)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/04/21 18:11:55 INFO ShutdownHookManager: Shutdown hook called
16/04/21 18:11:55 INFO ShutdownHookManager: Deleting directory /tmp/spark-f4eef3ac-2add-42f8-a204-be7959c26f21
The Spark configuration was not pointing to the correct Hadoop configuration directory. The Hadoop 2.7.2 configuration lives under hadoop2.7.2/etc/hadoop/, not under /root/hadoop2.7.2/conf. Once I pointed HADOOP_CONF_DIR to /root/hadoop2.7.2/etc/hadoop/ in spark-env.sh, spark-submit started working and the FileNotFoundException disappeared. Previously it was pointing to /root/hadoop2.7.2/conf (which does not exist). If Spark does not point to the correct Hadoop configuration directory, it can cause similar errors. I think this might be a bug in Spark; it should handle the situation gracefully rather than throwing such an ambiguous error message.
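A minimal sketch of the relevant spark-env.sh lines, assuming Hadoop is installed under /root/hadoop2.7.2 as in my setup (Spark also accepts YARN_CONF_DIR for the same purpose):

# $SPARK_HOME/conf/spark-env.sh
# Point Spark at the directory that actually contains core-site.xml,
# hdfs-site.xml and yarn-site.xml. In Hadoop 2.x this is etc/hadoop,
# not the old conf/ layout.
export HADOOP_CONF_DIR=/root/hadoop2.7.2/etc/hadoop
export YARN_CONF_DIR=/root/hadoop2.7.2/etc/hadoop

A quick sanity check before retrying spark-submit is to run ls $HADOOP_CONF_DIR/core-site.xml and confirm the directory really holds the cluster's site files.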