JAVA_HOME error when upgrading to Spark 1.3.0

Ken*_*ams 7 java hadoop scala apache-spark

I'm trying to upgrade a Spark project, written in Scala, from Spark 1.2.1 to 1.3.0, so I changed my build.sbt like so:

-libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.1" % "provided"
+libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"

Then I build an assembly jar and submit it:

HADOOP_CONF_DIR=/etc/hadoop/conf \
    spark-submit \
    --driver-class-path=/etc/hbase/conf \
    --conf spark.hadoop.validateOutputSpecs=false \
    --conf spark.yarn.jar=hdfs:/apps/local/spark-assembly-1.3.0-hadoop2.4.0.jar \
    --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
    --deploy-mode=cluster \
    --master=yarn \
    --class=TestObject \
    --num-executors=54 \
    target/scala-2.11/myapp-assembly-1.2.jar

The job fails to submit, with the following exception in the terminal:

15/03/19 10:30:07 INFO yarn.Client: 
15/03/19 10:20:03 INFO yarn.Client: 
     client token: N/A
     diagnostics: Application application_1420225286501_4698 failed 2 times due to AM 
     Container for appattempt_1420225286501_4698_000002 exited with  exitCode: 127 
     due to: Exception from container-launch: 
org.apache.hadoop.util.Shell$ExitCodeException: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

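Finally, I went and checked the YARN application master's web interface (since the job shows up there, I know it at least made it that far), and the only logs it shows are these: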

    Log Type: stderr
    Log Length: 61
    /bin/bash: {{JAVA_HOME}}/bin/java: No such file or directory

    Log Type: stdout
    Log Length: 0

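I'm not sure how to interpret that. Is {{JAVA_HOME}} a literal (including the braces) that's somehow making it into a script? Is this coming from the worker nodes or from the driver? Is there anything I can do to experiment and troubleshoot?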
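One way to see what that stderr line means is to reproduce it locally. The sketch below assumes only that bash is on the PATH; it runs the placeholder as if it were a real path, which is presumably what the container launch script ended up doing. Bash reports exit status 127 when it cannot find the command to run, which matches the exitCode: 127 in the diagnostics and suggests that nothing substituted the placeholder before the launch command ran.

```shell
#!/usr/bin/env bash
# Run the unexpanded placeholder as a command, the way a launch script would.
# Single quotes keep the braces literal; with no comma or ".." inside them,
# bash's brace expansion leaves the text untouched.
status=0
output=$(bash -c '{{JAVA_HOME}}/bin/java -version' 2>&1) || status=$?

echo "exit code: $status"
echo "message:   $output"
```

If the exit code printed here is 127, the failure on the cluster is consistent with a literal, never-substituted {{JAVA_HOME}} rather than a wrong JDK path.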

I do have JAVA_HOME set in the hadoop config files on all the nodes of the cluster:

% grep JAVA_HOME /etc/hadoop/conf/*.sh
/etc/hadoop/conf/hadoop-env.sh:export JAVA_HOME=/usr/jdk64/jdk1.6.0_31
/etc/hadoop/conf/yarn-env.sh:export JAVA_HOME=/usr/jdk64/jdk1.6.0_31

Has this behavior changed in 1.3.0 since 1.2.1? Using 1.2.1 and making no other changes, the job completes fine.

[Note: I originally posted this on the Spark mailing list; I'll update both places if/when I find a solution.]

Ken*_*ams 1

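OK, so I got some other people in the office to help work on this, and we figured out a solution. I'm not sure how much of it is specific to the file layout of Hortonworks HDP 2.0.6 on CentOS, which is what we're running on the cluster.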

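We manually copy some directories from one of the cluster machines (or any machine that can successfully use the Hadoop client) to the local machine. Let's call that machine $GOOD.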

Set up the Hadoop config files:

cd /etc
sudo mkdir hbase hadoop
sudo scp -r $GOOD:/etc/hbase/conf hbase
sudo scp -r $GOOD:/etc/hadoop/conf hadoop

Set up the Hadoop libraries and executables:

mkdir ~/my-hadoop
scp -r $GOOD:/usr/lib/hadoop\* ~/my-hadoop
cd /usr/lib
sudo ln -s ~/my-hadoop/* .
path+=(/usr/lib/hadoop*/bin)  # Add to $PATH (this syntax is for zsh)

Set up the Spark libraries and executables:

cd ~/Downloads
wget http://apache.mirrors.lucidnetworks.net/spark/spark-1.4.1/spark-1.4.1-bin-without-hadoop.tgz
tar -zxvf spark-1.4.1-bin-without-hadoop.tgz
cd spark-1.4.1-bin-without-hadoop
path+=(`pwd`/bin)
hdfs dfs -copyFromLocal lib/spark-assembly-*.jar /apps/local/

Set some environment variables:

export JAVA_HOME=$(/usr/libexec/java_home -v 1.7)
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_DIST_CLASSPATH=$(hadoop --config $HADOOP_CONF_DIR classpath)
# Execute the "export HADOOP_LIBEXEC_DIR=..." line found in yarn-env.sh
`grep 'export HADOOP_LIBEXEC_DIR' $HADOOP_CONF_DIR/yarn-env.sh`
export SPOPTS="--driver-java-options=-Dorg.xerial.snappy.lib.name=libsnappyjava.jnilib"
export SPOPTS="$SPOPTS --conf spark.yarn.jar=hdfs:/apps/local/spark-assembly-1.4.1-hadoop2.2.0.jar"
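Since a typo in any of these variables tends to surface only as an obscure failure on the cluster, it can help to check them before launching anything. The following is a minimal sketch, not part of Spark or Hadoop: the function name check_spark_env is made up for illustration, and it only tests that the three exported variables look plausible on the local machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Spark or Hadoop tool): report obvious problems
# with the environment variables set above. The return status is the number
# of problems found, so 0 means everything looked plausible.
check_spark_env() {
  local problems=0

  # JAVA_HOME must be set and contain an executable bin/java.
  if [ -z "${JAVA_HOME:-}" ] || [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "JAVA_HOME has no executable bin/java: '${JAVA_HOME:-}'" >&2
    problems=$((problems + 1))
  fi

  # HADOOP_CONF_DIR must point at an existing directory.
  if [ ! -d "${HADOOP_CONF_DIR:-}" ]; then
    echo "HADOOP_CONF_DIR is not a directory: '${HADOOP_CONF_DIR:-}'" >&2
    problems=$((problems + 1))
  fi

  # SPARK_DIST_CLASSPATH is empty if the 'hadoop classpath' call failed.
  if [ -z "${SPARK_DIST_CLASSPATH:-}" ]; then
    echo "SPARK_DIST_CLASSPATH is empty" >&2
    problems=$((problems + 1))
  fi

  return "$problems"
}

# Usage: check_spark_env && echo "environment looks OK"
```

This only validates the client side; it says nothing about how the cluster nodes resolve their own Java installation.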

Now you can run the various Spark shells like so:

sparkR --master yarn $SPOPTS
spark-shell --master yarn $SPOPTS
pyspark --master yarn $SPOPTS

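Some remarks:

  • The JAVA_HOME setting is the same one I've had all along; it's only included here for completeness. All the focus on JAVA_HOME turned out to be a red herring.
  • The --driver-java-options=-Dorg.xerial.snappy.lib.name=libsnappyjava.jnilib was necessary because I was getting errors about java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path. The jnilib file is the correct choice for OS X.
  • The --conf spark.yarn.jar piece is just to save time, avoiding re-copying the assembly file to the cluster every time you fire up the shell or submit a job.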
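The same "copy once, reuse afterwards" idea behind spark.yarn.jar can be applied to the upload step itself. This is a rough sketch under a few assumptions: the function name upload_assembly_once is invented for illustration, the hdfs client is on the PATH, and /apps/local is the HDFS directory used above. It relies on hdfs dfs -test -e, which exits with status 0 when the path exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the assembly jar to HDFS only if a file with the
# same name is not already there, so repeated setup runs skip the upload.
upload_assembly_once() {
  local jar="$1" dest_dir="$2"
  local dest
  dest="$dest_dir/$(basename "$jar")"

  if hdfs dfs -test -e "$dest"; then
    echo "already present: $dest"
  else
    hdfs dfs -copyFromLocal "$jar" "$dest_dir/" && echo "uploaded: $dest"
  fi
}

# Usage (run from the unpacked Spark directory):
#   upload_assembly_once lib/spark-assembly-*.jar /apps/local
```

The check is by file name only, so a rebuilt jar with the same name would not be re-uploaded; remove the old copy first in that case.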