I have a Spark cluster set up with one master and three workers. I also have Spark installed on a CentOS VM. I'm trying to run a Spark shell from my local VM that connects to the master and lets me execute simple Scala code. This is the command I run on my local VM:
bin/spark-shell --master spark://spark01:7077
The shell runs to the point where I can enter Scala code. It reports that executors have been granted (x3, one per worker). If I look at the master's UI, I can see one running application, the Spark shell. All the workers are ALIVE, with 2/2 cores used and 512 MB (out of 5 GB) allocated to the application. So I try to execute the following Scala code:
sc.parallelize(1 to 100).count
Unfortunately, the command doesn't work. The shell prints the same warning endlessly:
INFO SparkContext: Starting job: count at <console>:13
INFO DAGScheduler: Got job 0 (count at <console>:13) with 2 output partitions (allowLocal=false)
INFO DAGScheduler: Final stage: Stage 0(count at <console>:13) with 2 output partitions (allowLocal=false)
INFO DAGScheduler: Parents of final stage: List()
INFO DAGScheduler: Missing parents: List()
INFO DAGScheduler: Submitting Stage 0 (ParallelCollectionRDD[0] at parallelize at <console>:13), which has no missing parents
INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (ParallelCollectionRDD[0] at parallelize at <console>:13)
INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
After researching the issue, I've confirmed that the master URL I'm using is identical to the one on the web UI. I can ping and ssh both ways (cluster to local VM, and vice versa). Furthermore, I've tried the executor-memory parameter (both increasing and decreasing the memory) to no avail. Finally, I tried disabling the firewall (iptables) on both sides, but I keep getting the same error. I'm using Spark 1.0.2.
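For concreteness, a sketch of the kind of invocation I tried (the memory value here is just an example; I tried several):

```shell
# Launching the shell against the standalone master with an explicit
# executor memory; spark01:7077 is my master, 512m is one value I tried:
bin/spark-shell --master spark://spark01:7077 --executor-memory 512m
```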
TL;DR: Is it possible to run an Apache Spark shell remotely (and by extension, submit applications remotely)? If so, what am I missing?
EDIT: I took a look at the worker logs and found that the workers could not find Spark:
ERROR org.apache.spark.deploy.worker.ExecutorRunner: Error running executor
java.io.IOException: Cannot run program "/usr/bin/spark-1.0.2/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory
...
Spark is installed in a different directory on my local VM than it is on the cluster, and the path the workers are trying to find is the one from my local VM. Is there a way to specify this path? Or must it be the same everywhere?
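One workaround I considered (I'm not sure the worker honors this over the path the driver sends) is exporting the install location in conf/spark-env.sh on each cluster node; the path below is a hypothetical example:

```shell
# In $SPARK_HOME/conf/spark-env.sh on each cluster node; the path is
# that node's own Spark install location (hypothetical example):
export SPARK_HOME=/usr/bin/spark-1.0.2
```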
For now, I've adjusted my directories to sidestep this error. My Spark shell now fails before I even get the chance to enter the count command (Master removed our application: FAILED). All the workers show the same error:
ERROR akka.remote.EndpointWriter: AssociationError [akka.tcp://sparkWorker@spark02:7078] -> [akka.tcp://sparkExecutor@spark02:53633]:
Error [Association failed with [akka.tcp://sparkExecutor@spark02:53633]]
[akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@spark02:53633]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$annon2: Connection refused: spark02/192.168.64.2:53633
As suspected, I'm running into a network issue. What should I look at now?
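One thing I plan to verify (the address below is a placeholder for my setup) is that the driver binds an address the workers can actually route back to, since the executors connect back to the driver on an ephemeral port:

```shell
# On the local VM, before starting the shell; the IP is a placeholder
# for whichever address the cluster nodes can reach:
export SPARK_LOCAL_IP=192.168.64.10
bin/spark-shell --master spark://spark01:7077
```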
Answer by 小智 (score 3):
I solved this problem on my Spark client and Spark cluster.

Check the network: client A and the cluster machines can ping each other! Then add two lines of configuration to spark-env.sh on client A.

First:

export SPARK_MASTER_IP=172.100.102.156
export SPARK_JAR=/usr/spark-1.1.0-bin-hadoop2.4/lib/spark-assembly-1.1.0-hadoop2.4.0.jar

Second: test your Spark shell with cluster mode!
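A minimal way to run that test, assuming the master address exported above (placeholder IP; substitute your own SPARK_MASTER_IP):

```shell
# Point the shell at the standalone master and try a trivial job:
bin/spark-shell --master spark://172.100.102.156:7077
```

Inside the shell, something like `sc.parallelize(1 to 100).count` should confirm whether tasks are now being accepted.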
Views: 20101