Basic Spark example not working

Krz*_*lak 8 apache-spark

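I'm learning Spark and want to run the simplest possible cluster, consisting of two physical machines. I've done all the basic setup and it seems to be fine. The output of the automatic start script looks like this: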

[username@localhost sbin]$ ./start-all.sh 
starting org.apache.spark.deploy.master.Master, logging to /home/username/spark-1.6.0-bin-hadoop2.6/logs/spark-username-org.apache.spark.deploy.master.Master-1-localhost.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/sername/spark-1.6.0-bin-hadoop2.6/logs/spark-username-org.apache.spark.deploy.worker.Worker-1-localhost.out
username@192.168.???.??: starting org.apache.spark.deploy.worker.Worker, logging to /home/username/spark-1.6.0-bin-hadoop2.6/logs/spark-username-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out

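So there are no errors here, and it seems that a master node is running along with two worker nodes. However, when I open the web UI at 192.168.?????:8080, it lists only one worker, the local one. My problem is similar to the one described in "Spark Clusters: worker info doesn't show on web UI", but there is nothing unusual in my /etc/hosts file. All it contains is: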

127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6 

What am I missing? Both machines are running Fedora Workstation x86_64.

zer*_*323 5

Basically, the source of the problem is that the master's hostname resolves to localhost. It is visible in the console output:

starting org.apache.spark.deploy.master.Master, logging to 
/home/.../spark-username-org.apache.spark.deploy.master.Master-1-localhost.out

where the last part of the log file name corresponds to the hostname. You can see the same behavior in the master log:

16/02/17 11:13:54 WARN Utils: Your hostname, localhost resolves to a loopback address: 127.0.0.1; using 192.168.128.224 instead (on interface eno1)

and in the remote worker log:

16/02/17 11:13:58 WARN Worker: Failed to connect to master localhost:7077
java.io.IOException: Failed to connect to localhost/127.0.0.1:7077
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: localhost/127.0.0.1:7077

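This means that the remote worker tries to reach the master at localhost, that is, on its own machine, and obviously fails. Even if the worker could connect to the master, communication in the reverse direction would fail for the same reason.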
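To check for this condition on each machine, a minimal Python sketch like the following may help. It asks the local resolver what the machine's own hostname maps to and reports whether every result is a loopback address, which is the situation the master log warns about. The function names are made up for illustration and are not part of Spark.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Return True if the IP address literal lies in a loopback range."""
    # Drop an IPv6 zone id (for example "%eno1") before parsing.
    return ipaddress.ip_address(address.split("%")[0]).is_loopback


def resolved_addresses(hostname: str) -> set:
    """Return every IP address the local resolver gives for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The name does not resolve at all.
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return {info[4][0] for info in infos}


def resolves_only_to_loopback(hostname: str) -> bool:
    """Return True if hostname resolves, and only to loopback addresses."""
    addresses = resolved_addresses(hostname)
    return bool(addresses) and all(is_loopback(a) for a in addresses)


if __name__ == "__main__":
    name = socket.gethostname()
    print(name, sorted(resolved_addresses(name)))
    if resolves_only_to_loopback(name):
        print("hostname resolves only to loopback; other machines cannot reach it by this name")
```

Run on the master, a result containing only 127.0.0.1 or ::1 would be consistent with the warning in the master log above.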

Some ways to solve this problem:

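  • Provide a proper network configuration for both the workers and the master, so that the hostname used by each machine resolves correctly to the corresponding IP address.
  • Use SSH tunnels to forward all required ports between the remote worker and the master.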
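Whichever approach is used, it can be useful to verify from the worker machine that the master port is reachable. The sketch below only attempts a plain TCP connection. The port 7077 is taken from the worker log above, and the host is whatever name or address the worker is expected to use for the master; the `localhost` default is just a placeholder.

```python
import socket
import sys


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Pass the master's hostname or IP as the first argument.
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    print(host, 7077, can_connect(host, 7077))
```

A `False` result from the remote worker against the master's address would match the "Connection refused" error in the worker log. A `True` result only shows that the port is open, not that the worker has registered with the master.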