Fat*_*sta 13 cluster-computing akka apache-spark
我想用我的两个虚拟机将Spark Standlone模式安装到Cluster.
使用spark-0.9.1-bin-hadoop1的版本,我在每个vm中成功执行spark-shell.我按照官方文档制作一个vm(ip:xx.xx.xx.223)作为Master和Worker,并将另一个(ip:xx.xx.xx.224)作为Worker.
但是224-ip vm无法连接223-ip vm.接下来是223(Master)的主日志:
[@tc-52-223 logs]# tail -100f spark-root-org.apache.spark.deploy.master.Master-1-tc-52-223.out
Spark Command: /usr/local/jdk/bin/java -cp :/data/test/spark-0.9.1-bin-hadoop1/conf:/data/test/spark-0.9.1-bin-hadoop1/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop1.0.4.jar -Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip 10.11.52.223 --port 7077 --webui-port 8080
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14/04/14 22:17:03 INFO Master: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/04/14 22:17:03 INFO Master: Starting Spark master at spark://10.11.52.223:7077
14/04/14 22:17:03 INFO MasterWebUI: Started Master web UI at http://tc-52-223:8080
14/04/14 22:17:03 INFO Master: I have been elected leader! New state: ALIVE
14/04/14 22:17:06 INFO Master: Registering worker tc-52-223:20599 with 1 cores, 4.0 GB RAM
14/04/14 22:17:06 INFO Master: Registering worker tc_52_224:21371 with 1 cores, 4.0 GB RAM
14/04/14 22:17:06 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef: Message [org.apache.spark.deploy.DeployMessages$RegisteredWorker] from Actor[akka://sparkMaster/user/Master#1972530850] to Actor[akka://sparkMaster/deadLetters] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/04/14 22:17:26 INFO Master: Registering worker tc_52_224:21371 with 1 cores, 4.0 GB RAM
14/04/14 22:17:26 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef: Message [org.apache.spark.deploy.DeployMessages$RegisterWorkerFailed] from Actor[akka://sparkMaster/user/Master#1972530850] to Actor[akka://sparkMaster/deadLetters] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/04/14 22:17:46 INFO Master: Registering worker tc_52_224:21371 with 1 cores, 4.0 GB RAM
14/04/14 22:17:46 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef: Message [org.apache.spark.deploy.DeployMessages$RegisterWorkerFailed] from Actor[akka://sparkMaster/user/Master#1972530850] to Actor[akka://sparkMaster/deadLetters] was not delivered. [3] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/04/14 22:18:06 INFO Master: akka.tcp://sparkWorker@tc_52_224:21371 got disassociated, removing it.
14/04/14 22:18:06 INFO Master: akka.tcp://sparkWorker@tc_52_224:21371 got disassociated, removing it.
14/04/14 22:18:06 INFO LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.11.52.224%3A61550-1#646150938] was not delivered. [4] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/04/14 22:18:06 INFO Master: akka.tcp://sparkWorker@tc_52_224:21371 got disassociated, removing it.
14/04/14 22:18:06 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@10.11.52.223:7077] -> [akka.tcp://sparkWorker@tc_52_224:21371]: Error [Association failed with [akka.tcp://sparkWorker@tc_52_224:21371]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkWorker@tc_52_224:21371]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: tc_52_224/10.11.52.224:21371
]
14/04/14 22:18:06 INFO Master: akka.tcp://sparkWorker@tc_52_224:21371 got disassociated, removing it.
14/04/14 22:18:06 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@10.11.52.223:7077] -> [akka.tcp://sparkWorker@tc_52_224:21371]: Error [Association failed with [akka.tcp://sparkWorker@tc_52_224:21371]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkWorker@tc_52_224:21371]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: tc_52_224/10.11.52.224:21371
]
14/04/14 22:18:06 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster@10.11.52.223:7077] -> [akka.tcp://sparkWorker@tc_52_224:21371]: Error [Association failed with [akka.tcp://sparkWorker@tc_52_224:21371]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkWorker@tc_52_224:21371]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: tc_52_224/10.11.52.224:21371
]
14/04/14 22:18:06 INFO Master: akka.tcp://sparkWorker@tc_52_224:21371 got disassociated, removing it.
14/04/14 22:19:03 WARN Master: Removing worker-20140414221705-tc_52_224-21371 because we got no heartbeat in 60 seconds
14/04/14 22:19:03 INFO Master: Removing worker worker-20140414221705-tc_52_224-21371 on tc_52_224:21371
Run Code Online (Sandbox Code Playgroud)
接下来是223(工人)的工人日志:
14/04/14 22:17:06 INFO Worker: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/04/14 22:17:06 INFO Worker: Starting Spark worker tc-52-223:20599 with 1 cores, 4.0 GB RAM
14/04/14 22:17:06 INFO Worker: Spark home: /data/test/spark-0.9.1-bin-hadoop1
14/04/14 22:17:06 INFO WorkerWebUI: Started Worker web UI at http://tc-52-223:8081
14/04/14 22:17:06 INFO Worker: Connecting to master spark://xx.xx.52.223:7077...
14/04/14 22:17:06 INFO Worker: Successfully registered with master spark://xx.xx.52.223:7077
Run Code Online (Sandbox Code Playgroud)
接下来是224(工人)的工作日志:
Spark Command: /usr/local/jdk/bin/java -cp :/data/test/spark-0.9.1-bin-hadoop1/conf:/data/test/spark-0.9.1-bin-hadoop1/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop1.0.4.jar -Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker spark://10.11.52.223:7077 --webui-port 8081
========================================
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14/04/14 22:17:06 INFO Worker: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/04/14 22:17:06 INFO Worker: Starting Spark worker tc_52_224:21371 with 1 cores, 4.0 GB RAM
14/04/14 22:17:06 INFO Worker: Spark home: /data/test/spark-0.9.1-bin-hadoop1
14/04/14 22:17:06 INFO WorkerWebUI: Started Worker web UI at http://tc_52_224:8081
14/04/14 22:17:06 INFO Worker: Connecting to master spark://xx.xx.52.223:7077...
14/04/14 22:17:26 INFO Worker: Connecting to master spark://xx.xx.52.223:7077...
14/04/14 22:17:46 INFO Worker: Connecting to master spark://xx.xx.52.223:7077...
14/04/14 22:18:06 ERROR Worker: All masters are unresponsive! Giving up.
Run Code Online (Sandbox Code Playgroud)
接下来是我的spark-env.sh:
JAVA_HOME=/usr/local/jdk
export SPARK_MASTER_IP=tc-52-223
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_MEMORY=4g
export MASTER=spark://${SPARK_MASTER_IP}:${SPARK_MASTER_PORT}
export SPARK_LOCAL_IP=tc-52-223
Run Code Online (Sandbox Code Playgroud)
我搜索了许多解决方案,但他们无法工作.请帮我.
我不确定这是否与我遇到的问题相同,但你可能想尝试设置SPARK_MASTER_IP与spark绑定的相同.在你的例子中看起来会是这样10.11.52.223而不是tc-52-223.
它应该与您在8080上访问主节点Web UI时看到的内容相同.如下所示: Spark Master at spark://ec2-XX-XX-XXX-XXX.compute-1.amazonaws.com:7077
如果您收到"连接被拒绝"例外,您可以通过检查来解决它
=> Master正在特定主机上运行
netstat -at | grep 7077
Run Code Online (Sandbox Code Playgroud)
你会得到类似的东西:
tcp 0 0 akhldz.master.io:7077 *:* LISTEN
Run Code Online (Sandbox Code Playgroud)
如果是这样的话,那么从你的工人机做主机akhldz.master.io(与你的主人替换akhldz.master.io host.If不顺心的事,那么在你的/ etc/hosts文件添加主机条目)
telnet akhldz.master.io 7077(如果没有连接,那么你的工作人员也不会连接.)
=>在/ etc/hosts中添加主机条目
从您的工作机打开/ etc/hosts并添加以下条目(示例)
192.168.100.20 akhldz.master.io
Run Code Online (Sandbox Code Playgroud)
PS:在上面的例子中,Pillis有两个具有相同主机名的IP地址,例如:
192.168.100.40 s1.machine.org
192.168.100.41 s1.machine.org
Run Code Online (Sandbox Code Playgroud)
希望有所帮助.
| 归档时间: |
|
| 查看次数: |
25645 次 |
| 最近记录: |