Electing a new ZooKeeper leader shuts down the Spark Master

Ara*_*rad 12 apache-spark apache-zookeeper

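When I kill the leader ZooKeeper node, the Spark master becomes unresponsive (I have, of course, delegated leader election to ZooKeeper). Below is the error log I see on the Spark master node. Do you have any suggestions for resolving this?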

15/06/22 10:44:00 INFO ClientCnxn: Unable to read additional data from server sessionid 0x14dd82e22f70ef1, likely server has closed socket, closing socket connection and attempting reconnect
15/06/22 10:44:00 INFO ClientCnxn: Unable to read additional data from server sessionid 0x24dc5a319b40090, likely server has closed socket, closing socket connection and attempting reconnect
15/06/22 10:44:01 INFO ConnectionStateManager: State change: SUSPENDED
15/06/22 10:44:01 INFO ConnectionStateManager: State change: SUSPENDED
15/06/22 10:44:01 WARN ConnectionStateManager: There are no ConnectionStateListeners registered.
15/06/22 10:44:01 INFO ZooKeeperLeaderElectionAgent: We have lost leadership
15/06/22 10:44:01 ERROR Master: Leadership has been revoked -- master shutting down.

Kni*_*t71 2

This is the expected behavior. You have to set up "n" masters, and you need to specify the ZooKeeper URL in spark-env.sh on every master:

SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181"

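Note that ZooKeeper maintains a quorum. This means you should run an odd number of ZooKeeper servers, and the ZooKeeper ensemble is only available while a majority of its servers are up. Since Spark depends on ZooKeeper, the Spark cluster will not come up until the ZooKeeper quorum is established.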
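To make the quorum arithmetic concrete, here is a minimal sketch. It assumes the usual majority rule, where a quorum is floor(n/2) + 1 servers; the function names `quorum_size` and `tolerated_failures` are made up for this illustration and are not part of ZooKeeper or Spark.

```shell
#!/bin/sh
# Majority quorum for an ensemble of n servers: floor(n/2) + 1.
# (Hypothetical helper, for illustration only.)
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# How many servers may fail while a majority is still reachable.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Compare a few ensemble sizes.
for n in 2 3 4 5; do
    echo "ensemble=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Under this rule a 4-server ensemble tolerates no more failures than a 3-server one, which is why odd sizes are recommended. It also means that an ensemble made up of only two servers cannot lose either one without losing quorum.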

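When you set up two (n) masters and bring down a ZooKeeper node, the current master shuts down, a new master is elected, and all workers attach to the new master.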

You should start your workers by passing the full list of masters:

./start-slave.sh spark://master1:port1,master2:port2
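If you keep the master addresses in a script, the comma-separated URL can be assembled rather than typed by hand. This is only a sketch: `master_url` is a hypothetical helper, `master1` and `master2` are placeholder host names, and 7077 is assumed because it is the default standalone master port.

```shell
#!/bin/sh
# Build a single spark:// URL listing every master, for example
# spark://h1:7077,h2:7077. Arguments are host:port pairs.
# (Hypothetical helper, for illustration only.)
master_url() {
    url=""
    for hp in "$@"; do
        if [ -z "$url" ]; then
            url="spark://$hp"      # first entry carries the scheme
        else
            url="$url,$hp"         # later entries are appended after a comma
        fi
    done
    echo "$url"
}

MASTERS=$(master_url master1:7077 master2:7077)
echo "$MASTERS"
# The resulting string can then be passed to the worker start script:
#   ./start-slave.sh "$MASTERS"
```

Applications should be pointed at the same multi-master URL, so that they can also find whichever master is currently the leader.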

Be sure to wait 1-2 minutes before expecting to see the failover take effect.