What can cause a ZooKeeper client session timeout?

gzc*_*gzc 8 apache-storm apache-zookeeper

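I deployed a long-running Storm topology. After running for a few hours, the whole topology went down. I checked the worker logs and found the entries below. They show that the ZooKeeper client session timed out, which triggered a reconnect. I suspect this is related to the topology failure, so now I am trying to find out what caused the client timeout.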

2016-02-29T10:34:12.386+0800 o.a.s.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 23789ms for sessionid 0x252f862028c0083, closing socket connection and attempting reconnect
2016-02-29T10:34:12.986+0800 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED
2016-02-29T10:34:13.059+0800 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.
2016-02-29T10:34:13.197+0800 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server zk-3.cloud.mos/172.16.13.147:2181. Will not attempt to authenticate using SASL (unknown error)
2016-02-29T10:34:13.241+0800 o.a.s.z.ClientCnxn [WARN] Session 0x252f862028c0083 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.8.0_31]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716) ~[na:1.8.0_31]
    at org.apache.storm.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[storm-core-0.9.6.jar:0.9.6]
    at org.apache.storm.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) ~[storm-core-0.9.6.jar:0.9.6]

Mar*_*ano 5

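Your client can no longer communicate with the ZooKeeper server. The first thing that happens is that no heartbeat response arrives within the negotiated session timeout: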

2016-02-29T10:34:12.386+0800 o.a.s.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 23789ms for sessionid 0x252f862028c0083, closing socket connection and attempting reconnect
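
As a rough illustration of how that threshold is derived: the ZooKeeper Java client treats the connection as timed out once it has heard nothing for two-thirds of the negotiated session timeout. The sketch below assumes a 20000 ms session timeout purely for illustration (the value actually negotiated for this cluster is not shown in the logs), and the function name is hypothetical.

```python
def client_read_timeout_ms(negotiated_session_timeout_ms: int) -> int:
    """Silence threshold after which the client gives up on the connection.

    Mirrors the two-thirds rule used by the ZooKeeper Java client.
    """
    return negotiated_session_timeout_ms * 2 // 3


# Silence reported in the log line above.
observed_silence_ms = 23789

# ASSUMPTION: illustrative session timeout; check your own Storm/ZooKeeper config.
assumed_session_timeout_ms = 20000

read_timeout_ms = client_read_timeout_ms(assumed_session_timeout_ms)
print(f"read timeout: {read_timeout_ms} ms")
print(f"exceeded: {observed_silence_ms > read_timeout_ms}")
```

Under that assumption the reported silence is well past the threshold, which is why the client closes the socket and tries another server.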

Then, when it tries to reconnect, the connection is refused:

2016-02-29T10:34:13.241+0800 o.a.s.z.ClientCnxn [WARN] Session 0x252f862028c0083 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused

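This means your ZooKeeper server is one of the following:

  • Unreachable (network connectivity is down)
  • Dead (so nothing is listening on the socket)
  • Stuck in garbage collection and unable to communicate (although that would probably surface as a connection timeout rather than a refused connection; I'm not sure)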
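
One way to tell these cases apart from the worker host is to probe the server directly. The following is a minimal sketch, not a definitive tool: it sends ZooKeeper's `ruok` four-letter command over a plain socket and classifies the outcome. The function name `probe_zookeeper` and the returned labels are hypothetical. Note that on newer ZooKeeper releases four-letter commands may need to be enabled through the `4lw.commands.whitelist` server setting, so an "unexpected" result there does not necessarily mean the server is unhealthy.

```python
import socket


def probe_zookeeper(host: str, port: int = 2181, timeout_s: float = 3.0) -> str:
    """Return 'ok', 'refused', 'timeout', 'unreachable', or 'unexpected'."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            sock.sendall(b"ruok")
            reply = b""
            # Read until the server closes the connection.
            while True:
                chunk = sock.recv(64)
                if not chunk:
                    break
                reply += chunk
    except ConnectionRefusedError:
        # Host is up but nothing is listening: the server process is likely dead.
        return "refused"
    except socket.timeout:
        # No answer in time: possibly a long GC pause or a silently dropped connection.
        return "timeout"
    except OSError:
        # DNS failure, no route to host, and similar network-level errors.
        return "unreachable"
    # A healthy server answers "imok".
    return "ok" if reply == b"imok" else "unexpected"
```

For the server in the question this would be called as `probe_zookeeper("zk-3.cloud.mos")`. A "refused" result matches the `Connection refused` exception in the log and points at the second bullet.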

To find out more, check the ZooKeeper server logs on your (Hadoop?) cluster.

  • I'm facing the same problem. If it turns out to be a GC issue, how do I fix it? (3 upvotes)