dav*_*ear 4 cluster-computing etcd
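Hey, for some reason I'm getting a cluster ID mismatch. I first had it on one node, where it went away after I cleared the data directory several times and changed the cluster token and node names, but now it shows up on another node.
This is the script I use: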
IP0=10.150.0.1
IP1=10.150.0.2
IP2=10.150.0.3
IP3=10.150.0.4
NODENAME0=node0
NODENAME1=node1
NODENAME2=node2
NODENAME3=node3
# changing these on each box
THISIP=$IP2
THISNODENAME=$NODENAME2
etcd --name $THISNODENAME --initial-advertise-peer-urls http://$THISIP:2380 \
--data-dir /root/etcd-data \
--listen-peer-urls http://$THISIP:2380 \
--listen-client-urls http://$THISIP:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://$THISIP:2379 \
--initial-cluster-token etcd-cluster-2 \
--initial-cluster $NODENAME0=http://$IP0:2380,$NODENAME1=http://$IP1:2380,$NODENAME2=http://$IP2:2380,$NODENAME3=http://$IP3:2380 \
--initial-cluster-state new
I get:
2016-11-11 22:13:12.090515 I | etcdmain: etcd Version: 2.3.7
2016-11-11 22:13:12.090643 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2016-11-11 22:13:12.090713 I | etcdmain: listening for peers on http://10.150.0.3:2380
2016-11-11 22:13:12.090745 I | etcdmain: listening for client requests on http://10.150.0.3:2379
2016-11-11 22:13:12.090771 I | etcdmain: listening for client requests on http://127.0.0.1:2379
2016-11-11 22:13:12.090960 I | etcdserver: name = node2
2016-11-11 22:13:12.090976 I | etcdserver: data dir = /root/etcd-data
2016-11-11 22:13:12.090983 I | etcdserver: member dir = /root/etcd-data/member
2016-11-11 22:13:12.090990 I | etcdserver: heartbeat = 100ms
2016-11-11 22:13:12.090995 I | etcdserver: election = 1000ms
2016-11-11 22:13:12.091001 I | etcdserver: snapshot count = 10000
2016-11-11 22:13:12.091011 I | etcdserver: advertise client URLs = http://10.150.0.3:2379
2016-11-11 22:13:12.091269 I | etcdserver: restarting member 7fbd572038b372f6 in cluster 4e73d7b9b94fe83b at commit index 4
2016-11-11 22:13:12.091317 I | raft: 7fbd572038b372f6 became follower at term 8
2016-11-11 22:13:12.091346 I | raft: newRaft 7fbd572038b372f6 [peers: [], term: 8, commit: 4, applied: 0, lastindex: 4, lastterm: 1]
2016-11-11 22:13:12.091516 I | etcdserver: starting server... [version: 2.3.7, cluster version: to_be_decided]
2016-11-11 22:13:12.091869 E | etcdmain: failed to notify systemd for readiness: No socket
2016-11-11 22:13:12.091894 E | etcdmain: forgot to set Type=notify in systemd service file?
2016-11-11 22:13:12.096380 N | etcdserver: added member 7508b3e625cfed5 [http://10.150.0.4:2380] to cluster 4e73d7b9b94fe83b
2016-11-11 22:13:12.099800 N | etcdserver: added member 14c76eb5d27acbc5 [http://10.150.0.1:2380] to cluster 4e73d7b9b94fe83b
2016-11-11 22:13:12.100957 N | etcdserver: added local member 7fbd572038b372f6 [http://10.150.0.2:2380] to cluster 4e73d7b9b94fe83b
2016-11-11 22:13:12.102711 N | etcdserver: added member d416fca114f17871 [http://10.150.0.3:2380] to cluster 4e73d7b9b94fe83b
2016-11-11 22:13:12.134330 E | rafthttp: request cluster ID mismatch (got cfd5ef74b3dcf6fe want 4e73d7b9b94fe83b)
The other members aren't even running, so how is this possible?
Thanks
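For everyone who stumbles on this from Google:
The error is about the peer's member ID. A peer is trying to join the cluster under the same name as a member that already exists in the cluster (probably an old instance). It has the same peer name but a different ID, and that is the problem.
You should remove the peer and re-add it, as shown in this helpful post:
Solving it is quite simple. First we have to log in to an existing, working server in the rest of the cluster and remove server00 from its member list: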
etcdctl member remove <UID>
This frees up the slot so the new server00 can join, but we still need to tell the cluster that it may join by issuing the add command:
etcdctl member add server00 http://1.2.3.4:2380
If you follow the logs on server00, you will see everything come to life. You can confirm this with the following commands:
etcdctl member list
etcdctl cluster-health
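Use "etcdctl member list" to find the IDs of the current members and identify the peer that is trying to join the cluster with the wrong ID, then use "etcdctl member remove" to remove that peer from the members and try to rejoin it. Hope this helps.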
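The lookup step above can be scripted. This is only a minimal sketch, assuming the etcdctl v2 output format in which each line of "etcdctl member list" looks like `<ID>: name=<name> peerURLs=... clientURLs=...`. The helper name `member_id_by_name` is hypothetical, and the remove/add commands are left commented out so that nothing is changed by accident.

```shell
#!/bin/sh
# Hypothetical helper: read `etcdctl member list` output on stdin and print
# the ID of the member whose name matches $1 (prints nothing if no match).
# Assumes v2-style lines: "<ID>: name=<name> peerURLs=<urls> clientURLs=<urls>".
member_id_by_name() {
    awk -v want="name=$1" '
        {
            for (i = 2; i <= NF; i++) {
                if ($i == want) {
                    id = $1
                    sub(/:$/, "", id)   # strip the trailing colon after the ID
                    print id
                }
            }
        }'
}

# Usage on a working member (commented out; server00 and the URL are the
# placeholder values from the quoted post):
# id=$(etcdctl member list | member_id_by_name server00)
# etcdctl member remove "$id"
# etcdctl member add server00 http://1.2.3.4:2380
```

Check the ID it prints against the full "etcdctl member list" output before removing anything, since removing the wrong member can cost the cluster its quorum.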
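In my case, I got the error
rafthttp: request cluster ID mismatch (got 1b3a88599e79f82b want b33939d80a381a57)
because of an incorrect configuration on one node.
Two of my nodes had this in their config: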
env ETCD_INITIAL_CLUSTER="etcd-01=http://172.16.50.101:2380,etcd-02=http://172.16.50.102:2380,etcd-03=http://172.16.50.103:2380"
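and one node had
env ETCD_INITIAL_CLUSTER="etcd-01=http://172.16.50.101:2380"
To fix it, I stopped etcd on all nodes, corrected the bad configuration, deleted the /var/lib/etcd/member folder on all nodes, restarted etcd on all nodes, and voilà!
P.S.
/var/lib/etcd is the folder where etcd stores its data in my case.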
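A quick way to catch this kind of mismatch before starting etcd is to compare the ETCD_INITIAL_CLUSTER value from each node. The sketch below is only an illustration: the helper names `normalize_initial_cluster` and `same_initial_cluster` are made up, and ignoring entry order and stray spaces is a choice made for this comparison. The two values are the ones quoted above.

```shell
#!/bin/sh
# Hypothetical helper: print the entries of an ETCD_INITIAL_CLUSTER value
# one per line, sorted, with spaces removed, so that two values can be
# compared regardless of entry order.
normalize_initial_cluster() {
    printf '%s\n' "$1" | tr -d ' ' | tr ',' '\n' | sort
}

# Hypothetical helper: succeed only if both values list the same entries.
same_initial_cluster() {
    [ "$(normalize_initial_cluster "$1")" = "$(normalize_initial_cluster "$2")" ]
}

# Values taken from the two configs in this answer.
good="etcd-01=http://172.16.50.101:2380,etcd-02=http://172.16.50.102:2380,etcd-03=http://172.16.50.103:2380"
bad="etcd-01=http://172.16.50.101:2380"

if same_initial_cluster "$good" "$bad"; then
    echo "initial cluster values match"
else
    echo "initial cluster values differ"
fi
```

With the two values above this reports that they differ, which is the situation that produced the mismatch in my case.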