How to remove a master node from an HA cluster and from the etcd cluster

Tha*_*nos 5 etcd kubernetes

I am new to k8s and I have run into a problem I cannot solve.

I am building an HA cluster of master nodes. While running some tests (removing a node and adding it back), I noticed that the etcd cluster does not update its member list.

Here is an example of the problem:

$ kubectl get pods -A
NAMESPACE                NAME                                                 READY   STATUS    RESTARTS   AGE
cri-o-metrics-exporter   cri-o-metrics-exporter-77c9cf9746-qlp4d              0/1     Pending   0          16h
haproxy-controller       haproxy-ingress-769d858699-b8r8q                     0/1     Pending   0          16h
haproxy-controller       ingress-default-backend-5fd4986454-kvbw8             0/1     Pending   0          16h
kube-system              calico-kube-controllers-574d679d8c-tkcjj             1/1     Running   3          16h
kube-system              calico-node-95t6l                                    1/1     Running   2          16h
kube-system              calico-node-m5txs                                    1/1     Running   2          16h
kube-system              coredns-7588b55795-gkfjq                             1/1     Running   2          16h
kube-system              coredns-7588b55795-lxpmj                             1/1     Running   2          16h
kube-system              etcd-masterNode1                                     1/1     Running   2          16h
kube-system              etcd-masterNode2                                     1/1     Running   2          16h
kube-system              kube-apiserver-masterNode1                           1/1     Running   3          16h
kube-system              kube-apiserver-masterNode2                           1/1     Running   3          16h
kube-system              kube-controller-manager-masterNode1                  1/1     Running   4          16h
kube-system              kube-controller-manager-masterNode2                  1/1     Running   4          16h
kube-system              kube-proxy-5q6xs                                     1/1     Running   2          16h
kube-system              kube-proxy-k8p6h                                     1/1     Running   2          16h
kube-system              kube-scheduler-masterNode1                           1/1     Running   3          16h
kube-system              kube-scheduler-masterNode2                           1/1     Running   6          16h
kube-system              metrics-server-575bd7f776-jtfsh                      0/1     Pending   0          16h
kubernetes-dashboard     dashboard-metrics-scraper-6f78bc588b-khjjr           1/1     Running   2          16h
kubernetes-dashboard     kubernetes-dashboard-978555c5b-9jsxb                 1/1     Running   2          16h
$ kubectl exec etcd-masterNode2 -n kube-system -it -- sh
sh-5.0# etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key member list -w table
+------------------+---------+----------------------------+---------------------------+---------------------------+------------+
|        ID        | STATUS  |            NAME            |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
+------------------+---------+----------------------------+---------------------------+---------------------------+------------+
| 4c209e5bc1ca9593 | started |         masterNode1        |     https://IP1:2380      |     https://IP1:2379      |      false |
| 676d4bfab319fa22 | started |         masterNode2        |     https://IP2:2380      |     https://IP2:2379      |      false |
| a9af4b00e33f87d4 | started |         masterNode3        |     https://IP3:2380      |     https://IP3:2379      |      false |
+------------------+---------+----------------------------+---------------------------+---------------------------+------------+
sh-5.0# exit
$ kubectl get nodes
NAME                         STATUS   ROLES    AGE   VERSION
masterNode1                  Ready    master   16h   v1.19.0
masterNode2                  Ready    master   16h   v1.19.0

I assume I am removing the node from the cluster correctly. The procedure I follow:

  1. kubectl drain <nodeName> --ignore-daemonsets --delete-local-data
  2. kubectl delete node <nodeName>
  3. kubeadm reset # on the node being removed
  4. rm -f /etc/cni/net.d/* # remove the CNI configuration
  5. rm -rf /var/lib/kubelet # remove the /var/lib/kubelet directory
  6. rm -rf /var/lib/etcd # remove /var/lib/etcd
  7. iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X && iptables -t filter -F && iptables -t filter -X # flush iptables
  8. ipvsadm --clear
  9. rm -rf /etc/kubernetes # remove /etc/kubernetes (in case the certificates change)
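The steps above can be collected into a sketch like the following (the function names and the node-name argument are placeholders of mine, not from the original post; the first function runs from a machine with admin kubeconfig access, the second runs as root on the node being removed):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run from a machine with admin access to the cluster:
# drains the workloads and deletes the Node object.
evict_node() {
  kubectl drain "$1" --ignore-daemonsets --delete-local-data
  kubectl delete node "$1"
}

# Run as root on the node being removed: resets kubeadm state
# and wipes CNI, kubelet, etcd, and kubeadm leftovers.
wipe_node() {
  kubeadm reset -f
  rm -f /etc/cni/net.d/*                 # remove CNI configuration
  rm -rf /var/lib/kubelet /var/lib/etcd  # remove kubelet and etcd data
  iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
  ipvsadm --clear
  rm -rf /etc/kubernetes                 # remove certificates and manifests
}
```

Note that none of these steps tells the remaining etcd members that the peer is gone, which is exactly the gap this question is about.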

I am running Kubernetes version 1.19.0 with etcd image etcd:3.4.9-1.

The cluster runs on bare-metal nodes.

Is this a bug, or am I not removing the node from the etcd cluster correctly?

Tha*_*nos 14

Thanks to Mariusz K. I found the answer to my problem. In case anyone else runs into the same issue, here is how I solved it.

First, query the etcd members of the (HA) cluster (code example):

$ kubectl exec etcd-< nodeNameMasterNode > -n kube-system -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key member list
1863b58e85c8a808, started, nodeNameMaster1, https://IP1:2380, https://IP1:2379, false
676d4bfab319fa22, started, nodeNameMaster2, https://IP2:2380, https://IP2:2379, false
b0c50c50d563ed51, started, nodeNameMaster3, https://IP3:2380, https://IP3:2379, false

Then, once you have the list of members, you can remove whichever member you need. Code example:

kubectl exec etcd-nodeNameMaster1 -n kube-system -- etcdctl --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key member remove b0c50c50d563ed51
Member b0c50c50d563ed51 removed from cluster d1e1de99e3d19634

I wanted to be able to remove a member from the etcd cluster without having to connect to the pod and run a second command, so I pass the command to the pod directly via kubectl exec.

  • @ImranRazaKhan It seems this only works while the node in question is still up and running. If it has failed and is unavailable, then the approach described in the answer seems to be the way to go. (3 upvotes)
  • If you only want to remove the member from etcd, you can use `kubeadm reset phase remove-etcd-member` (2 upvotes)
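For reference, the `kubeadm reset phase` route mentioned in the comment above would look roughly like this (a sketch; the wrapper function name is mine, and the command must run as root on the control-plane node whose etcd member is being removed):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Removes only this node's member from the etcd cluster,
# leaving kubelet, CNI, and certificates in place.
# Run as root on the control-plane node being removed.
remove_etcd_member() {
  kubeadm reset phase remove-etcd-member
}
```

Unlike the `kubectl exec … etcdctl member remove` approach in the answer, this only works while the node itself is still reachable, since the command runs locally on it.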