Hos*_*ari 5 calico kubernetes kubespray
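I deployed a brand-new k8s cluster with kubespray. Everything works except that none of the calico-related pods become ready. After several hours of debugging I still cannot find why the calico pods are crashing. I even disabled/stopped the firewall service entirely, but nothing changed.
Another important point is that the output of calicoctl node status is unstable; each call shows something different: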
Calico process is not running.
Calico process is running.
None of the BGP backend processes (BIRD or GoBGP) are running.
Calico process is running.
IPv4 BGP status
+----------------+-------------------+-------+----------+---------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+----------------+-------------------+-------+----------+---------+
| 192.168.231.42 | node-to-node mesh | start | 06:23:41 | Passive |
+----------------+-------------------+-------+----------+---------+
IPv6 BGP status
No IPv6 peers found.
Another log message that appears frequently is the following:
bird: Unable to open configuration file /etc/calico/confd/config/bird.cfg: No such file or directory
bird: Unable to open configuration file /etc/calico/confd/config/bird6.cfg: No such file or directory
I also tried changing IP_AUTODETECTION_METHOD with each of the following, but nothing changed:
kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=can-reach=www.google.com
kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=can-reach=8.8.8.8
kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=eth1
kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=eth.*
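Expected: all calico-related pods, daemonsets, deployments, and replicasets should be in the READY state.
Actual: all calico-related pods, daemonsets, deployments, and replicasets are in the NOT READY state.
I have no solution yet, and I am asking for help on how to debug/overcome this problem.
This is the latest version of kubespray, with the following context and environment.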
git reflog
7e4b176 HEAD@{0}: clone: from https://github.com/kubernetes-sigs/kubespray.git
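I am trying to deploy a k8s cluster with one master node and one worker node. Note also that the servers in this cluster sit in a nearly air-gapped/offline environment with restricted access to the global internet. The ansible run that deploys the cluster with kubespray succeeded, but I hit this problem with the calico pods.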
cat inventory/mycluster/hosts.yaml
all:
  hosts:
    node1:
      ansible_host: 192.168.231.41
      ansible_port: 32244
      ip: 192.168.231.41
      access_ip: 192.168.231.41
    node2:
      ansible_host: 192.168.231.42
      ansible_port: 32244
      ip: 192.168.231.42
      access_ip: 192.168.231.42
  children:
    kube_control_plane:
      hosts:
        node1:
    kube_node:
      hosts:
        node1:
        node2:
    etcd:
      hosts:
        node1:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
    calico_rr:
      hosts: {}
calicoctl version
Client Version: v3.19.2
Git commit: 6f3d4900
Cluster Version: v3.19.2
Cluster Type: kubespray,bgp,kubeadm,kdd,k8s
cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)
uname -r
3.10.0-1160.42.2.el7.x86_64
kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.4", GitCommit:"3cce4a82b44f032d0cd1a1790e6d2f5a55d20aae", GitTreeState:"clean", BuildDate:"2021-08-11T18:16:05Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.4", GitCommit:"3cce4a82b44f032d0cd1a1790e6d2f5a55d20aae", GitTreeState:"clean", BuildDate:"2021-08-11T18:10:22Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node1 Ready control-plane,master 19h v1.21.4 192.168.231.41 <none> CentOS Linux 7 (Core) 3.10.0-1160.42.2.el7.x86_64 docker://20.10.8
node2 Ready <none> 19h v1.21.4 192.168.231.42 <none> CentOS Linux 7 (Core) 3.10.0-1160.42.2.el7.x86_64 docker://20.10.8
kubectl get all --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system pod/calico-kube-controllers-8575b76f66-57zw4 0/1 CrashLoopBackOff 327 19h 192.168.231.42 node2 <none> <none>
kube-system pod/calico-node-4hkzb 0/1 Running 245 14h 192.168.231.42 node2 <none> <none>
kube-system pod/calico-node-hznhc 0/1 Running 245 14h 192.168.231.41 node1 <none> <none>
kube-system pod/coredns-8474476ff8-b6lqz 1/1 Running 0 19h 10.233.96.1 node2 <none> <none>
kube-system pod/coredns-8474476ff8-gdkml 1/1 Running 0 19h 10.233.90.1 node1 <none> <none>
kube-system pod/dns-autoscaler-7df78bfcfb-xnn4r 1/1 Running 0 19h 10.233.90.2 node1 <none> <none>
kube-system pod/kube-apiserver-node1 1/1 Running 0 19h 192.168.231.41 node1 <none> <none>
kube-system pod/kube-controller-manager-node1 1/1 Running 0 19h 192.168.231.41 node1 <none> <none>
kube-system pod/kube-proxy-dmw22 1/1 Running 0 19h 192.168.231.41 node1 <none> <none>
kube-system pod/kube-proxy-wzpnv 1/1 Running 0 19h 192.168.231.42 node2 <none> <none>
kube-system pod/kube-scheduler-node1 1/1 Running 0 19h 192.168.231.41 node1 <none> <none>
kube-system pod/nginx-proxy-node2 1/1 Running 0 19h 192.168.231.42 node2 <none> <none>
kube-system pod/nodelocaldns-6h5q2 1/1 Running 0 19h 192.168.231.42 node2 <none> <none>
kube-system pod/nodelocaldns-7fwbd 1/1 Running 0 19h 192.168.231.41 node1 <none> <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 10.233.0.1 <none> 443/TCP 19h <none>
kube-system service/coredns ClusterIP 10.233.0.3 <none> 53/UDP,53/TCP,9153/TCP 19h k8s-app=kube-dns
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
kube-system daemonset.apps/calico-node 2 2 0 2 0 kubernetes.io/os=linux 19h calico-node quay.io/calico/node:v3.19.2 k8s-app=calico-node
kube-system daemonset.apps/kube-proxy 2 2 2 2 2 kubernetes.io/os=linux 19h kube-proxy k8s.gcr.io/kube-proxy:v1.21.4 k8s-app=kube-proxy
kube-system daemonset.apps/nodelocaldns 2 2 2 2 2 kubernetes.io/os=linux 19h node-cache k8s.gcr.io/dns/k8s-dns-node-cache:1.17.1 k8s-app=nodelocaldns
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
kube-system deployment.apps/calico-kube-controllers 0/1 1 0 19h calico-kube-controllers quay.io/calico/kube-controllers:v3.19.2 k8s-app=calico-kube-controllers
kube-system deployment.apps/coredns 2/2 2 2 19h coredns k8s.gcr.io/coredns/coredns:v1.8.0 k8s-app=kube-dns
kube-system deployment.apps/dns-autoscaler 1/1 1 1 19h autoscaler k8s.gcr.io/cpa/cluster-proportional-autoscaler-amd64:1.8.3 k8s-app=dns-autoscaler
NAMESPACE NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
kube-system replicaset.apps/calico-kube-controllers-8575b76f66 1 1 0 19h calico-kube-controllers quay.io/calico/kube-controllers:v3.19.2 k8s-app=calico-kube-controllers,pod-template-hash=8575b76f66
kube-system replicaset.apps/coredns-8474476ff8 2 2 2 19h coredns k8s.gcr.io/coredns/coredns:v1.8.0 k8s-app=kube-dns,pod-template-hash=8474476ff8
kube-system replicaset.apps/dns-autoscaler-7df78bfcfb 1 1 1 19h autoscaler k8s.gcr.io/cpa/cluster-proportional-autoscaler-amd64:1.8.3 k8s-app=dns-autoscaler,pod-template-hash=7df78bfcfb
Fortunately, increasing timeoutSeconds for both the livenessProbe and the readinessProbe from 1 to 60 fixed the problem.
kubectl edit -n kube-system daemonset.apps/calico-node
kubectl edit -n kube-system deployment.apps/calico-kube-controllers
https://github.com/projectcalico/calico/issues/4935
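For anyone who prefers a non-interactive change over kubectl edit, the same adjustment can probably be made with kubectl patch. The following is only a sketch under assumptions: the helper names probe_timeout_patch and patch_probes and the APPLY variable are made up for illustration, the container names are taken from the CONTAINERS column in the question's output (they happen to match the resource names), and both probes are assumed to already exist on each container. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: build a strategic merge patch that sets
# timeoutSeconds on both probes of the named container.
probe_timeout_patch() {
  container="$1"
  seconds="$2"
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"%s","livenessProbe":{"timeoutSeconds":%s},"readinessProbe":{"timeoutSeconds":%s}}]}}}}' \
    "$container" "$seconds" "$seconds"
}

# Hypothetical helper: print the kubectl command (default), or run it
# against the cluster when APPLY=1. Assumes the container is named
# after its daemonset/deployment, as in the question's output.
patch_probes() {
  kind="$1"
  name="$2"
  seconds="$3"
  patch="$(probe_timeout_patch "$name" "$seconds")"
  if [ "${APPLY:-0}" = "1" ]; then
    kubectl -n kube-system patch "$kind" "$name" -p "$patch"
  else
    printf "kubectl -n kube-system patch %s %s -p '%s'\n" "$kind" "$name" "$patch"
  fi
}

patch_probes daemonset calico-node 60
patch_probes deployment calico-kube-controllers 60
```

Running the script with APPLY=1 applies the patches. Afterwards, kubectl -n kube-system rollout status daemonset/calico-node should show whether the restarted pods become ready. A timeout of 60 is simply the value that worked here; a smaller value may be enough on other hosts.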