Kubernetes: Pod IP addresses are outside the range specified in --pod-network-cidr

The*_*DHM 4 networking kubernetes kubeadm kubectl

After upgrading to v1.24.0 (which removed dockershim), I had to install cri-dockerd. I then ran:

    sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --cri-socket=unix:///var/run/cri-dockerd.sock --apiserver-advertise-address=192.168.0.196

I chose flannel as the network plugin:

    kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

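Up to this point everything worked as expected. But after enabling scheduling on the master node, joining the worker nodes, and deploying my pods and services, I noticed a strange networking problem: NodePort and ClusterIP services did not work across nodes (with a single node there was no problem).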

Later I found that the pods were getting their IP addresses from the Docker network (172.17.0.*) instead of from --pod-network-cidr=10.244.0.0/16:

    masterzulu@master-zulu:~$ kubectl get pods --all-namespaces -o wide
    NAMESPACE      NAME                                  READY   STATUS    RESTARTS   AGE     IP              NODE
    django-space   django-588cb669d4-46b4w               1/1     Running   0          3m35s   172.17.0.4      master-zulu
    django-space   postgres-deployment-b58d5ff94-hs7t4   1/1     Running   0          3m35s   172.17.0.5      master-zulu
    kube-system    coredns-6d4b75cb6d-8gw6c              1/1     Running   0          7m9s    172.17.0.2      master-zulu
    kube-system    coredns-6d4b75cb6d-nxlq9              1/1     Running   0          7m9s    172.17.0.3      master-zulu

The flannel DaemonSet is running:

    kube-system    kube-flannel-ds-tqgvk                 1/1     Running   0          5m51s   192.168.3.132   master-zulu

And the node's podCIDR is set:

    masterzulu@master-zulu:~$ kubectl get no master-zulu -o json | jq '.spec.podCIDR'
    "10.244.0.0/24"
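The mismatch between the node's podCIDR and the pod IPs listed earlier can also be checked mechanically. The following is only a minimal sketch in POSIX shell; the helper names `ip_to_int` and `ip_in_cidr` are hypothetical and not part of kubectl or any Kubernetes tooling, and it handles plain IPv4 dotted-quad addresses only.

```shell
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
# The function body runs in a subshell, so the IFS change stays local.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Hypothetical helper: succeed (exit 0) if IPv4 address $1 is inside CIDR $2.
ip_in_cidr() (
  ip=$1
  net=${2%/*}    # part before the slash, e.g. 10.244.0.0
  bits=${2#*/}   # prefix length, e.g. 16
  mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
  [ $(( $(ip_to_int "$ip") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
)
```

With these definitions, `ip_in_cidr 172.17.0.4 10.244.0.0/16` fails for the CoreDNS and application pod addresses in this question, while an address such as 10.244.0.4 passes.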

I tried adding the --network-plugin=cni flag to the kubelet startup configuration, but got an error, because that flag was removed in v1.24.0 together with dockershim and its other flags.

This is the status of cri-docker:

    ● cri-docker.service - CRI Interface for Docker Application Container Engine
         Loaded: loaded (/etc/systemd/system/cri-docker.service; enabled; vendor preset: enabled)
         Active: active (running) since Wed 2022-05-25 21:36:57 BST; 5h 34min ago
    TriggeredBy: ● cri-docker.socket
           Docs: https://docs.mirantis.com
       Main PID: 1098 (cri-dockerd)
          Tasks: 15
         Memory: 53.4M
         CGroup: /system.slice/cri-docker.service
                 └─1098 /usr/local/bin/cri-dockerd --container-runtime-endpoint fd:// --network-plugin=

    May 26 01:51:56 master-zulu cri-dockerd[1098]: time="2022-05-26T01:51:56+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for kube-system/coredns-6d4b75cb6d-nxlq9 through plugin: invalid network status for"
    May 26 01:51:56 master-zulu cri-dockerd[1098]: time="2022-05-26T01:51:56+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for kube-system/coredns-6d4b75cb6d-nxlq9 through plugin: invalid network status for"
    May 26 01:51:56 master-zulu cri-dockerd[1098]: time="2022-05-26T01:51:56+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for kube-system/coredns-6d4b75cb6d-8gw6c through plugin: invalid network status for"
    May 26 01:53:13 master-zulu cri-dockerd[1098]: time="2022-05-26T01:53:13+01:00" level=info msg="Will attempt to re-write config file /var/lib/docker/containers/8ee7640d48c129058259b4b7632a0f6173ad8a9e2d5368cf3c9f29d1ea7db13e/resolv.conf as [nameserver 192.168.3.48 nameserver 192.168.0.1]"
    May 26 01:55:30 master-zulu cri-dockerd[1098]: time="2022-05-26T01:55:30+01:00" level=info msg="Will attempt to re-write config file /var/lib/docker/containers/f378aff3d077030215ef664d72132b189f8412a8d432e5a554cdbfbb37c3ea19/resolv.conf as [nameserver 10.96.0.10 search django-space.svc.cluster.local svc.cluster.local cluster.local options ndots:5]"
    May 26 01:55:30 master-zulu cri-dockerd[1098]: time="2022-05-26T01:55:30+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for django-space/django-588cb669d4-46b4w through plugin: invalid network status for"
    May 26 01:55:31 master-zulu cri-dockerd[1098]: time="2022-05-26T01:55:31+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for django-space/django-588cb669d4-46b4w through plugin: invalid network status for"
    May 26 01:55:43 master-zulu cri-dockerd[1098]: time="2022-05-26T01:55:43+01:00" level=info msg="Will attempt to re-write config file /var/lib/docker/containers/9523255b7991855027185cecbc8420bbe1268fcef21c2ddcb4d76851bce7e3a0/resolv.conf as [nameserver 10.96.0.10 search django-space.svc.cluster.local svc.cluster.local cluster.local options ndots:5]"
    May 26 01:55:43 master-zulu cri-dockerd[1098]: time="2022-05-26T01:55:43+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for django-space/postgres-deployment-b58d5ff94-hs7t4 through plugin: invalid network status for"
    May 26 01:55:43 master-zulu cri-dockerd[1098]: time="2022-05-26T01:55:43+01:00" level=info msg="Failed to read pod IP from plugin/docker: Couldn't find network status for django-space/postgres-deployment-b58d5ff94-hs7t4 through plugin: invalid network status for"

Does anyone know what I should do to fix this?

Update:

The cni0 interface is missing on the k8s master:

    masterzulu@master-zulu:~$ ifconfig -a
    docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
            inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
            inet6 fe80::42:e9ff:fec1:dd1b  prefixlen 64  scopeid 0x20<link>
            ether 02:42:e9:c1:dd:1b  txqueuelen 0  (Ethernet)
            RX packets 5140  bytes 418818 (418.8 KB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 5475  bytes 522703 (522.7 KB)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

    enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
            inet 192.168.0.196  netmask 255.255.255.0  broadcast 192.168.0.255
            inet6 fe80::e808:144d:a0dc:60a6  prefixlen 64  scopeid 0x20<link>
            ether 98:40:bb:3e:f2:1c  txqueuelen 1000  (Ethernet)
            RX packets 6332  bytes 515688 (515.6 KB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 6684  bytes 631167 (631.1 KB)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

    flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
            inet 10.244.0.0  netmask 255.255.255.255  broadcast 0.0.0.0
            inet6 fe80::494:d8ff:fe1b:4aab  prefixlen 64  scopeid 0x20<link>
            ether 06:94:d8:1b:4a:ab  txqueuelen 0  (Ethernet)
            RX packets 0  bytes 0 (0.0 B)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 0  bytes 0 (0.0 B)
            TX errors 0  dropped 129 overruns 0  carrier 0  collisions 0

The*_*DHM 6

After some investigation, I found that the cri-dockerd service was missing some arguments (note the empty --network-plugin= value):

    CGroup: /system.slice/cri-docker.service
             └─1098 /usr/local/bin/cri-dockerd --container-runtime-endpoint fd:// --network-plugin=

I added them manually to /etc/systemd/system/cri-docker.service:

    ...
    ExecStart=/usr/local/bin/cri-dockerd --network-plugin=cni --cni-bin-dir=/opt/cni/bin --cni-cache-dir=/var/lib/cni/cache --cni-conf-dir=/etc/cni/net.d --pod-infra-container-image=k8s.gcr.io/pause:3.7
    ...

Then I reloaded systemd and restarted the service:

    sudo systemctl daemon-reload
    sudo systemctl restart cri-docker.service
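One way to confirm that the edited unit really passes the flag is to inspect its ExecStart line. This is a rough sketch rather than an official check: `execstart_uses_cni` is a hypothetical helper name, and it simply greps unit-file text supplied on standard input.

```shell
# Hypothetical helper: read unit-file text on stdin and succeed only if an
# ExecStart line passes --network-plugin=cni. An empty "--network-plugin="
# value, as in the original status output, does not count.
execstart_uses_cni() {
  grep -E '^ExecStart=' | grep -Eq -- '--network-plugin=cni([[:space:]]|$)'
}
```

It could be used as `systemctl cat cri-docker.service | execstart_uses_cni && echo "CNI flag present"`, since `systemctl cat` prints the unit file together with any drop-ins.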

At this point cri-dockerd was configured correctly, but the problem persisted. Running it by hand showed that CNI validation was failing because the portmap plugin could not be found, and I then noticed that /opt/cni/bin was empty (no container networking plugins):

    masterzulu@master-zulu:~$ sudo /usr/local/bin/cri-dockerd
    INFO[0000] Connecting to docker on the Endpoint unix:///var/run/docker.sock
    INFO[0000] Start docker client with request timeout 0s
    INFO[0000] Hairpin mode is set to none
    ERRO[0000] Error validating CNI config list ({
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
    ): [failed to find plugin "portmap" in path [/opt/cni/bin]]
    INFO[0000] Docker cri networking managed by network plugin kubernetes.io/no-op
    ...
    INFO[0000] Setting cgroupDriver cgroupfs
    INFO[0000] Docker cri received runtime config &RuntimeConfig{NetworkConfig:&NetworkConfig{PodCidr:,},}
    INFO[0000] Starting the GRPC backend for the Docker CRI interface.
    INFO[0000] Start cri-dockerd grpc backend
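The "failed to find plugin" error means a plugin type named in the CNI config list has no matching binary in the plugin directory. A quick way to spot such gaps is sketched below. It is only an approximation: `missing_cni_plugins` is a hypothetical helper, and it extracts `"type"` values with grep and sed instead of a real JSON parser, which is enough for a simple config like the flannel one shown above.

```shell
# Hypothetical helper: print every plugin "type" referenced in the CNI
# config file $1 that has no executable file in the plugin directory $2.
# The body runs in a subshell so the variables do not leak to the caller.
missing_cni_plugins() (
  conf=$1
  bin_dir=$2
  grep -o '"type"[[:space:]]*:[[:space:]]*"[^"]*"' "$conf" |
    sed 's/.*"\([^"]*\)"$/\1/' |   # keep only the quoted plugin name
    sort -u |
    while read -r plugin; do
      [ -x "$bin_dir/$plugin" ] || echo "$plugin"
    done
)
```

For the setup in this answer it would be called as `missing_cni_plugins /etc/cni/net.d/10-flannel.conflist /opt/cni/bin`; no output means every referenced plugin binary is present.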

I think I had deleted the contents of /opt/cni/bin by mistake, so I restored them (fetching the latest release):

    cd /tmp && mkdir cni-plugins && wget https://github.com/containernetworking/plugins/releases/download/v1.1.1/cni-plugins-linux-amd64-v1.1.1.tgz && cd cni-plugins && tar zxfv ../cni-plugins-linux-amd64-v1.1.1.tgz
    sudo cp /tmp/cni-plugins/* /opt/cni/bin/

    ls /opt/cni/bin
    bandwidth  bridge  dhcp  firewall  flannel  host-device  host-local  ipvlan  loopback  macvlan  portmap  ptp  sbr  static  tuning  vlan  vrf

After restarting the cri-docker service, everything started working as expected:

    masterzulu@master-zulu:~$ kubectl get pods -Ao wide
    NAMESPACE      NAME                                  READY   STATUS    RESTARTS   AGE   IP              NODE
    django-space   django-588cb669d4-4zz7f               1/1     Running   0          11s   10.244.0.4      master-zulu
    django-space   postgres-deployment-b58d5ff94-scmrx   1/1     Running   0          12s   10.244.0.5      master-zulu
    kube-system    coredns-6d4b75cb6d-rnjlm              1/1     Running   0          73m   10.244.0.2      master-zulu
    kube-system    coredns-6d4b75cb6d-s6zl7              1/1     Running   0          73m   10.244.0.3      master-zulu

The cni0 interface is now up:

    masterzulu@master-zulu:~$ ifconfig -a
    cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
            inet 10.244.0.1  netmask 255.255.255.0  broadcast 10.244.0.255
            inet6 fe80::8c8:84ff:fe78:d999  prefixlen 64  scopeid 0x20<link>
            ether 0a:c8:84:78:d9:99  txqueuelen 1000  (Ethernet)
            RX packets 27714  bytes 5010722 (5.0 MB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 26936  bytes 2898949 (2.8 MB)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

cri-docker status:

    masterzulu@master-zulu:~$ sudo systemctl status cri-docker
    ● cri-docker.service - CRI Interface for Docker Application Container Engine
         Loaded: loaded (/etc/systemd/system/cri-docker.service; enabled; vendor preset: enabled)
         Active: active (running) since Fri 2022-05-27 22:39:06 BST; 1h 57min ago
    TriggeredBy: ● cri-docker.socket
           Docs: https://docs.mirantis.com
       Main PID: 187399 (cri-dockerd)
          Tasks: 11
         Memory: 17.1M
         CGroup: /system.slice/cri-docker.service
                 └─187399 /usr/local/bin/cri-dockerd --network-plugin=cni --cni-bin-dir=/opt/cni/bin --cni-cache-dir=/var/lib/cni/cache --cni-conf-dir=/etc/cni/net.d --po>

    May 28 00:36:20 master-zulu cri-dockerd[187399]: time="2022-05-28T00:36:20+01:00" level=info msg="Using CNI configuration file /etc/cni/net.d/10-flannel.conflist"

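My conclusion

A missing --network-plugin=cni startup argument for cri-dockerd, or any other problem in the CNI configuration (such as missing plugin binaries), can cause this issue: cri-dockerd decides that CNI is unavailable and uses the docker0 interface directly, so pods get their IPs from the 172.17.0.x range.

Hope this helps anyone who runs into the same problem.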