Kubernetes - kube-system pods on the master keep restarting after a worker node joins

Saa*_*ooq 4 kubernetes weave flannel kubeadm

I followed this tutorial and this one, but I have been facing the same problem for the last 3 days.

I can set up the master node correctly with the following steps:

kubeadm init

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

export kubever=$(kubectl version | base64 | tr -d '\n')
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$kubever"

Everything seems fine:

kubectl get all --namespace=kube-system

Then, on the worker node:

kubeadm join --token 864655.fdf6d0b389867b79 192.168.100.17:6443 --discovery-token-ca-cert-hash sha256:a2d840808b17b53b9612e6271ccde489f13dbede7d354f97188d0faa9e210af2

The output looks fine, as shown below:

[preflight] Running pre-flight checks.
  [WARNING FileExisting-crictl]: crictl not found in system path
[preflight] Starting the kubelet service
[discovery] Trying to connect to API Server "192.168.100.17:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.100.17:6443"
[discovery] Requesting info from "https://192.168.100.17:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.100.17:6443"
[discovery] Successfully established connection with API Server "192.168.100.17:6443"

This node has joined the cluster:
* Certificate signing request was sent to master and a response
  was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.

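But as soon as I run the join command, everything falls apart. Running

kubectl get all --namespace=kube-system

starts to show all the pods restarting continuously. The status keeps changing between Pending and Running; sometimes some of the pods even disappear, or show other statuses such as ContainerCreating.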

NAME                                READY     STATUS    RESTARTS   AGE
po/etcd-ubuntu                      0/1       Pending   0          0s
po/kube-controller-manager-ubuntu   0/1       Pending   0          0s
po/kube-dns-6f4fd4bdf-cmcfk         3/3       Running   0          13m
po/kube-proxy-2chb6                 1/1       Running   0          13m
po/kube-scheduler-ubuntu            0/1       Pending   0          0s
po/weave-net-ptdxr                  2/2       Running   0          11m

I also tried the second tutorial, with flannel, and got exactly the same problem.

My setup

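I created two new VMs on VMware with a fresh install of Ubuntu 17.10, each with 2 processors / 2 cores, 6 GB of RAM, and a 50 GB hard disk. My physical machine is an i7-6700K with 32 GB of RAM. I installed kubeadm, kubelet, and docker on both of them and then followed the steps above.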

I also tried switching between NAT and bridged networking on VMware, but nothing changed.

The initial IPs of the two VMs with bridged networking are 192.168.100.12 and 192.168.100.17. The output of hostname -I on the master:

192.168.100.17 172.17.0.1 10.32.0.1 10.32.0.2

hostname -I on the worker node:

192.168.100.12 172.17.0.1 10.44.0.0 10.32.0.1

journalctl -xeu kubelet shows the following:

https://gist.github.com/saad749/9a771a3460bf88c274498b5bc4b7fd84

When trying with flannel (still the same problem), the result of

kubectl describe nodes

https://gist.github.com/saad749/d24c453c8b4e663e9abf572a0fb38bf4

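Did I miss any step before kubeadm init? Should I change the IP addresses (and if so, to what)? Are there specific logs I should look at? Is there a more comprehensive tutorial? All the problems start after kubeadm join on the worker node; I can deploy kubernetes or anything else on the master and it works fine.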

Update:

Even after applying errordeveloper's suggestions, the same problem persists.

I added the following flag to kubeadm init:

--apiserver-advertise-address 192.168.100.17

I updated kubeadm.conf to the following, then reloaded and restarted: https://gist.github.com/saad749/c7149c87ec3e75a40586f626cf04279a

I also tried changing the cluster DNS: https://gist.github.com/saad749/5fa66bebc22841e58119333e75600e40

Here is the output after initializing the master:

kube-master@ubuntu:~$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                             READY     STATUS    RESTARTS   AGE       IP               NODE
kube-system   etcd-ubuntu                      1/1       Running   0          22s       192.168.100.17   ubuntu
kube-system   kube-apiserver-ubuntu            1/1       Running   0          29s       192.168.100.17   ubuntu
kube-system   kube-controller-manager-ubuntu   1/1       Running   0          13s       192.168.100.17   ubuntu
kube-system   kube-dns-6f4fd4bdf-wfqhb         3/3       Running   0          1m        10.32.0.7        ubuntu
kube-system   kube-proxy-h4hz9                 1/1       Running   0          1m        192.168.100.17   ubuntu
kube-system   kube-scheduler-ubuntu            1/1       Running   0          34s       192.168.100.17   ubuntu
kube-system   weave-net-fkgnh                  2/2       Running   0          32s       192.168.100.17   ubuntu

hostname -I and hostname -i results:

kube-master@ubuntu:~$ hostname -I
192.168.100.17 172.17.0.1 10.32.0.1 10.32.0.2 10.32.0.3 10.32.0.4 10.32.0.5 10.32.0.6 10.244.0.0 10.244.0.1
kube-master@ubuntu:~$ hostname -i
192.168.100.17

Result of:

kubectl describe nodes

https://gist.github.com/saad749/8f460650182a04d0ddf3158a52761a9a

The InternalIP now appears to be correct.

After joining from the second node, this happens:

kube-master@ubuntu:~$ kubectl get nodes
NAME      STATUS    ROLES     AGE       VERSION
ubuntu    Ready     master    49m       v1.9.3
kube-master@ubuntu:~$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                             READY     STATUS              RESTARTS   AGE       IP               NODE
kube-system   kube-controller-manager-ubuntu   0/1       Pending             0          0s        <none>           ubuntu
kube-system   kube-dns-6f4fd4bdf-wfqhb         0/3       ContainerCreating   0          49m       <none>           ubuntu
kube-system   kube-proxy-h4hz9                 1/1       Running             0          49m       192.168.100.17   ubuntu
kube-system   kube-scheduler-ubuntu            1/1       Running             0          1s        192.168.100.17   ubuntu
kube-system   weave-net-fkgnh                  2/2       Running             0          48m       192.168.100.17   ubuntu

ifconfig -a results:

https://gist.github.com/saad749/63a5a52bd3246ff72477b2aca7d158d0

journalctl -xeu kubelet results:

https://gist.github.com/saad749/8a60870b35f93df8565e66cb208aff32

Sometimes the pod IPs show up as 192.168.100.12, which is the IP of the second, non-master node.

kube-master@ubuntu:~$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                             READY     STATUS    RESTARTS   AGE       IP               NODE
kube-system   etcd-ubuntu                      0/1       Pending   0          0s        <none>           ubuntu
kube-system   kube-apiserver-ubuntu            0/1       Pending   0          0s        <none>           ubuntu
kube-system   kube-controller-manager-ubuntu   1/1       Running   0          0s        192.168.100.12   ubuntu
kube-system   kube-dns-6f4fd4bdf-wfqhb         2/3       Running   0          3h        10.32.0.7        ubuntu
kube-system   kube-proxy-h4hz9                 1/1       Running   0          3h        192.168.100.12   ubuntu
kube-system   kube-scheduler-ubuntu            0/1       Pending   0          0s        <none>           ubuntu
kube-system   weave-net-fkgnh                  2/2       Running   1          3h        192.168.100.17   ubuntu

kube-master@ubuntu:~$ kubectl get pod --all-namespaces -o wide
NAMESPACE     NAME                       READY     STATUS    RESTARTS   AGE       IP               NODE
kube-system   kube-dns-6f4fd4bdf-wfqhb   3/3       Running   0          3h        10.32.0.7        ubuntu
kube-system   kube-proxy-h4hz9           1/1       Running   0          3h        192.168.100.12   ubuntu
kube-system   weave-net-fkgnh            2/2       Running   0          3h        192.168.100.12   ubuntu


kubectl describe nodes
Name:               ubuntu
Roles:              master
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=ubuntu
                    node-role.kubernetes.io/master=
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:             node-role.kubernetes.io/master:NoSchedule
CreationTimestamp:  Fri, 02 Mar 2018 08:21:47 -0800
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  OutOfDisk        False   Fri, 02 Mar 2018 11:38:36 -0800   Fri, 02 Mar 2018 08:21:43 -0800   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Fri, 02 Mar 2018 11:38:36 -0800   Fri, 02 Mar 2018 08:21:43 -0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 02 Mar 2018 11:38:36 -0800   Fri, 02 Mar 2018 08:21:43 -0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  Ready            True    Fri, 02 Mar 2018 11:38:36 -0800   Fri, 02 Mar 2018 11:28:25 -0800   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  192.168.100.12
  Hostname:    ubuntu
Capacity:
 cpu:     4
 memory:  6080832Ki
 pods:    110
Allocatable:
 cpu:     4
 memory:  5978432Ki
 pods:    110
System Info:
 Machine ID:                 59bf65b835b242a3aa182f4b8a542219
 System UUID:                0C3C4D56-4747-D59E-EE09-F16F2793677E
 Boot ID:                    658b4a08-d724-425e-9246-2b41995ecc46
 Kernel Version:             4.13.0-36-generic
 OS Image:                   Ubuntu 17.10
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://1.13.1
 Kubelet Version:            v1.9.3
 Kube-Proxy Version:         v1.9.3
ExternalID:                  ubuntu
Non-terminated Pods:         (3 in total)
  Namespace                  Name                        CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                        ------------  ----------  ---------------  -------------
  kube-system                kube-dns-6f4fd4bdf-wfqhb    260m (6%)     0 (0%)      110Mi (1%)       170Mi (2%)
  kube-system                kube-proxy-h4hz9            0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                weave-net-fkgnh             20m (0%)      0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  280m (7%)     0 (0%)      110Mi (1%)       170Mi (2%)
Events:
  Type     Reason                   Age                 From             Message
  ----     ------                   ----                ----             -------
  Warning  Rebooted                 12m (x814 over 2h)  kubelet, ubuntu  Node ubuntu has been rebooted, boot id: 16efd500-a2a5-446f-ba25-1187857996e0
  Normal   NodeHasNoDiskPressure    10m                 kubelet, ubuntu  Node ubuntu status is now: NodeHasNoDiskPressure
  Normal   Starting                 10m                 kubelet, ubuntu  Starting kubelet.
  Normal   NodeAllocatableEnforced  10m                 kubelet, ubuntu  Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientDisk    10m                 kubelet, ubuntu  Node ubuntu status is now: NodeHasSufficientDisk
  Normal   NodeHasSufficientMemory  10m                 kubelet, ubuntu  Node ubuntu status is now: NodeHasSufficientMemory
  Normal   NodeNotReady             10m                 kubelet, ubuntu  Node ubuntu status is now: NodeNotReady
  Warning  Rebooted                 2m (x870 over 2h)   kubelet, ubuntu  Node ubuntu has been rebooted, boot id: 658b4a08-d724-425e-9246-2b41995ecc46
  Warning  Rebooted                 15s (x60 over 10m)  kubelet, ubuntu  Node ubuntu has been rebooted, boot id: 16efd500-a2a5-446f-ba25-1187857996e0

What am I doing wrong?

Saa*_*ooq 5

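So, after following @errordeveloper's suggestions and still hitting a wall, I was able to solve the problem, and the cause turned out to be very simple.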

Both of my VMs had the same hostname.

hostname -f 

would return

ubuntu

on both, which evidently causes problems for Kubernetes.
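The kubectl describe nodes output in the question already hints at this: the single node ubuntu logs Rebooted warnings that alternate between two boot IDs, and its InternalIP flips between 192.168.100.17 and 192.168.100.12. A machine has one boot ID per boot, so two alternating IDs on one node suggest that two kubelets are reporting under the same node name. The following is a rough sketch of how to spot that, not an official diagnostic; count_boot_ids is a made-up helper name, and the pattern assumes the event wording shown in that output.

```shell
# Hypothetical helper: count distinct boot IDs among "Rebooted" events
# read from stdin (e.g. the output of `kubectl describe nodes`).
count_boot_ids() {
  awk '/has been rebooted, boot id:/ { print $NF }' | sort -u | wc -l | tr -d ' '
}

# Sample event lines copied from the describe output in the question.
sample='  Warning  Rebooted  12m (x814 over 2h)  kubelet, ubuntu  Node ubuntu has been rebooted, boot id: 16efd500-a2a5-446f-ba25-1187857996e0
  Warning  Rebooted  2m (x870 over 2h)   kubelet, ubuntu  Node ubuntu has been rebooted, boot id: 658b4a08-d724-425e-9246-2b41995ecc46
  Warning  Rebooted  15s (x60 over 10m)  kubelet, ubuntu  Node ubuntu has been rebooted, boot id: 16efd500-a2a5-446f-ba25-1187857996e0'

distinct=$(printf '%s\n' "$sample" | count_boot_ids)
echo "distinct boot IDs: $distinct"
```

Against a live cluster the same function could be fed with kubectl describe nodes | count_boot_ids; a count above 1 for a single node would be worth a closer look.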

I changed the name on the non-master node with

hostnamectl set-hostname kminion

and in the following files:

/etc/hostname
/etc/hosts

After that, everything went smoothly!
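For the file edits, here is a minimal sketch of replacing the old name in a hosts-style file. It is an illustration under assumptions, not the exact commands used above: rename_host_in_file is a hypothetical helper, it assumes the old name appears as a separate whitespace-delimited field, and the demo runs on a temporary file rather than the real /etc/hosts.

```shell
# Hypothetical helper: replace whole-field occurrences of an old hostname
# in a hosts-style file, keeping the file's ownership and permissions.
rename_host_in_file() {
  old=$1; new=$2; file=$3
  tmp=$(mktemp) || return 1
  # Only fields exactly equal to the old name are changed; lines with a
  # match are re-joined with single spaces, other lines are left untouched.
  awk -v old="$old" -v new="$new" '{
    for (i = 1; i <= NF; i++) if ($i == old) $i = new
    print
  }' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Write back through redirection so the target's mode and owner survive.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Demo on a throwaway file that mimics a default Ubuntu /etc/hosts.
demo=$(mktemp)
printf '127.0.0.1\tlocalhost\n127.0.1.1\tubuntu\n' > "$demo"
rename_host_in_file ubuntu kminion "$demo"
cat "$demo"
```

On the real node this would need root and would target /etc/hosts after hostnamectl set-hostname kminion. If the node had already joined under the old name, it may also need kubeadm reset followed by a fresh kubeadm join so that the kubelet registers under the new name; that step is an assumption and is not part of the steps described above.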