Kubeadm: why doesn't my node show up even though kubelet says it joined?

Pau*_*tte 5 amazon-web-services kubernetes terraform

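I'm setting up a Kubernetes deployment using auto scaling groups and Terraform. The kube master node sits behind an ELB to get some reliability in case something goes wrong. The ELB's health check is set to tcp 6443, and it has tcp listeners for 8080, 6443, and 9898. All of the instances and the load balancer belong to one security group that allows all traffic between group members, plus public traffic from the NAT gateway address. I created my AMI using the following script (from the getting started guide)...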

# curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF
# apt-get update
# # Install docker if you don't have it already.
# apt-get install -y docker.io
# apt-get install -y kubelet kubeadm kubectl kubernetes-cni

I use the following user data scripts...

kube master

#!/bin/bash
rm -rf /etc/kubernetes/*
rm -rf /var/lib/kubelet/*

kubeadm init \
  --external-etcd-endpoints=http://${etcd_elb}:2379 \
  --token=${token} \
  --use-kubernetes-version=${k8s_version} \
  --api-external-dns-names=kmaster.${master_elb_dns} \
  --cloud-provider=aws
until kubectl cluster-info
do
  sleep 1
done
kubectl apply -f https://git.io/weave-kube

kube node

#!/bin/bash
rm -rf /etc/kubernetes/*
rm -rf /var/lib/kubelet/*

until kubeadm join --token=${token} kmaster.${master_elb_dns}
do
  sleep 1
done

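Everything seems to work properly. The master comes up and responds to kubectl commands, with pods for discovery, dns, weave, controller-manager, api-server, and scheduler. kubeadm prints the following output on the node...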

Running pre-flight checks
<util/tokens> validating provided token
<node/discovery> created cluster info discovery client, requesting info from "http://kmaster.jenkins.learnvest.net:9898/cluster-info/v1/?token-id=eb31c0"
<node/discovery> failed to request cluster info, will try again: [Get http://kmaster.jenkins.learnvest.net:9898/cluster-info/v1/?token-id=eb31c0: EOF]
<node/discovery> cluster info object received, verifying signature using given token
<node/discovery> cluster info signature and contents are valid, will use API endpoints [https://10.253.129.106:6443]
<node/bootstrap> trying to connect to endpoint https://10.253.129.106:6443
<node/bootstrap> detected server version v1.4.4
<node/bootstrap> successfully established connection with endpoint https://10.253.129.106:6443
<node/csr> created API client to obtain unique certificate for this node, generating keys and certificate signing request
<node/csr> received signed certificate from the API server:
Issuer: CN=kubernetes | Subject: CN=system:node:ip-10-253-130-44 | CA: false
Not before: 2016-10-27 18:46:00 +0000 UTC Not After: 2017-10-27 18:46:00 +0000 UTC
<node/csr> generating kubelet configuration
<util/kubeconfig> created "/etc/kubernetes/kubelet.conf"

Node join complete:
* Certificate signing request sent to master and response
  received.
* Kubelet informed of new secure connection details.

Run 'kubectl get nodes' on the master to see this machine join.

Unfortunately, running kubectl get nodes on the master only returns the master itself as a node. The only interesting thing I see in /var/log/syslog is

Oct 27 21:19:28 ip-10-252-39-25 kubelet[19972]: E1027 21:19:28.198736   19972 eviction_manager.go:162] eviction manager: unexpected err: failed GetNode: node 'ip-10-253-130-44' not found
Oct 27 21:19:31 ip-10-252-39-25 kubelet[19972]: E1027 21:19:31.778521   19972 kubelet_node_status.go:301] Error updating node status, will retry: error getting node "ip-10-253-130-44": nodes "ip-10-253-130-44" not found

I'm really not sure where to look...

hit*_*o_o 6

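The hostnames of the two machines (master and node) should be different. You can check them by running cat /etc/hostname on each. If they happen to be the same, edit that file so they differ, then do a sudo reboot to apply the change. Otherwise kubeadm will not be able to tell the two machines apart, and they will show up as a single machine in kubectl get nodes.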
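
The hostname comparison described above could be sketched roughly as follows. This is only an illustration: the function name find_duplicate_hostnames and the sample names kube-master and kube-node-1 are hypothetical, and it assumes you have already collected the output of cat /etc/hostname from each machine, one name per line.

```shell
#!/bin/bash
# Sketch: report hostnames that appear more than once in a list.
# Reads one hostname per line on stdin and prints each duplicated name once.
# (find_duplicate_hostnames is a hypothetical helper, not part of kubeadm.)
find_duplicate_hostnames() {
  # uniq -d only sees adjacent duplicates, so sort first.
  sort | uniq -d
}

# Hypothetical input: names gathered by running `cat /etc/hostname`
# on each machine. Here two machines report the same name.
printf '%s\n' kube-master kube-node-1 kube-master | find_duplicate_hostnames
```

If this prints any name, at least two machines share that hostname; changing /etc/hostname on one of them and rebooting, as suggested above, should let it register as a separate node.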