AWS EKS node creation fails

ell*_*lli 10 amazon-web-services amazon-eks

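I have a cluster in AWS that I created by following these instructions.

I then tried to add nodes to that cluster by following the documentation.

The nodes apparently cannot be created: the vpc-cni and coredns add-ons report a health issue of type insufficientNumberOfReplicas: "The add-on is unhealthy because it doesn't have the desired number of replicas."

Pod status from kubectl get pods -n kube-system: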

NAME                       READY   STATUS             RESTARTS   AGE
aws-node-9cwkd             0/1     CrashLoopBackOff   13         42m
aws-node-h4qjt             0/1     CrashLoopBackOff   13         42m
aws-node-jrn5x             0/1     CrashLoopBackOff   13         43m
coredns-745979c988-25fcc   0/1     Pending            0          120m
coredns-745979c988-qvh7h   0/1     Pending            0          120m
kube-proxy-2bmlq           1/1     Running            0          42m
kube-proxy-hjcrw           1/1     Running            0          43m
kube-proxy-j9r9n           1/1     Running            0          42m

Logs of pod aws-node-9cwkd:

{"level":"info","ts":"2021-11-30T14:11:14.156Z","caller":"entrypoint.sh","msg":"Validating env variables ..."}
{"level":"info","ts":"2021-11-30T14:11:14.157Z","caller":"entrypoint.sh","msg":"Install CNI binaries.."}
{"level":"info","ts":"2021-11-30T14:11:14.177Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2021-11-30T14:11:14.179Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
{"level":"info","ts":"2021-11-30T14:11:16.189Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2021-11-30T14:11:18.198Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2021-11-30T14:11:20.205Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2021-11-30T14:11:22.215Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2021-11-30T14:11:24.226Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}

Running kubectl describe pod aws-node-h4qjt -n kube-system shows the following error:

Readiness probe failed: {"level":"info","ts":"2021-11-30T14:11:07.145Z","caller":"/usr/local/go/src/runtime/proc.go:225","msg":"timeout: failed to connect service \":50051\" within 5s"}

Any help getting the nodes created successfully in the cluster would be greatly appreciated.

Dan*_*son 16

This is most likely a problem with the node service role. If you exec into the pod and look at ipamd.log, you can get more information:

kubectl exec -it aws-node-9cwkd -n kube-system -- /bin/bash 
cat /host/var/log/aws-routed-eni/ipamd.log

Here is an example of the error I saw when I hit the same problem:

{"level":"error","ts":"2021-12-02T13:27:51.464Z","caller":"ipamd/ipamd.go:444","msg":"Failed to call ec2:DescribeNetworkInterfaces for [eni-0c01bd25ae6999ed5]: UnauthorizedOperation: You are not authorized to perform this operation.\n\tstatus code: 403, request id: 0438b84b-8052-4f31-9d63-c2ff7512f131"}
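
Since ipamd.log is written as one JSON object per line, entries like the one above can be picked out by filtering on the level field. The following is a minimal sketch, not part of the original answer; the helper name ipamd_errors is made up for illustration, and the pod name is the one from the question.

```shell
# Hypothetical helper: keep only error-level entries from an
# ipamd.log stream (one JSON object per line) read on stdin.
ipamd_errors() {
  grep '"level":"error"'
}

# Example usage (substitute your own aws-node pod name):
#   kubectl exec aws-node-9cwkd -n kube-system -- \
#     cat /host/var/log/aws-routed-eni/ipamd.log | ipamd_errors
```

An UnauthorizedOperation entry in the filtered output would point to missing IAM permissions, as in the example above.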

In my case, I had to attach the AmazonEKS_CNI_Policy policy to the node IAM role.

https://docs.aws.amazon.com/eks/latest/userguide/cni-iam-role.html
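
For reference, attaching that managed policy with the AWS CLI could look roughly like the sketch below. It is an illustration under assumptions rather than the exact steps I ran: the function names are invented for this example, and the role name in the usage comments (my-eks-node-role) is a placeholder for whatever IAM role your node group actually uses.

```shell
# ARN of the AWS managed policy named in the answer.
CNI_POLICY_ARN="arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"

# Hypothetical helper: attach the CNI policy to the IAM role named in $1.
attach_cni_policy() {
  aws iam attach-role-policy --role-name "$1" --policy-arn "$CNI_POLICY_ARN"
}

# Hypothetical helper: list managed policies attached to the role in $1,
# to confirm that AmazonEKS_CNI_Policy now appears.
show_role_policies() {
  aws iam list-attached-role-policies --role-name "$1"
}

# Example usage (placeholder role name, replace with your node role):
#   attach_cni_policy my-eks-node-role
#   show_role_policies my-eks-node-role
#   kubectl get pods -n kube-system   # aws-node should leave CrashLoopBackOff
```

The aws-node pods restart on their own while in CrashLoopBackOff, so they should pick up the new permissions on a later restart; the coredns pods can only be scheduled once the nodes become Ready.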