AKS reports "Insufficient pods"

use*_*383 5 azure kubernetes

I've been working through the Azure Cats & Dogs tutorial described here, and I'm hitting an error in the final step, when the application is started in AKS. Kubernetes reports that I have insufficient pods, but I'm not sure why. I ran through this same tutorial a few weeks ago without any problems.

$ kubectl apply -f azure-vote-all-in-one-redis.yaml
deployment.apps/azure-vote-back created
service/azure-vote-back created
deployment.apps/azure-vote-front created
service/azure-vote-front created

$ kubectl get pods
NAME                                READY   STATUS    RESTARTS   AGE
azure-vote-back-655476c7f7-mntrt    0/1     Pending   0          6s
azure-vote-front-7c7d7f6778-mvflj   0/1     Pending   0          6s

$ kubectl get events
LAST SEEN   TYPE      REASON                 KIND         MESSAGE
3m36s       Warning   FailedScheduling       Pod          0/1 nodes are available: 1 Insufficient pods.
84s         Warning   FailedScheduling       Pod          0/1 nodes are available: 1 Insufficient pods.
70s         Warning   FailedScheduling       Pod          skip schedule deleting pod: default/azure-vote-back-655476c7f7-l5j28
9s          Warning   FailedScheduling       Pod          0/1 nodes are available: 1 Insufficient pods.
53m         Normal    SuccessfulCreate       ReplicaSet   Created pod: azure-vote-back-655476c7f7-kjld6
99s         Normal    SuccessfulCreate       ReplicaSet   Created pod: azure-vote-back-655476c7f7-l5j28
24s         Normal    SuccessfulCreate       ReplicaSet   Created pod: azure-vote-back-655476c7f7-mntrt
53m         Normal    ScalingReplicaSet      Deployment   Scaled up replica set azure-vote-back-655476c7f7 to 1
99s         Normal    ScalingReplicaSet      Deployment   Scaled up replica set azure-vote-back-655476c7f7 to 1
24s         Normal    ScalingReplicaSet      Deployment   Scaled up replica set azure-vote-back-655476c7f7 to 1
9s          Warning   FailedScheduling       Pod          0/1 nodes are available: 1 Insufficient pods.
3m36s       Warning   FailedScheduling       Pod          0/1 nodes are available: 1 Insufficient pods.
53m         Normal    SuccessfulCreate       ReplicaSet   Created pod: azure-vote-front-7c7d7f6778-rmbqb
24s         Normal    SuccessfulCreate       ReplicaSet   Created pod: azure-vote-front-7c7d7f6778-mvflj
53m         Normal    ScalingReplicaSet      Deployment   Scaled up replica set azure-vote-front-7c7d7f6778 to 1
53m         Normal    EnsuringLoadBalancer   Service      Ensuring load balancer
52m         Normal    EnsuredLoadBalancer    Service      Ensured load balancer
46s         Normal    DeletingLoadBalancer   Service      Deleting load balancer
24s         Normal    ScalingReplicaSet      Deployment   Scaled up replica set azure-vote-front-7c7d7f6778 to 1

$ kubectl get nodes
NAME                       STATUS   ROLES   AGE    VERSION
aks-nodepool1-27217108-0   Ready    agent   7d4h   v1.9.9

The only thing I can think of that has changed is that I'm now also running other (larger) clusters, and the main reason I went back through this Cats & Dogs tutorial is that I hit the same problem in my other clusters today. Is this a resource limit issue with my Azure account?

Update 10-20/3:15 PST: Note how these three clusters all show that they use the same node pool, even though they were created in different resource groups. Also note how the get-credentials call for gem2-cluster reports an error. I did previously have a cluster named gem2-cluster, which I deleted and recreated with the same name (in fact, I deleted the whole resource group). What's the correct way to do this? (A possible cleanup is sketched after the output below.)

$ az aks get-credentials --name gem1-cluster --resource-group gem1-rg
Merged "gem1-cluster" as current context in /home/psteele/.kube/config

$ kubectl get nodes -n gem1
NAME                       STATUS   ROLES   AGE     VERSION
aks-nodepool1-27217108-0   Ready    agent   3h26m   v1.9.11

$ az aks get-credentials --name gem2-cluster --resource-group gem2-rg
A different object named gem2-cluster already exists in clusters

$ az aks get-credentials --name gem3-cluster --resource-group gem3-rg
Merged "gem3-cluster" as current context in /home/psteele/.kube/config

$ kubectl get nodes -n gem1
NAME                       STATUS   ROLES   AGE   VERSION
aks-nodepool1-14202150-0   Ready    agent   26m   v1.9.11

$ kubectl get nodes -n gem2
NAME                       STATUS   ROLES   AGE   VERSION
aks-nodepool1-14202150-0   Ready    agent   26m   v1.9.11

$ kubectl get nodes -n gem3
NAME                       STATUS   ROLES   AGE   VERSION
aks-nodepool1-14202150-0   Ready    agent   26m   v1.9.11
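For reference, the gem2-cluster error above typically means the kubeconfig still contains entries from the deleted cluster of the same name. A minimal cleanup sketch, assuming the default AKS context/cluster/user naming:

# Remove the stale kubeconfig entries left over from the deleted cluster
$ kubectl config delete-context gem2-cluster
$ kubectl config delete-cluster gem2-cluster
$ kubectl config unset users.clusterUser_gem2-rg_gem2-cluster

# Or simply force the merge in one step
$ az aks get-credentials --name gem2-cluster --resource-group gem2-rg --overwrite-existing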

Lip*_*sum 13

What is your max-pods set to? This is a normal error when you've hit the pod limit per node.

You can check your current maximum pods per node with:

$ kubectl get nodes -o yaml | grep pods
  pods: "30"
  pods: "30"

And you currently have:

$ kubectl get pods --all-namespaces | grep Running | wc -l
  18
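If you need a higher limit, note that max-pods can (as of this writing) only be set when the cluster or node pool is created, not changed afterwards. A sketch using the Azure CLI, with hypothetical resource names:

# --max-pods is only honored at creation time (names here are illustrative)
$ az aks create --resource-group myResourceGroup --name myAKSCluster --node-count 1 --max-pods 50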

  • It's worth checking without the `grep Running` filter too, since that's what tripped me up: I had about 700 pods stuck in Pending because a cron job's image pull was failing. Thanks. (2 upvotes)
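Following up on that comment: to count pods in every state (so Pending pods aren't missed), something like this works:

# Count all pods regardless of phase; --no-headers drops the header row
$ kubectl get pods --all-namespaces --no-headers | wc -l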

Act*_*ack 7

I hit this because I had exceeded the maximum pods. I found out how many pods my cluster could handle with:

$ kubectl get nodes -o json | jq -r .items[].status.allocatable.pods | paste -sd+ - | bc

  • kubectl get nodes -o json and looking at items[].status.allocatable.pods worked. Mine was 4, and 4 pods were already running in the system namespace. (2 upvotes)
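Following up on that comment: for a per-node breakdown rather than the sum, a custom-columns query is an option (a sketch):

# Show each node's name next to its allocatable pod count
$ kubectl get nodes -o custom-columns=NAME:.metadata.name,PODS:.status.allocatable.pods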

Ken*_*SFT 0

Check to make sure you aren't hitting the core limit for your subscription:

az vm list-usage --location "<location>" -o table
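To zero in on the relevant rows, the table output can be filtered, e.g. (row names may vary by region and API version):

# Show only the vCPU-related quota rows
$ az vm list-usage --location "<location>" -o table | grep -i vcpu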

If you are, you can request more quota: https://learn.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request