Google Autopilot cluster: unschedulable Pod

Question

Sah*_*hka 7 google-cloud-platform google-kubernetes-engine

I created a Pod in an Autopilot cluster with the following resource requests and limits:

    Limits:
      cpu:                500m
      ephemeral-storage:  1Gi
      memory:             512Mi
    Requests:
      cpu:                500m
      ephemeral-storage:  1Gi
      memory:             512Mi

From what I have read, everything should be provisioned automatically, and I don't know how to add new nodes to the cluster.

    Warning  FailedScheduling   2m39s (x3979 over 4d3h)  gke.io/optimize-utilization-scheduler  0/3 nodes are available: 1 Insufficient memory, 3 Insufficient cpu.
    Normal   NotTriggerScaleUp  85s (x68738 over 4d5h)   cluster-autoscaler                     pod didn't trigger scale-up (it wouldn't fit if a new node is added): 2 node(s) didn't match node selector

The Google console shows a possible action:

    Increase maximum size limit for autoscaling in one or more node pools that have autoscaling enabled.

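But this is Autopilot. According to the documentation, scaling should happen automatically, and I have no way to change node pool limits myself. It is very strange.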

Answer 1

Wil*_*iss 3

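Without seeing the Podspec YAML, it is hard to debug this precisely. However, receiving a NotTriggerScaleUp message means that Autopilot will never add nodes for these Pods, and they will be stuck in the Pending state. This is probably because the Pod has some condition that adding a new node cannot satisfy.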

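One example of a condition that cannot be satisfied is a node selector in which the Pod requests placement in a zone that doesn't exist. Since the autoscaler cannot create nodes in a zone that doesn't exist, the Pod will stay Pending forever.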
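The zone case above can be sketched as a small offline check. This is only an illustration, not how the autoscaler itself works: the function name `unsatisfiable_zone`, the zone names, and the Pod spec dictionary are all hypothetical, and the only real identifier is the well-known node label `topology.kubernetes.io/zone`.

```python
from typing import Optional, Set

# Well-known Kubernetes node label that carries a node's zone.
ZONE_LABEL = "topology.kubernetes.io/zone"


def unsatisfiable_zone(pod_spec: dict, cluster_zones: Set[str]) -> Optional[str]:
    """Return the requested zone if the cluster has no such zone, else None."""
    selector = pod_spec.get("nodeSelector") or {}
    zone = selector.get(ZONE_LABEL)
    if zone is not None and zone not in cluster_zones:
        return zone
    return None


# Hypothetical data: a cluster spanning three zones and a Pod pinned elsewhere.
cluster_zones = {"us-central1-a", "us-central1-b", "us-central1-c"}
pod_spec = {"nodeSelector": {ZONE_LABEL: "europe-west1-b"}}

# Prints the zone that no new node could ever satisfy.
print(unsatisfiable_zone(pod_spec, cluster_zones))
```

A Pod without a zone selector, or with one that names a zone the cluster actually has, yields `None` here, so some other condition would have to explain the Pending state.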

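When the autoscaler can provision resources for your Pod, you will see a TriggeredScaleUp message (usually within about 10 seconds of the Pod entering the Pending state, though in some cases it can take a minute).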
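As a rough triage aid, the two event reasons mentioned here can be told apart in a few lines. This is a minimal sketch under assumptions: the helper names `scale_up_status` and `blocking_reasons` are hypothetical, event ordering is ignored, and the message format is taken only from the events quoted in the question, so it may not cover every message the autoscaler emits.

```python
import re
from typing import Dict, Iterable


def scale_up_status(event_reasons: Iterable[str]) -> str:
    """Summarise a Pod's event reasons (simplification: ordering is ignored)."""
    reasons = set(event_reasons)
    if "TriggeredScaleUp" in reasons:
        return "scale-up triggered"
    if "NotTriggerScaleUp" in reasons:
        return "blocked"
    return "no autoscaler event yet"


def blocking_reasons(message: str) -> Dict[str, int]:
    """Split the part after the first ': ' into {reason: count}."""
    _, _, tail = message.partition(": ")
    counts: Dict[str, int] = {}
    for part in tail.rstrip(".").split(", "):
        match = re.match(r"(\d+)\s+(.+)", part.strip())
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


# Message copied from the NotTriggerScaleUp event in the question.
message = (
    "pod didn't trigger scale-up (it wouldn't fit if a new node is added): "
    "2 node(s) didn't match node selector"
)
print(scale_up_status(["FailedScheduling", "NotTriggerScaleUp"]))
print(blocking_reasons(message))
```

For the events in the question, the parsed reason points at the node selector, which is the first thing to check in the Podspec.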

I wrote a more general explanation of how pending Pods work in Autopilot and what you can look for.

Archived:	4 years, 7 months ago
Views:	3309
Last activity:	4 years, 4 months ago